read_GMT_sign

besca.tl.sig.read_GMT_sign(GMT_file, UP_suffix='_UP', DN_suffix='_DN', directed=True, verbose=False)

Read a GMT file and extract signed genesets. Genesets that share a base name and end in the UP and DN suffixes are combined into a single signature holding its UP- and DN-regulated genes. Genesets without a direction suffix are treated as UP by default.

Parameters:

GMT_file (str | default = None) – path to the GMT file containing the genesets
UP_suffix (str | default = “_UP”) – suffix marking a signature as UP. It must be the end of the signature name (end-of-string match, $). Pass a dummy string that matches no signature name to avoid combination.
DN_suffix (str | default = “_DN”) – suffix marking a signature as DN. It must be the end of the signature name (end-of-string match, $). Pass a dummy string that matches no signature name to avoid combination.
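
To make the role of the two suffixes concrete, here is a minimal sketch of suffix-based combination. It is not besca's implementation: the helper combine_signed_genesets, the SigA genesets and the GENE1..GENE3 names are hypothetical, and it assumes the usual tab-separated GMT layout (name, description, then genes).

```python
import re
from collections import defaultdict


def combine_signed_genesets(gmt_lines, up_suffix="_UP", dn_suffix="_DN"):
    """Hypothetical helper illustrating suffix-based combination.

    Assumes each GMT line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    # Anchor each suffix at the end of the signature name ($).
    up_re = re.compile(re.escape(up_suffix) + r"$")
    dn_re = re.compile(re.escape(dn_suffix) + r"$")
    signed = defaultdict(dict)
    for line in gmt_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip lines without any gene
        name = fields[0]
        genes = [gene for gene in fields[2:] if gene]
        if dn_re.search(name):
            signed[dn_re.sub("", name)]["DN"] = genes
        elif up_re.search(name):
            signed[up_re.sub("", name)]["UP"] = genes
        else:
            # Non-directional genesets are stored as UP.
            signed[name]["UP"] = genes
    return dict(signed)


# Made-up input: one non-directional geneset and one UP/DN pair.
example_lines = [
    "Tcells\tdemo description\tCD3E\tCD3G\tCD3D",
    "SigA_UP\tdemo description\tGENE1\tGENE2",
    "SigA_DN\tdemo description\tGENE3",
]
signed = combine_signed_genesets(example_lines)
# Dummy suffixes match nothing, so no genesets are combined.
uncombined = combine_signed_genesets(
    example_lines, up_suffix="__no_up__", dn_suffix="__no_dn__"
)
```

In this sketch, SigA_UP and SigA_DN end up under the single key SigA, while the dummy suffixes leave all three names as separate UP-only entries.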

Return type:

A dictionary keyed by signature name. Each value is a sub-dictionary keyed by direction (UP or DN), whose values are the lists of gene names.
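
The nested structure can be explored with plain dictionary operations. The sketch below uses a hand-written dictionary of the same shape rather than a real call: the CD8 entry is copied from the example output on this page, while SigA, its genes and the helper genes_for are hypothetical.

```python
# Hand-written stand-in for the returned dictionary (no besca call is made).
signatures = {
    "CD8": {"UP": ["CD8A", "CD8B"]},  # copied from the documented example output
    "SigA": {"UP": ["GENE1"], "DN": ["GENE2", "GENE3"]},  # hypothetical signature
}


def genes_for(signatures, name, direction):
    """Hypothetical accessor: genes for one signature and direction, or []."""
    return signatures.get(name, {}).get(direction, [])


# Number of genes per signature and direction.
summary = {
    name: {direction: len(genes) for direction, genes in directions.items()}
    for name, directions in signatures.items()
}

cd8_up = genes_for(signatures, "CD8", "UP")
cd8_dn = genes_for(signatures, "CD8", "DN")  # CD8 has no DN entry
```

Using .get with a default avoids a KeyError for signatures that only have an UP entry, which is the case for every signature in the example output.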

Example

>>> import besca as bc
>>> import pkg_resources
>>> gmt_file='datasets/genesets/Immune.gmt' # provided in besca
>>> gmt_file_abs_path=pkg_resources.resource_filename('besca', gmt_file)
>>> bc.tl.sig.read_GMT_sign(gmt_file_abs_path)
{'lymphocyte': {'UP': ['PTPRC']}, 'myeloid': {'UP': ['S100A8', 'S100A9', 'CST3']}, 'Bcell': {'UP': ['CD19', 'CD79A', 'MS4A1']}, 'Tcells': {'UP': ['CD3E', 'CD3G', 'CD3D']}, 'CD4': {'UP': ['CD4']}, 'CD8': {'UP': ['CD8A', 'CD8B']}, 'NKcell': {'UP': ['NKG7', 'GNLY', 'NCAM1']}, 'monocyte': {'UP': ['CST3', 'CSF1R', 'ITGAM', 'CD14', 'FCGR3A', 'FCGR3B']}, 'macrophage': {'UP': ['CD14', 'IL1B', 'LYZ', 'CD163', 'ITGAX', 'CD68', 'CSF1R', 'FCGR3A']}}