filter_siggenes

besca.tl.sig.filter_siggenes(adata, signature_dict)

Filter every signature in signature_dict, removing genes that are not present in adata.

Parameters:
  • adata (anndata.AnnData) – An AnnData object (from scanpy). Following besca convention, var names (genes) are HGNC symbols and should match the gene names used in the signature values.

  • signature_dict (dict) – A dictionary of signatures. Keys are signature names; values come in one of two variants:
      * Variant 1: each value is a list of gene names as strings. Example: {'gs1': ['A', 'B'], 'gs2': ['B', 'C']}
      * Variant 2: each value is a dict whose keys are labels, for instance directions (UP/DN), and whose values are gene names. Example: {'gs1': {'UP': 'A', 'DN': 'B'}, 'gs2': {'UP': ['C', 'D'], 'DN': ['E', 'A']}}

Returns:

signature_dict_filtered – A dictionary of signatures in the same format as the input, with genes absent from adata removed.

Return type:

dict

Example

>>> import besca as bc
>>> adata = bc.datasets.simulated_pbmc3k_processed()
>>> sigs = {'GeneSet1': ['Gene_0', 'Gene_2', 'Gene_3', 'NoSuchAGene'],
...         'GeneSet2': ['Gene_5', 'Gene_6', 'Gene_8', 'UnknownGene']}
>>> filtered_sigs = bc.tl.sig.filter_siggenes(adata, sigs)
>>> filtered_sigs
{'GeneSet1': ['Gene_0', 'Gene_2', 'Gene_3'], 'GeneSet2': ['Gene_5', 'Gene_6', 'Gene_8']}
>>> signed_sigs = {'GeneSet1': {'UP': ['Gene_0', 'Gene_2', 'Gene_3', 'NoSuchAGene'],
...                             'DN': ['Gene_5', 'Gene_6', 'Gene_8', 'UnknownGene']},
...                'GeneSet2': {'UP': ['Gene_10', 'Gene_11', 'Gene_13', 'NoSuchAGene2'],
...                             'DN': ['Gene_14', 'Gene_16', 'Gene_18', 'UnknownGene2']}}
>>> filtered_signed_sigs = bc.tl.sig.filter_siggenes(adata, signed_sigs)
>>> filtered_signed_sigs
{'GeneSet1': {'UP': ['Gene_0', 'Gene_2', 'Gene_3'], 'DN': ['Gene_5', 'Gene_6', 'Gene_8']}, 'GeneSet2': {'UP': ['Gene_10', 'Gene_11', 'Gene_13'], 'DN': ['Gene_14', 'Gene_16', 'Gene_18']}}
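
To make the two signature variants concrete, the following is a minimal sketch of the kind of filtering described above. It is not besca's implementation: the function name filter_signatures_sketch is hypothetical, a plain list of gene names stands in for the var names of adata, and treating a bare string as a single gene is an assumption made here for illustration.

```python
from typing import Dict, Iterable, List, Union

# A signature value is either a list of genes (Variant 1)
# or a dict mapping a label such as 'UP'/'DN' to genes (Variant 2).
GeneList = Union[str, List[str]]
SignatureValue = Union[GeneList, Dict[str, GeneList]]


def filter_signatures_sketch(
    var_names: Iterable[str],
    signature_dict: Dict[str, SignatureValue],
) -> Dict[str, Union[List[str], Dict[str, List[str]]]]:
    """Hypothetical stand-in for the filtering step, not besca's code."""
    known = set(var_names)  # set lookup keeps membership tests cheap

    def keep(genes: GeneList) -> List[str]:
        # Assumption: a bare string is treated as a one-gene list.
        if isinstance(genes, str):
            genes = [genes]
        # Preserve the input order, dropping genes that are not known.
        return [gene for gene in genes if gene in known]

    filtered = {}
    for name, value in signature_dict.items():
        if isinstance(value, dict):
            # Variant 2: filter each labelled gene list separately.
            filtered[name] = {label: keep(genes) for label, genes in value.items()}
        else:
            # Variant 1: a flat list of gene names.
            filtered[name] = keep(value)
    return filtered


# Stand-in for the var names of an AnnData object.
var_names = ['Gene_0', 'Gene_2', 'Gene_3', 'Gene_5']
sigs = {
    'gs1': ['Gene_0', 'NoSuchAGene'],
    'gs2': {'UP': ['Gene_2', 'UnknownGene'], 'DN': 'Gene_5'},
}
filtered = filter_signatures_sketch(var_names, sigs)
print(filtered)
```

With a real AnnData object, the first argument in this sketch would be adata.var_names. Note that a signature whose genes are all missing is kept with an empty list here; whether filter_siggenes does the same is not stated above.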