filter_siggenes
- besca.tl.sig.filter_siggenes(adata, signature_dict)
Filter every signature in signature_dict, removing genes that are not present in adata.
- Parameters:
adata (anndata.AnnData) – An AnnData object (from scanpy). Following besca convention, var names (genes) are HGNC symbols and should match the gene names used in the signature values.
signature_dict (dict) – A dictionary of signatures. Keys are signature names; values come in two variants:
* Variant 1: values are lists of gene names as strings. Example: {'gs1': ['A', 'B'], 'gs2': ['B', 'C']}.
* Variant 2: values are dicts whose keys are labels, for instance directions (UP/DN), and whose values are lists of gene names. Example: {'gs1': {'UP': ['A'], 'DN': ['B']}, 'gs2': {'UP': ['C', 'D'], 'DN': ['E', 'A']}}.
- Returns:
signature_dict_filtered – A dictionary of signatures in the same format as the input, with genes not present in adata removed.
- Return type:
dict
Example
>>> import besca as bc
>>> adata = bc.datasets.simulated_pbmc3k_processed()
>>> sigs = {'GeneSet1': ['Gene_0', 'Gene_2', 'Gene_3', 'NoSuchAGene'],
...         'GeneSet2': ['Gene_5', 'Gene_6', 'Gene_8', 'UnknownGene']}
>>> filtered_sigs = bc.tl.sig.filter_siggenes(adata, sigs)
>>> filtered_sigs
{'GeneSet1': ['Gene_0', 'Gene_2', 'Gene_3'], 'GeneSet2': ['Gene_5', 'Gene_6', 'Gene_8']}
>>> signed_sigs = {'GeneSet1': {'UP': ['Gene_0', 'Gene_2', 'Gene_3',
...                                    'NoSuchAGene'],
...                             'DN': ['Gene_5', 'Gene_6', 'Gene_8',
...                                    'UnknownGene']},
...                'GeneSet2': {'UP': ['Gene_10', 'Gene_11', 'Gene_13',
...                                    'NoSuchAGene2'],
...                             'DN': ['Gene_14', 'Gene_16', 'Gene_18',
...                                    'UnknownGene2']}}
>>> filtered_signed_sigs = bc.tl.sig.filter_siggenes(adata, signed_sigs)
>>> filtered_signed_sigs
{'GeneSet1': {'UP': ['Gene_0', 'Gene_2', 'Gene_3'], 'DN': ['Gene_5', 'Gene_6', 'Gene_8']}, 'GeneSet2': {'UP': ['Gene_10', 'Gene_11', 'Gene_13'], 'DN': ['Gene_14', 'Gene_16', 'Gene_18']}}
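The filtering behaviour described above can be sketched in plain Python without besca or an AnnData object. This is only an illustration of the idea, not besca's implementation: the helper name filter_signatures_sketch is hypothetical, it takes a plain list of gene names in place of adata, and it assumes that gene order is preserved and that signatures left empty are kept.

```python
def filter_signatures_sketch(var_names, signature_dict):
    """Hypothetical stand-in for the filtering step; not besca's own code.

    var_names: iterable of gene names known to the dataset.
    signature_dict: signatures in either the flat (Variant 1) or the
    nested (Variant 2) format.
    """
    known = set(var_names)  # set lookup keeps membership tests cheap
    filtered = {}
    for name, value in signature_dict.items():
        if isinstance(value, dict):
            # Variant 2: filter each sub-list (e.g. UP / DN) separately.
            filtered[name] = {
                key: [gene for gene in genes if gene in known]
                for key, genes in value.items()
            }
        else:
            # Variant 1: a flat list of gene names.
            filtered[name] = [gene for gene in value if gene in known]
    return filtered


# Toy inputs; 'X' plays the role of a gene missing from the dataset.
var_names = ['A', 'B', 'C']
flat = {'gs1': ['A', 'B', 'X'], 'gs2': ['B', 'C']}
signed = {'gs1': {'UP': ['A', 'X'], 'DN': ['B']}}

flat_filtered = filter_signatures_sketch(var_names, flat)
signed_filtered = filter_signatures_sketch(var_names, signed)
print(flat_filtered)
print(signed_filtered)
```

With a real AnnData object, the gene names would come from adata.var_names. For actual analyses, prefer bc.tl.sig.filter_siggenes itself, whose handling of edge cases may differ from this sketch.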