filter

besca.pp.filter(adata, max_genes=None, min_genes=None, max_counts=None, min_counts=None, min_cells=None, max_mito=None, annotation_type=None, species='human')

Filter outlier cells and genes based on total counts, number of genes expressed, number of cells expressing a gene, and mitochondrial gene content.

Filtering is performed sequentially, with each step applied to the result of the previous one, in the order: max_genes, min_genes, max_counts, min_counts, min_cells, max_mito.

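The thresholds are defined as follows. A cell is retained only if it satisfies every threshold that is set; min_cells applies to genes instead, where n_cells is the number of cells expressing the gene:

max_genes >= n_genes
min_genes <= n_genes
max_counts >= n_counts
min_counts <= n_counts
min_cells <= n_cells
max_mito > percent_mito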
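The sequential order and the inclusive/strict comparisons above can be illustrated with a minimal pure-Python sketch. It works on a small dense cells × genes count matrix and a precomputed list of per-cell mitochondrial fractions instead of an AnnData object; the function name sequential_filter is hypothetical and this is not besca's implementation.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # rows = cells, columns = genes


def n_genes(row: List[float]) -> int:
    """Number of genes with non-zero counts in one cell."""
    return sum(1 for x in row if x > 0)


def n_counts(row: List[float]) -> float:
    """Total counts in one cell."""
    return sum(row)


def sequential_filter(
    X: Matrix,
    mito_fraction: List[float],
    max_genes: Optional[int] = None,
    min_genes: Optional[int] = None,
    max_counts: Optional[float] = None,
    min_counts: Optional[float] = None,
    min_cells: Optional[int] = None,
    max_mito: Optional[float] = None,
) -> Tuple[List[int], List[int]]:
    """Return indices of kept cells and kept genes (hypothetical helper)."""
    cells = list(range(len(X)))
    genes = list(range(len(X[0]))) if X else []

    # Cell filters, applied one after another in the documented order.
    cell_tests = [
        (max_genes, lambda c: n_genes(X[c]) <= max_genes),
        (min_genes, lambda c: n_genes(X[c]) >= min_genes),
        (max_counts, lambda c: n_counts(X[c]) <= max_counts),
        (min_counts, lambda c: n_counts(X[c]) >= min_counts),
    ]
    for threshold, keep in cell_tests:
        if threshold is not None:
            cells = [c for c in cells if keep(c)]

    # Gene filter: count expressing cells among the cells still retained.
    if min_cells is not None:
        genes = [g for g in genes
                 if sum(1 for c in cells if X[c][g] > 0) >= min_cells]

    # Mitochondrial filter is strict: keep cells with fraction < max_mito.
    if max_mito is not None:
        cells = [c for c in cells if mito_fraction[c] < max_mito]

    return cells, genes


X = [
    [5, 0, 3, 0],  # cell 0: 2 genes, 8 counts
    [1, 1, 1, 1],  # cell 1: 4 genes, 4 counts
    [0, 0, 9, 0],  # cell 2: 1 gene, 9 counts
]
mito = [0.01, 0.10, 0.02]
cells, genes = sequential_filter(X, mito, min_genes=2, min_cells=2, max_mito=0.05)
print(cells, genes)  # [0] [0, 2]
```

Because min_cells runs before max_mito in this sketch, cell 1 still counts towards gene expression even though it is removed by the mitochondrial filter afterwards; the order of the steps therefore affects which genes survive.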

Parameters:

adata (AnnData) – The annotated data matrix.
max_genes (int | default = None) – integer value specifying the maximum number of genes a cell may express
min_genes (int | default = None) – integer value specifying the minimum number of genes a cell must express
max_counts (int | default = None) – integer value specifying the maximum number of counts a cell may contain
min_counts (int | default = None) – integer value specifying the minimum number of counts a cell must contain
min_cells (int | default = None) – integer value specifying the minimum number of cells that must express a gene for it to be retained
max_mito (float | default = None) – decimal value specifying the threshold for the mitochondrial gene content of a cell, given as a fraction (e.g. 0.05 for 5%); cells are retained only if their percent_mito is strictly below this value
annotation_type ('SYMBOL', 'ENSEMBL' or None | default = None) – string identifying the type of gene IDs in adata.var_names; required to identify mitochondrial genes if percent_mito is not already present in adata.obs
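When percent_mito is missing from adata.obs, mitochondrial genes first have to be identified from adata.var_names. The sketch below assumes human gene symbols in which mitochondrial genes carry the MT- prefix (an assumption: mouse symbols typically use mt-, and Ensembl IDs would need a lookup table instead) and computes the fraction of each cell's counts that come from those genes. The helper mito_fraction is hypothetical and not part of besca.

```python
from typing import List


def mito_fraction(X: List[List[float]], var_names: List[str],
                  prefix: str = "MT-") -> List[float]:
    """Fraction of each cell's counts from genes whose symbol starts with `prefix`.

    Hypothetical helper; assumes human gene symbols (mitochondrial genes named MT-*).
    """
    mito_idx = [j for j, name in enumerate(var_names) if name.startswith(prefix)]
    fractions = []
    for row in X:
        total = sum(row)
        mito = sum(row[j] for j in mito_idx)
        # Guard against empty cells to avoid division by zero.
        fractions.append(mito / total if total > 0 else 0.0)
    return fractions


var_names = ["CD3E", "MT-CO1", "MS4A1", "MT-ND1"]
X = [
    [10, 2, 6, 2],  # 4 of 20 counts are mitochondrial
    [0, 0, 5, 0],   # no mitochondrial counts
]
print(mito_fraction(X, var_names))  # [0.2, 0.0]
```

With max_mito=0.05, the first cell in this example would be removed, because 0.2 is not strictly below 0.05.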

Returns:

The filtered AnnData object. Depending on the thresholds applied, n_genes and/or n_counts may be added to adata.obs.

Return type:

AnnData

Example

>>> import besca as bc
>>> adata = bc.datasets.simulated_pbmc3k_raw()
>>> adata = bc.pp.filter(adata, max_counts=6500, max_genes=1900, max_mito=0.05, min_genes=600, min_counts=600, min_cells=2, annotation_type='SYMBOL')