perform_dge

besca.tl.dge.perform_dge(adata, design_matrix, differentiating_criteria, constant_criteria, basepath, min_cells_per_group=30, method='wilcoxon')

Perform differential gene expression (DGE) analysis between two conditions across many subsets of an AnnData object.

This function automatically generates top_tables and rank_files for a list of comparisons in a dataset. The comparisons you wish to perform must be specified in a so-called design matrix (see below).

This function handles comparisons between two conditions within a subset of the dataset, e.g. treatment vs. control within the cell type CD4 T-cell. The conditions must be annotated in a column of adata.obs; this column is the differentiating_criteria, and it may contain only two distinct labels. The subsets in which the comparison should be made must be annotated in another column of adata.obs, the constant_criteria, which may contain any number of labels.
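A quick way to confirm that the annotation columns meet these requirements before running the comparisons is sketched below. It is a minimal pandas-only sketch that uses a small DataFrame in place of adata.obs and assumes hypothetical column names 'condition' (playing the role of differentiating_criteria) and 'celltype' (playing the role of constant_criteria); check_criteria is a hypothetical helper, not part of besca.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs: 'condition' acts as the
# differentiating_criteria column, 'celltype' as the constant_criteria column.
obs = pd.DataFrame({
    'condition': ['PBMC', 'Skin', 'PBMC', 'Skin', 'PBMC', 'Skin'],
    'celltype': ['CD4 T-cell', 'CD4 T-cell', 'B-cell', 'B-cell', 'B-cell', 'CD4 T-cell'],
})


def check_criteria(obs, differentiating_criteria, constant_criteria):
    """Return the two condition labels and the subset labels, or raise ValueError."""
    conditions = obs[differentiating_criteria].unique()
    # The differentiating column must contain exactly two distinct labels.
    if len(conditions) != 2:
        raise ValueError(
            f"'{differentiating_criteria}' must have exactly two labels, "
            f"found {len(conditions)}"
        )
    # The constant column may contain any number of labels (one per subset).
    subsets = obs[constant_criteria].unique()
    return sorted(conditions), sorted(subsets)


conditions, subsets = check_criteria(obs, 'condition', 'celltype')
print(conditions)
print(subsets)
```

Each label in the constant_criteria column defines one subset in which the two conditions could be compared.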

Design Matrix: the design matrix is a pandas.DataFrame with two columns, in which each row represents one comparison. The first column, labeled 'Group1', contains a tuple identifying the first group of that comparison, and the second column, labeled 'Group2', contains a tuple identifying the second group. Each tuple has the form (differentiating_criteria, constant_criteria), i.e. a label from the differentiating_criteria column followed by a label from the constant_criteria column. The example below compares each cell type between PBMC and Skin.

>>> import pandas as pd
>>> celltypes = ['CD4 T-cell', 'CD8 T-cell', 'B-cell', 'myeloid cell']
>>> design_matrix = pd.DataFrame({
...     'Group1': [('PBMC', celltype) for celltype in celltypes],
...     'Group2': [('Skin', celltype) for celltype in celltypes],
... })
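Each row of the design matrix can be read as two selections of cells. The sketch below turns every (differentiating_criteria, constant_criteria) tuple into a boolean mask over a stand-in for adata.obs and counts the cells in each group. The column names 'condition' and 'celltype', the helpers group_mask and cells_per_comparison, and the toy cell counts are all hypothetical; comparing the counts with min_cells_per_group only illustrates what such a threshold can guard against, since this page does not state how perform_dge applies it.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs with 135 cells.
obs = pd.DataFrame({
    'condition': ['PBMC'] * 40 + ['Skin'] * 35 + ['PBMC'] * 10 + ['Skin'] * 50,
    'celltype': ['CD4 T-cell'] * 75 + ['B-cell'] * 60,
})

celltypes = ['CD4 T-cell', 'B-cell']
design_matrix = pd.DataFrame({
    'Group1': [('PBMC', celltype) for celltype in celltypes],
    'Group2': [('Skin', celltype) for celltype in celltypes],
})


def group_mask(obs, group, differentiating_criteria, constant_criteria):
    """Boolean mask of the cells matching a (condition, subset) tuple."""
    condition, subset = group
    return (obs[differentiating_criteria] == condition) & (obs[constant_criteria] == subset)


def cells_per_comparison(obs, design_matrix, differentiating_criteria,
                         constant_criteria, min_cells_per_group=30):
    """Count cells in both groups of every design-matrix row."""
    rows = []
    for group1, group2 in zip(design_matrix['Group1'], design_matrix['Group2']):
        n1 = int(group_mask(obs, group1, differentiating_criteria, constant_criteria).sum())
        n2 = int(group_mask(obs, group2, differentiating_criteria, constant_criteria).sum())
        rows.append({
            'Group1': group1,
            'Group2': group2,
            'n1': n1,
            'n2': n2,
            # Flag rows where the smaller group reaches the threshold.
            'enough_cells': min(n1, n2) >= min_cells_per_group,
        })
    return pd.DataFrame(rows)


summary = cells_per_comparison(obs, design_matrix, 'condition', 'celltype')
print(summary[['n1', 'n2', 'enough_cells']])
```

In this toy data the B-cell comparison has only 10 PBMC cells, so it falls below the default threshold of 30.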

Parameters:

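adata (AnnData) – AnnData object containing the expression data and the annotation columns in adata.obs used for the comparisons.
design_matrix (pandas.DataFrame) – pandas.DataFrame containing all the comparisons that are to be made.
differentiating_criteria (str) – name of the adata.obs column holding the two conditions to compare.
constant_criteria (str) – name of the adata.obs column holding the subsets in which each comparison is made.
method (str) – one of 't-test', 'wilcoxon', 't-test_overestim_var', 'logreg' (default: 'wilcoxon')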
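These method names match those accepted by scanpy's sc.tl.rank_genes_groups. As a rough illustration of what the default 'wilcoxon' option computes for a single comparison, the sketch below runs a two-sided Wilcoxon rank-sum (Mann-Whitney U) test for each gene with scipy. It is a swapped-in stand-in, not perform_dge's implementation; rank_sum_per_gene, the gene names, and the random expression matrices are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def rank_sum_per_gene(expr1, expr2, gene_names):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test for each gene column.

    expr1, expr2: cells x genes arrays for the two groups of one comparison.
    """
    results = []
    for j, gene in enumerate(gene_names):
        stat, pval = mannwhitneyu(expr1[:, j], expr2[:, j], alternative='two-sided')
        results.append((gene, float(stat), float(pval)))
    # Sort by p-value so the most strongly separated genes come first.
    return sorted(results, key=lambda r: r[2])


rng = np.random.default_rng(0)
# Toy data: GENE_B is shifted upwards in group 1, GENE_A is not.
group1 = rng.normal(loc=[0.0, 2.0], scale=1.0, size=(40, 2))
group2 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(35, 2))

results = rank_sum_per_gene(group1, group2, ['GENE_A', 'GENE_B'])
for gene, stat, pval in results:
    print(gene, round(stat, 1), f"{pval:.2e}")
```

A rank-based test such as this makes no normality assumption, which is one reason it is a common default for single-cell data; the t-test options instead compare group means.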