generate_gep

besca.export.generate_gep(adata: AnnData, filename: str = 'gep_basis_vector.csv', column: str = '<last_column>', annot: str = 'ENSEMBL', outpath: Optional[str] = None)[source]

Generate Gene Expression Profile (GEP) from scRNA-seq annotations

Reads in the AnnData object and uses only the pre-filtered highly variable genes (determined in the BESCA annotation workflow) to index the adata.raw expression matrix. The adata.raw matrix is log1p-transformed by the workflow, so it is linearized (the log1p transform is reversed) before the values are summed within each cell type.

We generate the GEP from adata.raw and not from adata because we need CP10k-normalised values, which adata does not contain because it has gone through several further normalisation steps downstream. At the same time, we subset adata.raw to the highly variable genes, which are present only in adata.

For each cell type, the gene expression is calculated by summing all values for a given gene and a given cell type. A mean value is not taken because many cells have zero expression for a given gene.
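The linearize-then-sum step described above can be illustrated with a minimal, pure-Python sketch. This is not the BESCA implementation: the helper name `sum_linearized_by_cell_type` and the toy data are hypothetical, and plain lists stand in for the adata.raw matrix and the cell-type column of adata.obs.

```python
import math
from collections import defaultdict


def sum_linearized_by_cell_type(log1p_matrix, cell_types, genes):
    """Hypothetical helper: reverse log1p per value, then sum per (cell type, gene)."""
    profile = defaultdict(lambda: dict.fromkeys(genes, 0.0))
    for row, cell_type in zip(log1p_matrix, cell_types):
        for gene, value in zip(genes, row):
            # expm1 is the inverse of log1p, i.e. the "linearize" step
            profile[cell_type][gene] += math.expm1(value)
    return dict(profile)


# Toy data (assumed): three cells, two genes, values stored as log1p(CP10k)
genes = ["geneA", "geneB"]
cell_types = ["T cell", "T cell", "B cell"]
log1p_matrix = [
    [math.log1p(3.0), math.log1p(0.0)],
    [math.log1p(1.0), math.log1p(2.0)],
    [math.log1p(0.0), math.log1p(5.0)],
]

gep = sum_linearized_by_cell_type(log1p_matrix, cell_types, genes)
```

In this toy example the two T cells contribute linear values of 3 and 1 for geneA, so the summed profile entry for that gene is approximately 4.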

Parameters:
  • adata (AnnData) – the AnnData object that should be exported

  • filename (str | default = 'gep_basis_vector.csv') – name of output file

  • column (str | default = ‘<last_column>’) – name of the column in adata.obs containing the cell-type annotations from which the GEP is generated. The default value selects the last column in adata.obs.

  • annot (str | default = 'ENSEMBL') – which gene annotation to use for the exported GEP, one of [‘ENSEMBL’, ‘SYMBOL’]

  • outpath (str | default = current working directory) – path to the directory in which the results should be written; if no directory is specified, the results are written to the current working directory.
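As a rough illustration of how the documented defaults for column, filename and outpath could be resolved, consider the sketch below. The helper `resolve_gep_defaults` is hypothetical and is not part of BESCA; it only mirrors the behaviour described in the parameter list, with a plain list of column names standing in for adata.obs.columns.

```python
import os


def resolve_gep_defaults(obs_columns, column="<last_column>",
                         filename="gep_basis_vector.csv", outpath=None):
    """Hypothetical helper: apply the documented defaults for column and outpath."""
    if column == "<last_column>":
        # Default: use the last column of adata.obs
        column = obs_columns[-1]
    elif column not in obs_columns:
        raise KeyError(f"column {column!r} not found in adata.obs")
    if outpath is None:
        # Default: write to the current working directory
        outpath = os.getcwd()
    return column, os.path.join(outpath, filename)


# Assumed example column names
obs_columns = ["n_genes", "leiden", "celltype"]
column, target = resolve_gep_defaults(obs_columns)
```

Here `column` resolves to the last entry of the assumed column list and `target` points to gep_basis_vector.csv in the current working directory.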

Returns:

files are written out

Return type:

None