besca
helper functions
| 
 | Extract the AnnData object saved in adata.raw | 
| 
 | Subset AnnData object into new object | 
| 
 | Convert ENSEMBL gene IDs to gene symbols. Uses the Python package mygene to look up the supplied list of ENSEMBL IDs and return the equivalent list of symbols. | 
| 
 | Convert gene symbols to ENSEMBL gene IDs. Uses the Python package mygene to look up the supplied list of symbols and return the equivalent list of ENSEMBL gene IDs. | 
| 
 | Extract the AnnData object saved in adata.raw | 
| 
 | Calculates average and fraction expression per category in adata.obs | 
| 
 | Calculates average and fraction expression per category in adata.obs. Based on the arithmetic mean expression and the fraction of cells expressing each gene per category (computed on the linear scale). | 
| 
 | Concatenate two adata objects based on the observations | 
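The per-category average and fraction expression described above can be sketched in plain Python. This is a minimal illustration, not besca's implementation: the function name `average_and_fraction`, the dense list-of-lists matrix (cells as rows, linear-scale values) and the plain list of labels standing in for an adata.obs column are all assumptions.

```python
from collections import defaultdict


def average_and_fraction(matrix, labels, genes):
    """Per-category mean expression and fraction of expressing cells.

    Hypothetical helper for illustration only.
    matrix: list of rows (cells) of linear-scale expression values.
    labels: one category label per cell (stand-in for an adata.obs column).
    genes:  gene names, one per column.
    Returns ({category: {gene: mean}}, {category: {gene: fraction}}).
    """
    # Group cell rows by their category label.
    groups = defaultdict(list)
    for row, label in zip(matrix, labels):
        groups[label].append(row)

    averages, fractions = {}, {}
    for label, rows in groups.items():
        n_cells = len(rows)
        averages[label] = {}
        fractions[label] = {}
        for j, gene in enumerate(genes):
            values = [row[j] for row in rows]
            # Arithmetic mean on the linear scale (no log transform).
            averages[label][gene] = sum(values) / n_cells
            # Fraction of cells in this category with non-zero expression.
            fractions[label][gene] = sum(v > 0 for v in values) / n_cells
    return averages, fractions


X = [[0.0, 2.0], [4.0, 0.0], [1.0, 1.0]]
avg, frac = average_and_fraction(X, ["T", "T", "B"], ["CD3E", "MS4A1"])
print(avg["T"])   # {'CD3E': 2.0, 'MS4A1': 1.0}
print(frac["T"])  # {'CD3E': 0.5, 'MS4A1': 0.5}
```

Data stored after a log1p transformation would first need to be brought back to the linear scale (for example with `math.expm1`) for these to be arithmetic means of linear expression.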
preprocessing
| 
 | Filter cell outliers based on counts, numbers of genes expressed, number of cells expressing a gene and mitochondrial gene content. | 
| 
 | Function to remove all genes specified in a gene list read from file. | 
| 
 | Calculate the fraction of cells positive for expression of a gene. | 
| 
 | Calculate the fraction of reads being attributed to a specific gene. | 
| 
 | Calculate the mean expression of a gene. | 
| 
 | Return the genes most frequently expressed in cells. | 
| 
 | Function to calculate fraction of counts per cell from a gene list. | 
| 
 | Return the genes that contribute the largest fraction to the total UMI counts. | 
| 
 | Perform geometric normalization on CITEseq data. | 
| 
 | Estimates and returns the thresholds to use for gene/cell filtering based on outliers calculated from the deviation from the median QC metrics. | 
| 
 | Function to call scTransform normalization or HVG selection from Python. | 
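Threshold estimation from the deviation to the median is commonly done with a median ± k·MAD rule (MAD = median absolute deviation). The sketch below shows that rule together with a fraction-of-positive-cells helper; the names `mad_thresholds` and `fraction_positive`, the default k = 3 and the absence of any scaling constant on the MAD are illustrative assumptions, not besca's documented behavior.

```python
import statistics


def mad_thresholds(values, k=3.0):
    """Hypothetical helper: return (lower, upper) cutoffs at median -/+ k * MAD.

    MAD is the median absolute deviation from the median (unscaled here).
    Values outside the returned interval would be flagged as outliers.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med - k * mad, med + k * mad


def fraction_positive(gene_values):
    """Hypothetical helper: fraction of cells with non-zero expression of one gene."""
    return sum(v > 0 for v in gene_values) / len(gene_values)


# Toy per-cell QC metric, e.g. number of genes detected per cell.
genes_per_cell = [900, 1000, 1100, 1050, 950, 5000]
low, high = mad_thresholds(genes_per_cell, k=3)
kept = [v for v in genes_per_cell if low <= v <= high]
print(low, high, kept)
```

Because the median and MAD are robust to the outlier itself, the cell with 5000 detected genes falls outside the interval while the other cells are kept.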
plotting
| 
 | visualize the minimum genes per cell threshold. | 
| 
 | visualize the minimum UMI counts per cell threshold. | 
| 
 | visualize the minimum number of cells expressing a gene threshold. | 
| 
 | visualize maximum UMI counts per cell threshold. | 
| 
 | visualize maximum number of genes per cell threshold. | 
| 
 | visualize maximum mitochondrial gene percentage threshold. | 
| 
 | Plot number of dropouts. | 
| 
 | Plot number of detected genes. | 
| 
 | Plot library size. | 
| 
 | Generates an overview figure of library size, dropouts and detected genes. | 
| 
 | Plot total gene counts vs detection probability. | 
| 
 | plot the top n genes that contribute to the fraction of counts per cell | 
| 
 | visualize gene expression of two groups as a split violin plot | 
| 
 | Stacked violin plot for visualization of gene expression. | 
| 
 | plot boxplot with values per individual. | 
| 
 | plot stacked split violin plots. | 
| 
 | generate a box and whisker plot with an overlaid swarm plot of celltype abundances | 
| 
 | Generate a stacked bar plot of the percentage of label counts within each AnnData subset | 
| 
 | Generate a dot plot filled with a heatmap of individual cells' gene expression. | 
| 
 | Generate a dot plot filled with a heatmap of individual cells' gene expression to compare two conditions. | 
| 
 | Generate a dot plot filled with a heatmap of individual cells' gene expression to compare two conditions (greyscale). | 
| 
 | Update adata object such that the umap will adhere to the palette provided. | 
| 
 | Plot a nomenclature network based on annotation config file. | 
| 
 | Generate a river plot/Sankey diagram between two categories. | 
| 
 | Generate a dot plot showing average expression and fraction positive cells | 
tools
| 
 | Generate dataframe containing the label counts/percentages of a specific column in adata.obs | 
| 
 | count occurrences of a label in adata.obs after subsetting the adata object | 
| 
 | count occurrences of a label for each condition in adata.obs after subsetting the adata object | 
| 
 | Add an annotation to adata.obs based on clustering. This function replaces the original cluster labels located in the column clustering_label with the new values specified in the list new_cluster_lables. | 
| 
 | reports basic metrics, produces confusion matrices and plots umap of prediction | 
| 
 | plots confusion matrices | 
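Counting label occurrences and replacing cluster labels with new annotations, as described for the tools above, reduce to a tally and a lookup table. A sketch under stated assumptions: `label_counts` and `relabel_clusters` are hypothetical names, plain lists stand in for adata.obs columns, and cluster IDs are assumed to be integer-like strings ("0", "1", ...) as typically produced by Louvain or Leiden clustering in scanpy.

```python
from collections import Counter


def label_counts(labels):
    """Hypothetical helper: {label: (count, percentage)} for one adata.obs-like column."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {lab: (n, 100.0 * n / total) for lab, n in counts.items()}


def relabel_clusters(cluster_labels, new_names):
    """Hypothetical helper: map each cell's cluster ID to a new annotation.

    new_names[i] is the new name for the i-th cluster ID, with IDs sorted
    numerically (assumes integer-like string IDs such as "0", "1", ...).
    """
    ids = sorted(set(cluster_labels), key=int)
    if len(ids) != len(new_names):
        raise ValueError("need exactly one new name per cluster")
    mapping = dict(zip(ids, new_names))
    return [mapping[c] for c in cluster_labels]


clusters = ["0", "1", "0", "2", "1", "0"]
celltypes = relabel_clusters(clusters, ["T cell", "B cell", "NK cell"])
counts = label_counts(celltypes)
print(celltypes)
print(counts["T cell"])  # (3, 50.0)
```

Raising an error on a length mismatch avoids silently leaving clusters unannotated.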
toolkits
batch correction
Collection of functions to perform batch correction.
| 
 | function to perform batch correction | 
| 
 | postprocessing to generate a newly functional AnnData object | 
differential gene expression
Collection of functions to aid in differential gene expression analysis.
| 
 | Perform differential gene expression between two conditions over many adata subsets. | 
| 
 | plot an interactive volcano plot based on toptable file. | 
| 
 | Get a table of significant DE genes at certain cutoffs. Based on an AnnData object and an annotation category (e.g. | 
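Selecting significant DE genes at given cutoffs typically means filtering a toptable on an absolute log fold-change threshold and an adjusted p-value threshold. The sketch below assumes a toptable represented as a list of dicts with hypothetical keys `gene`, `logFC` and `adj_pval`; `significant_genes` is a hypothetical name, and the default cutoffs (|logFC| ≥ 1, adjusted p < 0.05) are common choices, not values taken from besca.

```python
def significant_genes(toptable, fc_cutoff=1.0, pval_cutoff=0.05):
    """Hypothetical helper: keep rows passing both fold-change and p-value cutoffs.

    toptable: list of dicts with assumed keys 'gene', 'logFC', 'adj_pval'.
    Returns the passing rows sorted by adjusted p-value (most significant first).
    """
    hits = [
        row for row in toptable
        if abs(row["logFC"]) >= fc_cutoff and row["adj_pval"] < pval_cutoff
    ]
    return sorted(hits, key=lambda row: row["adj_pval"])


table = [
    {"gene": "GZMB", "logFC": 2.3, "adj_pval": 1e-6},
    {"gene": "CCR7", "logFC": -1.8, "adj_pval": 0.01},
    {"gene": "ACTB", "logFC": 0.1, "adj_pval": 0.9},
    {"gene": "IL7R", "logFC": -1.2, "adj_pval": 0.2},
]
print([row["gene"] for row in significant_genes(table)])  # ['GZMB', 'CCR7']
```

Using the absolute fold change keeps both up- and down-regulated genes in the result.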
signature scoring
Collection of functions to aid in signature scoring.
| 
 | Wrapper function to compute the combined signature score from UP and DN scores. | 
| 
 | Compute a signed score combining UP and DN for all signatures in signature_dict. This function combines gene set (signature) scores. | 
| 
 | Remove strings from the list that are not in the universe set | 
| 
 | Filter all signatures in signature_dict to remove genes not present in adata | 
| 
 | Convert signature genes with an ortholog conversion Series | 
| 
 | Read gmt file to extract signed genesets. | 
| 
 | Auxiliary function for make_anno that handles missing signatures. Based on a DataFrame of p-values, a signature name and a cutoff, checks whether the signature is present. Parameters: df (pandas.DataFrame), a DataFrame of p-values per signature and cell cluster; signame_complete (str), the signature name; threshold (numpy.float64), the cutoff used for cluster attribution. | 
| 
 | Score clusters against a set of immune signatures to generate a DataFrame of p-values. Takes as input a DataFrame with fractions per cluster and a dictionary of signatures, performs a Mann-Whitney test for each column and signature, and returns -log10 p-values. | 
| 
 | Adds annotation generated with make_anno to an AnnData object. Takes as input the AnnData object to which the annotation will be appended and the annotation DataFrame, and generates a pd.Series that can be added as an adata.obs column with the annotation at a given level. | 
| 
 | Annotate cell types. Based on a DataFrame of -log10 p-values, a cutoff and a signature set, generates a cell annotation using a hierarchical model of immune cell annotation. | 
| 
 | Reads the configuration file for signature-based hierarchical annotation. | 
| 
 | Matches categories from adata.obs to each other. | 
| 
 | Matches the cnames obtained by the make_annot function or a list of label names to the db label (standardized label from a nomenclature file). | 
| 
 | Matches the cnames obtained by the make_annot function to the db label (standardized label). | 
| 
 | Connect to GeMS, download the related gene sets (specified by setName, which can be a prefix or suffix) and return them. This function combines the UP and DN genes of each gene set (signature); non-directional gene sets are treated as UP by default. Parameters: setName (str), the set name to find in GeMS (can be a substring); BASE_URL (str), the GeMS API URL, of the form 'http://' + hostname + ':' + localport; UP_suffix (str, default "_UP"), the suffix marking a signature in the UP direction, which must be at the end of the signature name ($); DN_suffix (str, default "_DN"), the suffix marking a signature in the DN direction, which must be at the end of the signature name ($). | 
| 
 | Insert gene sets into the local GeMS server. url_host depends on the GeMS deployment and could be stored in credential files. Parameters: BASE_URL (str), a string of the form 'http://' + hostname + ':' + localport; genesets (list), a list of dicts, each dict being one signature whose keys should map to the headers; params (list of str), the command-line arguments for GMTx file upload, based on the GeMS structure; headers (list of str), each element a key of the GeMS setup in place, where the minimal requirement for a gene set is setName, desc and genes (minimal GMT). | 
| 
 | Encapsulates a small similarity search. Looks for similarity within GeMS and the MongoDB collections and returns the associated gene sets. Parameters: request (str), the request specificity (if the hosted collection is large, the gene set may need to be specified in more detail); BASE_URL (str), the GeMS API URL, of the form 'http://' + hostname + ':' + localport; UP_suffix (str, default "_UP"), the suffix marking a signature in the UP direction, which must be at the end of the signature name ($); DN_suffix (str, default "_DN"), the suffix marking a signature in the DN direction, which must be at the end of the signature name ($). | 
| 
 | Export the configuration defined in sigconfig and levsk. The order might change compared to the original sig. | 
| 
 | Convert a simple dictionary into one with direction compatible with combined_signature_score | 
| 
 | Construct a gmtx file according to format conventions for import into GeMS. Parameters: setName (str), an informative set name, e.g. Pembro_induced_MC38CD8Tcell, Plasma_mdb, TGFB_Stromal_i; desc (str), an informative and verbose signature description (for cell type signatures use the nomenclature, if coefficients are used explain what they represent, and link to the study if present), e.g. "Genes higher expressed in Pembro vs. vehicle in non-naive CD8-positive T cells in MC38 in vivo exp. ID time T2; coefs are log2FC"; User (str), related to the signature origin, e.g. Public (for literature-derived sets), own user ID for analysis-derived sets, rtsquad, scsquad, gred, other; Source (str), the source of the signature, one of Literature scseq, Literature, besca, scseqmongodb, internal scseq, pRED, Chugai, gRED, other; Subtype (str), a specific subtype, e.g. onc, all, healthy, disease; domain (str), one of pathway, biological process, cellular component, molecular function, phenotype, perturbation, disease, misc, microRNA targets, transcription factor targets, cell marker, tissue marker; genesetname (str), shared across different signatures of a specific type, e.g. besca_marker, dblabel_marker, Pembro_induced_MC38CD8Tcell, FirstAuthorYearPublication; genes (str), a tab-separated list of genes with or without a coefficient, e.g. Vim | 2.4 Bin1 | 2.02 or Vim Bin1; studyID (str, default None), the study name as in scMongoDB/bescaviz, only when Source = internal scseq; analysisID (str, default None), the analysis name as in scMongoDB/bescaviz, only when Source = internal scseq; application (str, default None), which application will read the gene set, e.g. rtbeda_CIT, bescaviz, celltypeviz; celltype (str, default None), for cell markers, the cell type according to the dblabel_short convention to facilitate reuse; coef_type (str, default score), what the coefficient corresponds to, e.g. logFC, gini, SAM, score, ... | 
| 
 | Writes a gmtx file that can later be uploaded to GeMS. | 
| 
 | Compute the average and per-cell (i.e. per-sample) silhouette score for the cluster label (which should be present in adata.obs; level 3 annotation), the computed level 2 annotation and a random cell assignment. | 
| 
 | Return a table matching values in vector label. | 
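Signed gene sets can be read from a GMT file, where each line holds a set name, a description and the member genes separated by tabs. The direction can be taken from the `_UP`/`_DN` name suffixes used as defaults in the GeMS functions above, with non-directional sets treated as UP. This is a sketch only: `read_signed_gmt` and `filter_to_universe` are hypothetical names, and the nested `{"UP": [...], "DN": [...]}` output is an assumed layout, not necessarily the structure expected by combined_signature_score.

```python
def read_signed_gmt(lines, up_suffix="_UP", dn_suffix="_DN"):
    """Hypothetical helper: parse GMT lines into {signature: {"UP": [...], "DN": [...]}}.

    Each GMT line is: set name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Sets without a direction suffix are treated as UP.
    """
    signatures = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], [g for g in fields[2:] if g]
        if name.endswith(dn_suffix):
            base, direction = name[: -len(dn_suffix)], "DN"
        elif name.endswith(up_suffix):
            base, direction = name[: -len(up_suffix)], "UP"
        else:
            base, direction = name, "UP"
        signatures.setdefault(base, {"UP": [], "DN": []})[direction].extend(genes)
    return signatures


def filter_to_universe(signatures, universe):
    """Hypothetical helper: drop genes absent from the universe (e.g. adata.var_names)."""
    universe = set(universe)
    return {
        name: {d: [g for g in genes if g in universe] for d, genes in dirs.items()}
        for name, dirs in signatures.items()
    }


gmt = [
    "IFN_UP\tinterferon response\tISG15\tIFI6\tMX1",
    "IFN_DN\tinterferon response\tCCR7",
    "Tcell\tT cell markers\tCD3E\tCD3D",
]
sigs = read_signed_gmt(gmt)
filtered = filter_to_universe(sigs, ["ISG15", "MX1", "CD3E"])
print(filtered)
```

With real files, an open file handle can be passed as `lines`, since the parser only iterates over it.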
reclustering
Collection of functions to perform reclustering on selected subclusters.
| 
 | Perform subclustering on specific celltype to identify subclusters. | 
| 
 | assign new cell names to each of the subclusters identified by running recluster. | 
auto-annot
Collection of functions to perform auto-annot: annotating a single-cell dataset based on a reference dataset.
| 
 | Function to read in training and testing datasets | 
| 
 | read from adata.raw and revert log1p normalization | 
| 
 | read adata files of training and testing datasets | 
| 
 | read adata files of training and testing datasets | 
| 
 | concatenates training anndata objects | 
| 
 | corrects datasets using scanorama and merge training datasets subsequently | 
| 
 | removes all genes not in gene set | 
| 
 | removes all genes not in all datasets | 
| 
 | removes all celltypes not in all datasets | 
| 
 | fits classifier on training dataset | 
| 
 | fits linear svm on training dataset | 
| 
 | fits radial basis function kernel svm on training dataset | 
| 
 | fits linear svm on training dataset using stochastic gradient descent | 
| 
 | fits a random forest of a thousand estimators with balanced class weights on the training dataset. | 
| 
 | multiclass crossvalidated logistic regression with balanced class weight. | 
| 
 | multiclass crossvalidated logistic regression with balanced class weight. | 
| 
 | multiclass crossvalidated logistic regression with balanced class weight. | 
| 
 | predicts on testing set using trained classifier | 
| 
 | predicts on testing set using trained classifier | 
| 
 | predicts on testing set using trained classifier and returns class probability for every cell and every class | 
| 
 | predicts on testing set using trained classifier and returns probabilities | 
| 
 | reports basic metrics, produces confusion matrices and plots umap of prediction. Writes out a csv file containing all accuracy and f1 scores. | 
| 
 | merges all datasets and predicts on testing set with scANVI. | 
| 
 | merges all datasets and stores learnt representation in obsm | 
| 
 | plots a umap of all merged datasets coloured by dataset of origin. | 
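Reverting log1p normalization and restricting datasets to the genes they share are both short operations. A minimal sketch, assuming the data were transformed with the natural-log log1p (scanpy's default) and using plain lists in place of AnnData objects; `revert_log1p` and `shared_genes` are hypothetical names.

```python
import math


def revert_log1p(values):
    """Hypothetical helper: undo a natural-log log1p transform, x = exp(y) - 1."""
    # math.expm1 is more accurate than math.exp(y) - 1 for small y.
    return [math.expm1(v) for v in values]


def shared_genes(*gene_lists):
    """Hypothetical helper: genes present in every dataset, in the order of the first list."""
    common = set(gene_lists[0]).intersection(*gene_lists[1:])
    return [g for g in gene_lists[0] if g in common]


restored = revert_log1p([0.0, math.log1p(9.0)])
print(restored)
print(shared_genes(["A", "B", "C"], ["C", "A"], ["A", "C", "D"]))  # ['A', 'C']
```

Keeping the order of the first dataset makes the resulting gene list deterministic, which matters when several datasets are later aligned column by column.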
Import
| 
 | Read matrix.mtx, genes.tsv and barcodes.tsv into an AnnData object. Given an input folder, this function reads the contained matrix.mtx, genes.tsv and barcodes.tsv files into an AnnData object. If annotation = True, it also adds the annotation contained in metadata.tsv to the object. Parameters: filepath (str), the path to the directory containing matrix.mtx, genes.tsv, barcodes.tsv and, if applicable, metadata.tsv; annotation (bool, default True), whether an annotation file is also located in the folder and should be added to the AnnData object; use_genes (str), either SYMBOL or ENSEMBL (other gene names are not yet supported); species (str, default 'human'), the species, only needed when no gene symbols are supplied and the ENSEMBL gene IDs must be used for a lookup; citeseq ('gex_only', 'citeseq_only', False or None; default None), whether to read only gene expression values ('gex_only'), only protein expression values ('citeseq_only'), or everything (None). | 
| 
 | add a labeling written out in the FAIR format to adata.obs | 
| 
 | Asserts that an adata object contains the information needed for the besca pipeline to run and export information. | 
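To illustrate what reading matrix.mtx, genes.tsv and barcodes.tsv involves: matrix.mtx is a MatrixMarket coordinate file with 1-based (row, column, value) triplets, stored genes × cells in CellRanger output, so the data must be transposed to AnnData's cells × genes layout. The pure-Python sketch below assumes a two-column genes.tsv (ENSEMBL ID, symbol) and builds a dense table; `parse_mtx` and `to_cells_by_genes` are hypothetical names, and the `use_genes` argument only mirrors the SYMBOL/ENSEMBL choice documented above.

```python
def parse_mtx(lines):
    """Hypothetical helper: parse MatrixMarket coordinate lines.

    Returns (n_rows, n_cols, {(row, col): value}) with 0-based indices.
    """
    entries = {}
    n_rows = n_cols = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # header and comment lines
        parts = line.split()
        if n_rows is None:
            # First non-comment line: rows, columns, number of non-zeros.
            n_rows, n_cols, _nnz = (int(p) for p in parts)
            continue
        i, j, value = int(parts[0]) - 1, int(parts[1]) - 1, float(parts[2])
        entries[(i, j)] = value
    return n_rows, n_cols, entries


def to_cells_by_genes(mtx_lines, gene_lines, barcode_lines, use_genes="SYMBOL"):
    """Hypothetical helper: dense cells x genes table from 10x-style text files.

    Assumes genes.tsv holds 'ENSEMBL_ID<TAB>SYMBOL' per line and that the
    matrix is stored genes x cells.
    """
    n_genes, n_cells, entries = parse_mtx(mtx_lines)
    column = 1 if use_genes == "SYMBOL" else 0
    genes = [l.rstrip("\n").split("\t")[column] for l in gene_lines if l.strip()]
    barcodes = [l.strip() for l in barcode_lines if l.strip()]
    if len(genes) != n_genes or len(barcodes) != n_cells:
        raise ValueError("file dimensions do not match the matrix.mtx header")
    dense = [[0.0] * n_genes for _ in range(n_cells)]
    for (g, c), value in entries.items():
        dense[c][g] = value  # transpose: AnnData stores cells as rows
    return barcodes, genes, dense


mtx = [
    "%%MatrixMarket matrix coordinate integer general",
    "% toy example",
    "2 3 3",
    "1 1 5",
    "2 2 1",
    "1 3 2",
]
gene_tsv = ["ENSG_TOY1\tCD3E", "ENSG_TOY2\tMS4A1"]  # toy IDs, not real ENSEMBL IDs
barcode_tsv = ["AAAC-1", "AAAG-1", "AAAT-1"]
barcodes, genes, dense = to_cells_by_genes(mtx, gene_tsv, barcode_tsv)
print(genes, dense)
```

Both functions only iterate over lines, so open file handles can be passed directly. For real data, `scipy.io.mmread` or `scanpy.read_10x_mtx` return sparse matrices and avoid building a dense table.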
export
| 
 | export adata object to mtx format (matrix.mtx, genes.tsv, barcodes.tsv) | 
| 
 | export adata.raw to .mtx (matrix.mtx, genes.tsv, barcodes.tsv) | 
| 
 | export mapping of cells to clusters to .tsv file | 
| 
 | export mapping of cells to specified label to .tsv file | 
| 
 | write out labeling info for uploading to database | 
| 
 | export plotting coordinates to analysis_metadata.tsv | 
| 
 | Generate Gene Expression Profile (GEP) from scRNA-seq annotations | 
| 
 | export marker genes for each cluster to .gct file | 
| 
 | export pseudobulk profiles of cells to .gct files | 
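A pseudobulk profile sums the counts of all cells sharing a label, and a GCT file (version 1.2) stores such profiles as a tab-separated table preceded by a `#1.2` line and a line with the row and column counts. The sketch assumes raw counts in a list of lists (cells as rows), uses the hypothetical names `pseudobulk` and `to_gct`, and fills the GCT Description column with `na` as a placeholder; it is not besca's export code.

```python
from collections import defaultdict


def pseudobulk(matrix, labels, genes):
    """Hypothetical helper: sum raw counts over the cells of each label."""
    sums = defaultdict(lambda: [0.0] * len(genes))
    for row, label in zip(matrix, labels):
        acc = sums[label]
        for j, v in enumerate(row):
            acc[j] += v
    return dict(sums)


def to_gct(profiles, genes):
    """Hypothetical helper: render {label: [value per gene]} as GCT 1.2 text.

    Genes become rows and labels become columns; Description is a placeholder.
    """
    samples = sorted(profiles)
    out = [
        "#1.2",
        f"{len(genes)}\t{len(samples)}",
        "\t".join(["Name", "Description", *samples]),
    ]
    for j, gene in enumerate(genes):
        values = [f"{profiles[s][j]:g}" for s in samples]
        out.append("\t".join([gene, "na", *values]))
    return "\n".join(out) + "\n"


counts = [[1, 0], [2, 3], [0, 4]]
profiles = pseudobulk(counts, ["s1", "s1", "s2"], ["G1", "G2"])
gct = to_gct(profiles, ["G1", "G2"])
print(gct)
```

Summing raw counts, rather than averaging normalized values, keeps the pseudobulk compatible with count-based bulk DE tools.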
standardworkflow
| 
 | Read matrix file as expected for the standard workflow. | 
| 
 | |
| 
 | |
| 
 | Export raw cp10k to FAIR format for loading into database | 
| 
 | Export regressedOut to FAIR format for loading into database | 
| 
 | Export cluster to cell mapping to FAIR format for loading into database | 
| 
 | Export metadata in FAIR format for loading into database | 
| 
 | Export ranked genes to FAIR format for loading into database | 
| 
 | Export celltype annotation to cell mapping in FAIR format for loading into database | 
| 
 | Standard Workflow function to export an additional labeling besides louvain to FAIR format. | 
| 
 | Standard Workflow function to export an additional labeling besides louvain to FAIR format. |
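cp10k (counts per 10,000), as in the raw cp10k export above, rescales each cell's counts so that they sum to 10,000. A minimal sketch, assuming the raw counts are a list of lists with cells as rows and that cells with zero total counts should map to zeros; `cp10k` is a hypothetical name, and no log transform is applied.

```python
def cp10k(counts_per_cell, scale=10_000):
    """Hypothetical helper: scale each cell's counts to sum to `scale`."""
    normalized = []
    for row in counts_per_cell:
        total = sum(row)
        # Assumption: empty cells stay all-zero instead of dividing by zero.
        normalized.append([v * scale / total if total else 0.0 for v in row])
    return normalized


print(cp10k([[1, 3], [0, 0], [5, 5]]))
```

Normalizing to a fixed total makes expression values comparable across cells with different library sizes.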