besca

helper functions

`besca.get_raw`(adata)	Extract the AnnData object saved in adata.raw
`besca.subset_adata`(adata, filter_criteria[, ...])	Subset AnnData object into new object
`besca.convert_ensembl_to_symbol`(gene_list[, ...])	Convert ENSEMBL gene ids to SYMBOLS. Uses the python package mygene to look up the supplied list of ENSEMBL ids and return the equivalent list of symbols.
`besca.convert_symbol_to_ensembl`(gene_list[, ...])	Convert SYMBOLS to ENSEMBL gene ids. Uses the python package mygene to look up the supplied list of symbols and return the equivalent list of ENSEMBL gene ids.
`besca.get_means`(adata, mycat[, condition])	Calculates average and fraction expression per category in adata.obs
`besca.get_ameans`(adata, mycat[, condition])	Calculates average and fraction expression per category in adata.obs. Based on the arithmetic mean expression and the fraction of cells expressing a gene per category (works on linear scale).
`besca.concate_adata`(adata1, adata2)	Concatenate two adata objects based on the observations
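
The per-category summary that `besca.get_means` and `besca.get_ameans` describe (average expression and fraction of expressing cells per category in adata.obs) can be illustrated without besca installed. The following is a minimal sketch in plain Python, not besca's implementation: the function name `mean_and_fraction_per_category`, its `threshold` parameter, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def mean_and_fraction_per_category(values, categories, threshold=0.0):
    """Return {category: (arithmetic mean, fraction of cells above threshold)}.

    `values` holds one gene's linear-scale expression per cell and
    `categories` the matching per-cell label (e.g. a column of adata.obs).
    Hypothetical helper, not part of besca.
    """
    grouped = defaultdict(list)
    for value, category in zip(values, categories):
        grouped[category].append(value)

    summary = {}
    for category, cell_values in grouped.items():
        mean = sum(cell_values) / len(cell_values)
        # A cell counts as "expressing" when its value exceeds the threshold.
        fraction = sum(v > threshold for v in cell_values) / len(cell_values)
        summary[category] = (mean, fraction)
    return summary


# Toy data: six cells, one gene, two categories.
expression = [0.0, 2.0, 4.0, 0.0, 0.0, 3.0]
cell_type = ["T", "T", "T", "B", "B", "B"]
summary = mean_and_fraction_per_category(expression, cell_type)
```

Here the "T" cells have a mean of 2.0 with two of three cells expressing, and the "B" cells a mean of 1.0 with one of three cells expressing.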

preprocessing

`filter`(adata[, max_genes, min_genes, ...])	Filter cell outliers based on counts, numbers of genes expressed, number of cells expressing a gene and mitochondrial gene content.
`filter_gene_list`(adata, filepath[, use_raw, ...])	Function to remove all genes specified in a gene list read from file.
`frac_pos`(adata[, threshold])	Calculate the fraction of cells positive for expression of a gene.
`frac_reads`(adata)	Calculate the fraction of reads being attributed to a specific gene.
`mean_expr`(adata)	Calculate the mean expression of a gene.
`top_expressed_genes`(adata[, top_n])	Give out the genes most frequently expressed in cells.
`fraction_counts`(adata[, species, name, ...])	Function to calculate fraction of counts per cell from a gene list.
`top_counts_genes`(adata[, top_n])	Give out the genes that contribute the largest fraction to the total UMI counts.
`normalize_geometric`(adata)	Perform geometric normalization on CITEseq data.
`valOutlier`(adata[, nmads, rlib_loc])	Estimates and returns the thresholds to use for gene/cell filtering based on outliers calculated from the deviation to the median QCs.
`scTransform`(adata[, hvg, n_genes, rlib_loc])	Function to call scTransform normalization or HVG selection from Python.
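
`valOutlier` is described as deriving filtering thresholds from the deviation to the median of the QC metrics, with `nmads` controlling how many deviations are tolerated. The idea can be sketched in plain Python as median ± nmads × MAD (median absolute deviation). This is an illustration only: the `rlib_loc` argument suggests that besca delegates the calculation to R, and the helper name `mad_thresholds`, the unscaled MAD, and the toy numbers are assumptions.

```python
from statistics import median


def mad_thresholds(values, nmads=3.0):
    """Return (lower, upper) bounds at median -/+ nmads * MAD.

    Hypothetical helper; uses the unscaled median absolute deviation.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center - nmads * mad, center + nmads * mad


# Toy QC metric: number of genes detected per cell, with one outlier.
genes_per_cell = [900, 1000, 1100, 1050, 950, 5000]
lower, upper = mad_thresholds(genes_per_cell, nmads=3)
kept = [n for n in genes_per_cell if lower <= n <= upper]
```

With these numbers the median is 1025 and the MAD is 75, so the bounds are 800 and 1250 and the cell with 5000 genes is flagged as an outlier.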

plotting

`kp_genes`(adata[, threshold, min_genes, ax, ...])	visualize the minimum gene per cell threshold.
`kp_counts`(adata[, min_counts, ax, figsize])	visualize the minimum UMI counts per cell threshold.
`kp_cells`(adata[, threshold, min_cells, ax, ...])	visualize the minimum number of cells expressing a gene threshold.
`max_counts`(adata[, max_counts, ax, figsize])	visualize maximum UMI counts per cell threshold.
`max_genes`(adata[, max_genes, ax, figsize])	visualize maximum number of genes per cell threshold.
`max_mito`(adata[, max_mito, annotation_type, ...])	visualize maximum mitochondrial gene percentage threshold.
`dropouts`(adata[, ax, bins, figsize])	Plot number of dropouts.
`detected_genes`(adata[, ax, bins, figsize])	Plot number of detected genes.
`library_size`(adata[, ax, bins, figsize])	Plot library size.
`librarysize_overview`(adata[, bins, figsize])	Generates overview figure of library size, dropouts and detected genes.
`transcript_capture_efficiency`(adata[, ax, ...])	Plot total gene counts vs detection probability.
`top_genes_counts`(adata[, top_n, ax, figsize])	plot top n genes that contribute to fraction of counts per cell
`gene_expr_split`(adata, genes[, ...])	visualize gene expression of two groups as a split violin plot
`gene_expr_split_stacked`(adata, genes, ...[, ...])	Stacked violin plot for visualization of genes expression.
`box_per_ind`(plotdata, y_axis, x_axis[, ...])	plot boxplot with values per individual.
`stacked_split_violin`(tidy_data, x_axis, ...)	plot stacked split violin plots.
`celllabel_quant_boxplot`(adata, ...[, ...])	generate a box and whisker plot with overlaid swarm plot of celltype abundances
`celllabel_quant_stackedbar`(adata, ...[, ...])	Generate a stacked bar plot of the percentage of labelcounts within each AnnData subset
`dot_heatmap`(adata, genes[, group_by, ...])	Generate a dot plot, filled with a heatmap of individual cells' gene expression.
`dot_heatmap_split`(adata, genes, split_by[, ...])	Generate a dot plot, filled with a heatmap of individual cells' gene expression, to compare two conditions.
`dot_heatmap_split_greyscale`(adata, genes, ...)	Generate a dot plot, filled with a heatmap of individual cells' gene expression, to compare two conditions (greyscale).
`update_qualitative_palette`(adata, palette[, ...])	Update adata object such that the umap will adhere to the palette provided.
`nomenclature_network`(config_file[, ...])	Plot a nomenclature network based on annotation config file.
`riverplot_2categories`(adata, categories[, ...])	Generate a riverplot/Sankey diagram between two categories.
`flex_dotplot`(df, X, Y, HUE, SIZE, title[, ...])	Generate a dot plot showing average expression and fraction positive cells

tools

`count_occurrence`(adata[, count_variable, ...])	Generate dataframe containing the label counts/percentages of a specific column in adata.obs
`count_occurrence_subset`(adata, subset_variable)	count occurrence of a label in adata.obs after subsetting adata object
`count_occurrence_subset_conditions`(adata, ...)	count occurrence of a label for each condition in adata.obs after subsetting adata object
`annotate_cells_clustering`(adata, ...[, ...])	Function to add annotation to adata.obs based on clustering. This function replaces the original cluster labels located in the column clustering_label with the new values specified in the list new_cluster_labels.
`report`(adata_pred, celltype, method, ...[, ...])	reports basic metrics, produces confusion matrices and plots umap of prediction
`plot_confusion_matrix`(y_true, y_pred, ...[, ...])	plots confusion matrices
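
The label counts and percentages that `count_occurrence` reports for a column of adata.obs amount to a simple tally. A minimal sketch in plain Python follows; the helper `count_labels` and its return shape are assumptions for illustration (besca returns a dataframe), and the labels are toy data.

```python
from collections import Counter


def count_labels(labels):
    """Return {label: (count, percentage of all cells)}.

    Hypothetical helper mirroring the counts/percentages idea.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Toy stand-in for a categorical column such as adata.obs["celltype"].
celltypes = ["T cell", "B cell", "T cell", "NK cell", "T cell"]
table = count_labels(celltypes)
```

For the five toy cells this gives three T cells (60 %) and one B cell and one NK cell (20 % each).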

toolkits

batch correction

Collection of functions to perform batch correction.

`batch_correct`(adata, batch_to_correct)	function to perform batch correction
`postprocess_mnnpy`(adata, bdata)	postprocessing to generate a newly functional AnnData object

differential gene expression

Collection of functions to aid in differential gene expression analysis.

`perform_dge`(adata, design_matrix, ...[, ...])	Perform differential gene expression between two conditions over many adata subsets.
`plot_interactive_volcano`(top_table_path, outdir)	plot an interactive volcano plot based on toptable file.
`get_de`(adata, mygroup[, demethod, topnr, ...])	Get a table of significant DE genes at certain cutoffs, based on an AnnData object and an annotation category.

signature scoring

Collection of functions to aid in signature scoring.

`combined_signature_score`(adata[, GMT_file, ...])	Super Wrapper function to compute combined signature score for UP and DN scores.
`compute_signed_score`(adata, signature_dict)	Compute signed score combining UP and DN for all signatures in signature_dict. This function combines geneset (signature) scores.
`filter_by_set`(strs, universe_set)	Remove strings from the list that are not in the universe set
`filter_siggenes`(adata, signature_dict)	Filter all signatures in signature_dict to remove genes not present in adata
`convert_siggenes`(signature_dict, conversion)	Convert signature genes with an ortholog conversion Series
`read_GMT_sign`(GMT_file[, UP_suffix, ...])	Read gmt file to extract signed genesets.
`getset`(df, signame_complete, threshold)	Auxiliary function for make_anno that handles missing signatures. Based on a dataframe of p-values, a signature name and a cutoff, checks whether the signature is present. Parameters: df (pandas.DataFrame), a dataframe of p-values per signature and cell cluster; signame_complete (str), the signature name; threshold (numpy.float64), the cutoff used for cluster attributions.
`score_mw`(f, mymarkers)	Score clusters based on a set of immune signatures to generate a df of pvals. Takes as input a dataframe with fractions per cluster and a dictionary of signatures. Performs a Mann-Whitney test for each column and signature and returns -10logpValues.
`add_anno`(adata, cnames, mycol[, clusters])	Adds annotation generated with make_anno to an AnnData object. Takes as input the AnnData object to which annotation will be appended and the annotation pd. Generates a pd.Series that can be added as an adata.obs column with annotation at a level
`make_anno`(df, sigscores, sigconfig, levsk[, ...])	Annotate cell types. Based on a dataframe of -log10pvals, a cutoff and a signature set, generate cell annotation. Hierarchical model of immune cell annotation.
`read_annotconfig`(configfile)	Reads the configuration file for signature-based hierarchical annotation.
`match_cluster`(adata, obsquery, obsqueryval)	Matches categories from adata.obs to each other.
`obtain_new_label`(nomenclature_file, cnames)	Matches the cnames obtained by the make_anno function or a list of label names to the db label (standardized label from a nomenclature file).
`obtain_dblabel`(nomenclature_file, cnames[, ...])	Matches the cnames obtained by the make_anno function to the db label (standardized label).
`get_gems`(setName, BASE_URL[, UP_suffix, ...])	Connect to GeMS, download the related genesets (specified by setName, which can be a prefix/suffix) and return them. This function combines the UP and DN genes of genesets (signatures). Non-directional genesets are considered UP by default. Parameters: setName (str), the set name to find in GeMS (can be a subset); BASE_URL (str), the GeMS URL for the API, which should look like 'http://' + hostname + ':' + localport; UP_suffix (str, default "_UP"), the suffix indicating that the signature is in the UP direction, expected at the end of the signature name ($); DN_suffix (str, default "_DN"), the suffix indicating that the signature is in the DN direction, expected at the end of the signature name ($).
`insert_gems`(BASE_URL, genesets, params[, ...])	Insert genesets into the local GeMS server. The host URL will depend on the GeMS deployment and could be stored in a credentials file. Parameters: BASE_URL (str), a string of the form 'http://' + hostname + ':' + localport; genesets (list), a list of dicts in which each dict is a signature and the keys should map to the headers; params (list of strings), the command-line arguments for GMTx file upload, based on the GeMS structure; headers (list of strings), in which each element is a key of the GeMS setup in place. The minimal requirement for a geneset is setName, desc and genes (minimal GMT).
`get_similar_geneset`(request, BASE_URL[, ...])	Encapsulates a small similarity search. Looks for similarity within GeMS and the MongoDB collections and returns the associated genesets. Parameters: request (str), the request specificity (if the hosted collection is large, the geneset might need to be specified in more detail); BASE_URL (str), the GeMS URL for the API, which should look like 'http://' + hostname + ':' + localport; UP_suffix (str, default "_UP"), the suffix indicating that the signature is in the UP direction, expected at the end of the signature name ($); DN_suffix (str, default "_DN"), the suffix indicating that the signature is in the DN direction, expected at the end of the signature name ($).
`export_annotconfig`(sigconfig, levsk, ...[, ...])	Export the configuration defined in sigconfig and levsk. The order might change compared to the original sig.
`convert_to_directed`(signature_dict[, direction])	Convert a simple dictionary into one with direction compatible with combined_signature_score
`make_gmtx`(setName, desc, User, Source, ...)	Construct a gmtx file according to format conventions for import into GeMS. Parameters: setName (str), an informative set name, e.g. Pembro_induced_MC38CD8Tcell, Plasma_mdb, TGFB_Stromal_i; desc (str), an informative and verbose signature description (for cell type signatures use the nomenclature, if a coefficient is used explain what it represents, and link to the study if present), e.g. "Genes higher expressed in Pembro vs. vehicle in non-naive CD8-positive T cells in MC38 in vivo exp. ID time T2; coefs are log2FC"; User (str), related to the signature origin, e.g. Public (for literature-derived sets), own user ID for analysis-derived sets, rtsquad, scsquad, gred, other; Source (str), the source of the signature, one of Literature scseq, Literature, besca, scseqmongodb, internal scseq, pRED, Chugai, gRED, other; Subtype (str), a specific subtype, e.g. onc, all, healthy, disease; domain (str), one of pathway, biological process, cellular component, molecular function, phenotype, perturbation, disease, misc, microRNA targets, transcription factor targets, cell marker, tissue marker; genesetname (str), shared across different signatures of a specific type, e.g. besca_marker, dblabel_marker, Pembro_induced_MC38CD8Tcell, FirstAuthorYearPublication; genes (str), a tab-separated list of genes with or without a coefficient, e.g. Vim | 2.4 Bin1 | 2.02 or Vim Bin1; studyID (str, default None), the study name as in scMongoDB/bescaviz, only when source=internal scseq; analysisID (str, default None), the analysis name as in scMongoDB/bescaviz, only when source=internal scseq; application (str, default None), which application will read the geneset, e.g. rtbeda_CIT, bescaviz, celltypeviz; celltype (str, default None), for cell markers, the celltype according to the dblabel_short convention to facilitate reuse; coef_type (str, default score), what the coefficient corresponds to, e.g. logFC, gini, SAM, score, ...
`write_gmtx_forgems`(signature_dict, GMT_file)	Writes a gmtx file that can later be uploaded to GeMS.
`silhouette_computation`(adata[, cluster, ...])	Compute the average and per-cell (i.e. per-sample) silhouette score for the cluster label (level 3 annotation, which should be present in adata.obs), the computed level 2 annotation and a random cell assignment.
`match_label`(vector_label, nomenclature_file)	Return a table matching values in vector label.
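
Several functions in this section work with signed genesets read from a GMT file, where set names ending in an UP suffix or a DN suffix belong to the same signature and sets without a direction are treated as UP. The sketch below shows that grouping in plain Python under stated assumptions: it is not the code of `read_GMT_sign`, the helper name `read_signed_gmt` and the nested-dict return shape are hypothetical, and the GMT rows are toy data. It relies only on the standard GMT layout of tab-separated set name, description and genes.

```python
def read_signed_gmt(lines, up_suffix="_UP", dn_suffix="_DN"):
    """Group GMT rows into {signature: {"UP": [...], "DN": [...]}}.

    Hypothetical helper; each row is "name<TAB>description<TAB>gene1<TAB>...".
    """
    signatures = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        set_name = fields[0]
        genes = [gene for gene in fields[2:] if gene]
        if set_name.endswith(up_suffix):
            name, direction = set_name[: -len(up_suffix)], "UP"
        elif set_name.endswith(dn_suffix):
            name, direction = set_name[: -len(dn_suffix)], "DN"
        else:
            # Non-directional genesets are treated as UP.
            name, direction = set_name, "UP"
        entry = signatures.setdefault(name, {"UP": [], "DN": []})
        entry[direction].extend(genes)
    return signatures


# Toy GMT content: one signed signature and one non-directional one.
gmt_lines = [
    "Tcell_act_UP\tactivation, induced\tCD69\tIL2RA\n",
    "Tcell_act_DN\tactivation, repressed\tSELL\n",
    "Bcell\tB cell markers\tCD19\tMS4A1\n",
]
signed = read_signed_gmt(gmt_lines)
```

The two `Tcell_act` rows end up under a single signature with separate UP and DN gene lists, while `Bcell` gets only an UP list.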

reclustering

Collection of functions to perform reclustering on selected subclusters.

`recluster`(adata, celltype[, celltype_label, ...])	Perform subclustering on specific celltype to identify subclusters.
`annotate_new_cellnames`(adata, ...[, ...])	annotate new cellnames to each of the subclusters identified by running recluster.

auto-annot

Collection of functions to perform auto-annot: annotating a single-cell dataset based on a reference dataset.

`read_data`(train_paths, train_datasets, ...)	Function to read in training and testing datasets
`read_raw`(train_paths, train_datasets, ...)	read from adata.raw and revert log1p normalization
`read_adata`(train_paths, train_datasets, ...)	read adata files of training and testing datasets
`merge_data`(adata_trains, adata_pred[, ...])	read adata files of training and testing datasets
`naive_merge`(adata_trains)	concatenates training anndata objects
`scanorama_merge`(adata_trains, adata_pred, ...)	corrects datasets using scanorama and merge training datasets subsequently
`remove_genes`(adata_trains, adata_pred, ...)	removes all genes not in gene set
`intersect_genes`(adata_train, adata_pred)	removes all genes not in all datasets
`remove_nonshared`(adata_train, adata_pred[, ...])	removes all celltypes not in all datasets
`fit`(adata_train, method, celltype[, njobs, ...])	fits classifier on training dataset
`linear_svm`(train, y_train)	fits linear svm on training dataset
`rbf_svm`(train, y_train)	fits radial basis function kernel svm on training dataset
`sgd_svm`(train, y_train)	fits linear svm on training dataset using stochastic gradient descent
`random_forest`(train, y_train, njobs)	fits a random forest of a thousand estimators with balanced class weight on training dataset.
`logistic_regression`(train, y_train, njobs)	multiclass crossvalidated logistic regression with balanced class weight.
`logistic_regression_ovr`(train, y_train, njobs)	multiclass crossvalidated logistic regression with balanced class weight.
`logistic_regression_elastic`(train, y_train, ...)	multiclass crossvalidated logistic regression with balanced class weight.
`adata_predict`(classifier, scaler, ...[, ...])	predicts on testing set using trained classifier
`predict`(classifier, scaler, adata_pred[, ...])	predicts on testing set using trained classifier
`adata_pred_prob`(classifier, scaler, ...[, ...])	predicts on testing set using trained classifier and returns class probability for every cell and every class
`predict_proba`(classifier, scaler, adata_pred)	predicts on testing set using trained classifier and returns probabilities
`report`(adata_pred, celltype, method, ...[, ...])	reports basic metrics, produces confusion matrices and plots umap of prediction. Writes out a csv file containing all accuracy and f1 scores.
`scanvi_predict`(adata_trains, adata_pred, ...)	merges all datasets and predicts on testing set with scANVI.
`scvi_merge`(adata_trains, adata_pred)	merges all datasets and stores learnt representation in obsm
`visualise_scvi_merge`(adata_concat)	plots a umap of all merged datasets coloured by dataset of origin.
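
A classifier trained on one dataset can only be applied to another if both share the same features, which is why `intersect_genes` removes genes that are not present in all datasets. A minimal sketch of that step in plain Python follows; the helper `shared_genes`, the choice to keep the first list's order, and the gene lists are assumptions for illustration and not besca's implementation.

```python
def shared_genes(*gene_lists):
    """Return genes present in every list, in the order of the first list.

    Hypothetical helper; a fixed order keeps feature columns aligned.
    """
    common = set(gene_lists[0]).intersection(*gene_lists[1:])
    return [gene for gene in gene_lists[0] if gene in common]


# Toy gene lists for two training datasets and the dataset to annotate.
train_a = ["CD3E", "CD19", "MS4A1", "NKG7"]
train_b = ["CD19", "CD3E", "LYZ", "NKG7"]
to_annotate = ["NKG7", "CD3E", "GNLY"]
common = shared_genes(train_a, train_b, to_annotate)
```

Only `CD3E` and `NKG7` occur in all three lists, so every dataset would be reduced to those two genes before fitting and prediction.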

Import

`read_mtx`(filepath[, annotation, use_genes, ...])	Read matrix.mtx, genes.tsv and barcodes.tsv into an AnnData object. Given an input folder, this function reads the contained matrix.mtx, genes.tsv and barcodes.tsv files into an AnnData object. If annotation = True, it also adds the annotation contained in metadata.tsv to the object. Parameters: filepath (str), the path to the directory containing matrix.mtx, genes.tsv, barcodes.tsv and, if applicable, metadata.tsv; annotation (bool, default True), whether an annotation file is also located in the folder and should be added to the AnnData object; use_genes (str), either SYMBOL or ENSEMBL (other gene names are not yet supported); species (str, default 'human'), the species, only needed when no gene symbols are supplied and only the ENSEMBL gene ids are available to perform a lookup; citeseq ('gex_only', 'citeseq_only', False or None; default None), whether only gene expression values ('gex_only') or only protein expression values ('citeseq_only') are read, with everything read if None is specified.
`add_cell_labeling`(adata, filepath[, label])	add a labeling written out in the FAIR formatting to adata.obs
`assert_adata`(adata[, attempFix])	Asserts that an adata object contains the information needed for the besca pipeline to run and export information.

export

`X_to_mtx`(adata[, outpath, write_metadata, ...])	export adata object to mtx format (matrix.mtx, genes.tsv, barcodes.tsv)
`raw_to_mtx`(adata[, outpath, write_metadata, ...])	export adata.raw to .mtx (matrix.mtx, genes.tsv, barcodes.tsv)
`clustering`(adata[, outpath, export_average, ...])	export mapping of cells to clusters to .tsv file
`write_labeling_to_files`(adata[, outpath, ...])	export mapping of cells to specified label to .tsv file
`labeling_info`([outpath, description, ...])	write out labeling info for uploading to database
`analysis_metadata`(adata[, outpath, ...])	export plotting coordinates to analysis_metadata.tsv
`generate_gep`(adata[, filename, column, ...])	Generate Gene Expression Profile (GEP) from scRNA-seq annotations
`ranked_genes`(adata[, type, outpath, ...])	export marker genes for each cluster to .gct file
`pseudobulk`(adata[, outpath, column, label, ...])	export pseudobulk profiles of cells to .gct files
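
A pseudobulk profile collapses the cells of each group into one expression vector per group. The sketch below shows one common way to do this, summing counts per gene, in plain Python. It is an assumption that summation is the aggregation of interest; the listing above does not say how `pseudobulk` aggregates, and the helper `pseudobulk_sums` and the toy matrix are hypothetical.

```python
def pseudobulk_sums(counts, groups):
    """Sum per-gene counts over the cells of each group.

    `counts` is a cells x genes list of lists, `groups` the per-cell label.
    Hypothetical helper; returns {group: [summed count per gene]}.
    """
    profiles = {}
    for cell_counts, group in zip(counts, groups):
        profile = profiles.setdefault(group, [0] * len(cell_counts))
        for gene_index, count in enumerate(cell_counts):
            profile[gene_index] += count
    return profiles


# Toy count matrix: three cells, three genes, two samples.
counts = [
    [1, 0, 3],
    [2, 1, 0],
    [0, 5, 1],
]
groups = ["s1", "s1", "s2"]
profiles = pseudobulk_sums(counts, groups)
```

The two cells of sample `s1` are merged into a single profile, while `s2` keeps the counts of its only cell.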

standardworkflow

`read_matrix`(root_path[, citeseq, ...])	Read matrix file as expected for the standard workflow.
`filtering_cells_genes_min`(adata, ...)
`filtering_mito_genes_max`(adata, ...)
`export_cp10k`(adata, basepath)	Export raw cp10k to FAIR format for loading into database
`export_regressedOut`(adata, basepath)	Export regressedOut to FAIR format for loading into database
`export_clustering`(adata, basepath, method)	Export cluster to cell mapping to FAIR format for loading into database
`export_metadata`(adata, basepath[, n_pcs, ...])	Export metadata in FAIR format for loading into database
`export_rank`(adata, basepath[, type, ...])	Export ranked genes to FAIR format for loading into database
`export_celltype`(adata, basepath)	Export celltype annotation to cell mapping in FAIR format for loading into database
`additional_labeling`(adata, labeling_to_use, ...)	Standard Workflow function to export an additional labeling besides louvain to FAIR format.
`celltype_labeling`(adata, labeling_author, ...)	Standard Workflow function to export an additional labeling besides louvain to FAIR format.