recluster

besca.tl.rc.recluster(adata, celltype, celltype_label='leiden', min_mean=0.0125, max_mean=4, min_disp=0.5, resolution=1.0, regress_out_key=None, random_seed=0, show_plot_filter=False, method='leiden', batch_key=None, n_shared=2)

Perform subclustering on a specific celltype to identify subclusters.

Extract all cells that belong to the pre-labeled celltype into a new data subset. This subset is initialized with the raw data contained in adata.raw. New highly variable genes are then selected and a new clustering is performed on the subset. The function returns the adata subset with the new clustering annotation.

This can be performed on leiden clusters by setting celltype_label = ‘leiden’ and passing the clusters to be reclustered to the parameter celltype, either as a single string or as a tuple of strings.
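
The selection step described above can be pictured as a boolean mask over adata.obs. The following is a minimal sketch that uses a plain pandas DataFrame as a stand-in for adata.obs. The helper name select_cells is hypothetical and not part of besca, and the real function may implement the subsetting differently.

```python
import pandas as pd


def select_cells(obs, celltype, celltype_label="leiden"):
    """Return a boolean mask of the rows whose label matches `celltype`.

    Hypothetical helper for illustration only; not part of besca.
    """
    # Accept a single string or a tuple of strings, mirroring the documented input.
    wanted = [celltype] if isinstance(celltype, str) else list(celltype)
    return obs[celltype_label].isin(wanted)


# Stand-in for adata.obs: cluster labels are stored as strings, not integers.
obs = pd.DataFrame(
    {"leiden": pd.Categorical(["0", "1", "2", "3", "1", "6"])},
    index=[f"cell_{i}" for i in range(6)],
)

mask = select_cells(obs, ("0", "1", "3", "6"))
selected = obs[mask]  # every cell except the one in cluster "2"
```

Because the labels in this sketch are strings, passing integers such as (0, 1) would match no cells, which is why the clusters are given as '0', '1', and so on.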

Parameters:
  • adata – the complete AnnData object of the Dataset.

  • celltype (str or (str)) – string identifying the cluster whose cells are to be extracted for reclustering. To select more than one cluster, pass them as a tuple, not as a list.

  • celltype_label (str | default = ‘leiden’) – string identifying which column in adata.obs will be matching with the celltype argument.

  • min_mean (float | default = 0.0125) – the minimum gene expression a gene must have to be considered highly variable

  • max_mean (float | default = 4) – the maximum gene expression a gene can have to be considered highly variable

  • min_disp (float | default = 0.5) – the minimum dispersion a gene must have to be considered highly variable

  • resolution (float | default = 1.0) – the resolution passed to the clustering method; higher values generally yield more clusters

  • regress_out_key (list of str | default = None) – a list of string identifiers of the adata.obs columns that should be regressed out before performing clustering. If None, nothing is regressed out.

  • random_seed (int | default = 0) – the random seed that is used to produce reproducible PCA, clustering and UMAP results

  • show_plot_filter (bool | default = False) – boolean value indicating whether a plot showing the filtering results for highly variable gene detection should be displayed

  • method (str | default = ‘leiden’) – clustering method to use for reclustering the data subset. Options: ‘louvain’ or ‘leiden’

  • batch_key (str | default = None) – Specify a batch key if the HVG calculation should be done per batch

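  • n_shared (int | default = 2) – divide the number of batches by this number to get the minimum number of batches in which a gene must be highly variable to be considered a shared HVG (e.g. n_shared = 3 keeps genes that are HVGs in >= 1/3 of the samples)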
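
The interplay between the three HVG cutoffs and n_shared can be sketched with NumPy. This is an illustration under stated assumptions, not besca's implementation: it assumes the cutoffs act as simple thresholds on a per-gene mean and dispersion, and that with batch_key set a gene is kept when it is flagged in at least (number of batches) / n_shared batches. The function names and the toy values are hypothetical.

```python
import numpy as np


def hvg_mask(means, disps, min_mean=0.0125, max_mean=4, min_disp=0.5):
    """Flag genes whose mean and dispersion pass the cutoffs (assumed rule)."""
    means = np.asarray(means, dtype=float)
    disps = np.asarray(disps, dtype=float)
    return (means > min_mean) & (means < max_mean) & (disps > min_disp)


def shared_hvg_mask(per_batch_masks, n_shared=2):
    """Keep genes flagged in at least n_batches / n_shared batches (assumed rule)."""
    per_batch_masks = np.asarray(per_batch_masks, dtype=bool)
    n_batches = per_batch_masks.shape[0]
    n_flagged = per_batch_masks.sum(axis=0)  # batches in which each gene is an HVG
    return n_flagged >= n_batches / n_shared


# Made-up per-gene statistics for four genes in three batches.
means = [[0.001, 0.5, 1.0, 5.0],
         [0.02, 0.6, 1.2, 3.0],
         [0.03, 0.4, 0.9, 2.0]]
disps = [[1.0, 0.8, 0.2, 1.5],
         [1.0, 0.9, 0.7, 1.5],
         [0.1, 0.7, 0.3, 1.5]]

per_batch = np.array([hvg_mask(m, d) for m, d in zip(means, disps)])
shared_half = shared_hvg_mask(per_batch, n_shared=2)   # gene must pass in >= 1.5 batches
shared_third = shared_hvg_mask(per_batch, n_shared=3)  # gene must pass in >= 1 batch
```

In this toy data a larger n_shared lowers the bar: with n_shared=2 only the genes flagged in two or more of the three batches survive, while n_shared=3 keeps every gene flagged in at least one batch.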

Returns:

  • AnnData object containing the subcluster annotated with PCA, nearest neighbors, the new clustering (leiden or louvain, depending on method), and UMAP coordinates.

Examples

For a more detailed example of the entire reclustering process please refer to the code examples.

>>> import besca as bc
>>> import scanpy as sc
>>> adata = bc.datasets.simulated_pbmc3k_processed()
>>> adata_subset = bc.tl.rc.recluster(adata, celltype=('0', '1', '3', '6'), resolution = 1.3)
>>> sc.pl.umap(adata_subset, color = ['leiden', 'Gene_4', 'Gene_5', 'Gene_6', 'Gene_10', 'Gene_12', 'Gene_20'])