recluster
- besca.tl.rc.recluster(adata, celltype, celltype_label='leiden', min_mean=0.0125, max_mean=4, min_disp=0.5, resolution=1.0, regress_out_key=None, random_seed=0, show_plot_filter=False, method='leiden', batch_key=None, n_shared=2)
Perform subclustering on a specific celltype to identify subclusters.
Extract all cells that belong to the pre-labeled celltype into a new data subset. This subset is initialized with the raw data stored in adata.raw. New highly variable genes are selected and a new clustering is performed. The function returns the data subset with the new clustering annotation.
This can be performed on leiden clusters by setting celltype_label = ‘leiden’ and passing the clusters to be reclustered to the parameter celltype, either as a single string or as a tuple of strings.
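To illustrate how celltype and celltype_label work together, the cell-selection step can be sketched in plain Python. This is a minimal sketch, not besca's implementation: the helper select_cells is hypothetical, the list labels stands in for adata.obs[celltype_label], and rejecting a list only mirrors the documented "tuple, not list" advice (how besca itself reacts to a list is not stated here).

```python
from typing import Sequence, Tuple, Union


def select_cells(labels: Sequence[str],
                 celltype: Union[str, Tuple[str, ...]]) -> list:
    """Return the indices of cells whose label matches `celltype`.

    Hypothetical helper: `labels` plays the role of adata.obs[celltype_label].
    A single string selects one label; a tuple selects any of its labels.
    """
    if isinstance(celltype, str):
        wanted = {celltype}
    elif isinstance(celltype, tuple):
        wanted = set(celltype)
    else:
        # Assumption: enforce the documented "pass a tuple, not a list" rule.
        raise TypeError("pass a str or a tuple of str, not a list")
    return [i for i, label in enumerate(labels) if label in wanted]


leiden = ["0", "1", "2", "3", "0", "6", "5"]
print(select_cells(leiden, "0"))                   # [0, 4]
print(select_cells(leiden, ("0", "1", "3", "6")))  # [0, 1, 3, 4, 5]
```

On a real AnnData object, the analogous boolean mask for a tuple of labels would be adata.obs[celltype_label].isin(celltype).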
- Parameters:
adata – the complete AnnData object of the dataset.
celltype (str or tuple of str) – label(s) identifying the cells to extract for reclustering. To select more than one, pass them as a tuple, not as a list!
celltype_label (str | default = ‘leiden’) – name of the adata.obs column whose values are matched against the celltype argument.
min_mean (float | default = 0.0125) – the minimum mean expression a gene must have to be considered highly variable
max_mean (float | default = 4) – the maximum mean expression a gene can have to be considered highly variable
min_disp (float | default = 0.5) – the minimum dispersion a gene must have to be considered highly variable
resolution (float | default = 1.0) – resolution parameter of the clustering algorithm; higher values yield more, smaller subclusters
regress_out_key (list of str | default = None) – list of adata.obs column names whose effects should be regressed out before clustering. If None, no regression is performed.
random_seed (int | default = 0) – the random seed that is used to produce reproducible PCA, clustering and UMAP results
show_plot_filter (bool | default = False) – boolean value indicating whether a plot showing the results of the highly variable gene filtering should be displayed
method (str | default = ‘leiden’) – clustering method used to recluster the data subset. Possible values: ‘louvain’ or ‘leiden’.
batch_key (str | default = None) – name of the adata.obs column identifying batches; specify it if the HVG calculation should be done per batch
n_shared (int | default = 2) – when batch_key is set, the number of batches is divided by this value to obtain the minimum number of batches in which a gene must be highly variable to be kept as a shared HVG (e.g. n_shared = 3 requires a gene to be highly variable in at least 1/3 of the batches)
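The highly variable gene selection controlled by min_mean, max_mean, min_disp, batch_key and n_shared can be sketched as follows. This is a simplified illustration under stated assumptions, not besca's code: the helpers hvg_mask and shared_hvgs are hypothetical, the cutoffs are assumed to follow the Seurat-style rule of scanpy's highly_variable_genes (strict inequalities on mean expression and normalized dispersion), and the shared-gene rule (a gene is kept if it is highly variable in at least n_batches / n_shared batches) is inferred from the n_shared description above.

```python
from typing import Dict, List, Sequence


def hvg_mask(means: Sequence[float], dispersions: Sequence[float],
             min_mean: float = 0.0125, max_mean: float = 4,
             min_disp: float = 0.5) -> List[bool]:
    """Flag genes whose mean lies strictly between min_mean and max_mean
    and whose normalized dispersion exceeds min_disp (assumed Seurat-style rule)."""
    return [min_mean < m < max_mean and d > min_disp
            for m, d in zip(means, dispersions)]


def shared_hvgs(hvg_per_batch: Dict[str, List[bool]],
                n_shared: int = 2) -> List[bool]:
    """Keep genes flagged as HVG in at least n_batches / n_shared batches.

    Hypothetical helper: each value is a per-gene HVG mask for one batch,
    with genes in the same order in every batch.
    """
    n_batches = len(hvg_per_batch)
    threshold = n_batches / n_shared
    # zip(*...) groups the flags of each gene across all batches.
    counts = [sum(flags) for flags in zip(*hvg_per_batch.values())]
    return [count >= threshold for count in counts]


print(hvg_mask([0.01, 0.5, 5.0], [1.0, 0.8, 2.0]))  # [False, True, False]

batches = {
    "A": [True, True, False],
    "B": [True, False, False],
    "C": [False, False, True],
    "D": [True, False, False],
}
print(shared_hvgs(batches, n_shared=2))  # [True, False, False]
print(shared_hvgs(batches, n_shared=4))  # [True, True, True]
```

Under this rule the threshold is n_batches / n_shared, so a larger n_shared makes the shared-HVG filter more permissive.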
- Returns:
AnnData object containing the subcluster, annotated with PCA, nearest neighbors, leiden or louvain clusters (depending on method), and UMAP coordinates.
Examples
For a more detailed example of the entire reclustering process please refer to the code examples.
>>> import besca as bc
>>> import scanpy as sc
>>> adata = bc.datasets.simulated_pbmc3k_processed()
>>> adata_subset = bc.tl.rc.recluster(adata, celltype=('0', '1', '3', '6'), resolution=1.3)
>>> sc.pl.umap(adata_subset, color=['leiden', 'Gene_4', 'Gene_5', 'Gene_6', 'Gene_10', 'Gene_12', 'Gene_20'])