fit

besca.tl.auto_annot.fit(adata_train, method, celltype, njobs=10, celltype_variable='dblabel', n_cells=5)

Fits a classifier on the training dataset.

Uses the specified celltype column as the label.

Parameters:

adata_train (AnnData) – training dataset anndata object
method (string) – classifier to fit. One of: ‘linear’ for a linear SVM; ‘sgd’ for stochastic gradient descent, recommended when using raw data; ‘rbf’ for a radial basis function kernel, not recommended because of its run time; ‘random_forest’ for a random forest classifier, which is faster than the other methods but performs less well; or ‘logistic_regression’ for a multiclass logistic regression (recommended). Like the random forest, logistic regression lets you specify cutoffs and have cells classified as unknown.
celltype (string) – name of the column to be used for classification
njobs (int | default: 10) – number of cores to use; only applies to the logistic regression and random forest classifiers
celltype_variable (string | default: dblabel) – header of the obs column in the AnnData object that contains the celltypes
n_cells (int | default: 5) – minimum number of entries a celltype must have to be used by fit()
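
The n_cells threshold can be pictured as a filter on label counts. The sketch below is plain Python and not besca's implementation: it assumes that celltypes with fewer than n_cells entries are left out before fitting, and the helper name keep_frequent_celltypes and the sample labels are hypothetical.

```python
from collections import Counter


def keep_frequent_celltypes(labels, n_cells=5):
    """Return indices of cells whose celltype occurs at least n_cells times.

    Hypothetical helper for illustration only; not part of besca.
    """
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] >= n_cells]


# Made-up labels: two celltypes meet the threshold, one does not.
labels = ["T cell"] * 6 + ["B cell"] * 5 + ["pDC"] * 2

kept = keep_frequent_celltypes(labels, n_cells=5)
kept_types = sorted({labels[i] for i in kept})
print(kept_types)
```

Under this assumption, a rare population with only a handful of annotated cells would not appear among the classes the classifier can predict.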

Returns:

sklearn.calibration.CalibratedClassifierCV or other classifier class – the trained classifier
sklearn.preprocessing.StandardScaler – a scaler fitted on the training set, to be applied to the testing set
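
The reason a fitted scaler is returned alongside the classifier is that the testing set should be standardized with the statistics of the training set, not its own. The following is a minimal standard-library sketch of that idea for a single feature; it mimics what a standard scaler does (subtract the training mean, divide by the training standard deviation) and is not besca or scikit-learn code. The function names and expression values are hypothetical.

```python
from statistics import fmean, pstdev


def fit_scaler(train_column):
    """Learn mean and population standard deviation from training values."""
    mean = fmean(train_column)
    std = pstdev(train_column) or 1.0  # fall back to 1.0 for constant features
    return mean, std


def transform(column, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(x - mean) / std for x in column]


# Made-up expression values for one gene.
train_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
test_expr = [3.0, 6.0]

mean, std = fit_scaler(train_expr)               # statistics come from training data only
train_scaled = transform(train_expr, mean, std)
test_scaled = transform(test_expr, mean, std)    # reuse them on the testing data
```

Because the testing values are scaled with the training statistics, a value equal to the training mean maps to zero in both sets, which keeps the features comparable for the classifier.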