fit

besca.tl.auto_annot.fit(adata_train, method, celltype, njobs=10, celltype_variable='dblabel', n_cells=5)

Fits a classifier on the training dataset, using the specified cell type column as the label.

Parameters:
  • adata_train (AnnData) – AnnData object containing the training dataset

  • method (string) – classifier to fit. One of: ‘linear’ for a linear SVM; ‘sgd’ for stochastic gradient descent (recommended when using raw data); ‘rbf’ for a radial basis function kernel (not recommended because of its runtime); ‘random_forest’ for a random forest classifier, which is faster than the other methods but performs less well; or ‘logistic_regression’ for a multiclass logistic regression (recommended). Like the random forest, logistic regression lets you specify cutoffs and have cells classified as unknown.

  • celltype (string) – name of the column to be used for classification

  • njobs (int) – number of cores to use; only applies to the logistic regression and random forest classifiers

  • celltype_variable (string | default: dblabel) – header of the column in the AnnData object's obs that contains the cell types

  • n_cells (int | default: 5) – minimum number of cells a cell type must have for it to be used by fit()
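
To make the n_cells threshold concrete, the following is a minimal sketch of how a minimum-count filter on cell type labels could work. It does not reproduce besca's implementation: the helper name filter_rare_celltypes and the "at least n_cells" comparison are assumptions made for illustration only.

```python
from collections import Counter


def filter_rare_celltypes(labels, n_cells=5):
    """Return the indices of cells whose cell type occurs at least n_cells times.

    Hypothetical helper: an illustration of a minimum-count filter,
    not the function besca uses internally.
    """
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] >= n_cells]


# Toy labels: "NK cell" has only 2 cells and falls below the threshold.
labels = ["T cell"] * 6 + ["B cell"] * 5 + ["NK cell"] * 2
kept = filter_rare_celltypes(labels, n_cells=5)
kept_types = sorted({labels[i] for i in kept})
```

Under this reading, cell types with too few training examples are left out of training altogether.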

Returns:

  • sklearn.calibration.CalibratedClassifierCV or other classifier class – the trained classifier

  • sklearn.preprocessing.StandardScaler – a scaler fitted on the training set, to be applied to the testing set
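
The two return types suggest the usual scikit-learn pattern of fitting a scaler on the training data and reusing it on the test data. The sketch below illustrates that pattern with plain scikit-learn on synthetic data. It does not call besca, and the data, variable names and the choice of LinearSVC inside CalibratedClassifierCV are assumptions for illustration, not a description of what fit() does internally.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: 40 cells x 6 genes, two cell types.
X_train = np.vstack([rng.normal(0.0, 1.0, (20, 6)), rng.normal(3.0, 1.0, (20, 6))])
y_train = np.array(["T cell"] * 20 + ["B cell"] * 20)

# Fit the scaler on the training data only.
scaler = StandardScaler().fit(X_train)

# Wrap a linear SVM in a calibrator so that it exposes predict_proba.
classifier = CalibratedClassifierCV(LinearSVC(), cv=5)
classifier.fit(scaler.transform(X_train), y_train)

# Reuse the same fitted scaler on unseen data before predicting.
X_test = rng.normal(3.0, 1.0, (5, 6))
X_test_scaled = scaler.transform(X_test)
probabilities = classifier.predict_proba(X_test_scaled)
predictions = classifier.predict(X_test_scaled)
```

Applying the training-set scaler to the test set, rather than fitting a new one, keeps both datasets on the same scale, which is presumably why fit() returns the scaler alongside the classifier.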