make_gmtx
- besca.tl.sig.make_gmtx(setName, desc, User, Source, Subtype, domain, genesetname, genes, studyID=None, analysisID=None, application=None, celltype=None, coef_type='logFC')
Construct a gmtx file according to format conventions for import into Gems.
- Parameters:
setName (str) – informative set name e.g. Pembro_induced_MC38CD8Tcell, Plasma_mdb, TGFB_Stromal_i
desc (str) – informative and verbose signature description; for cell type signatures use nomenclature; if coefficients are used, explain what they represent; link to the study if present; e.g. Genes higher expressed in Pembro vs. vehicle in non-naive CD8-positive T cells in MC38 in vivo exp. ID time T2; coefs are log2FC
User (str) – related to signature origin e.g. Public (for literature-derived sets), own user ID for analysis-derived sets, rtsquad, scsquad, gred, other
Source (str) – source of the signature, one of Literature scseq, Literature, besca, scseqmongodb, internal scseq, pRED, Chugai, gRED, other
Subtype (str) – specific subtype e.g. onc, all, healthy, disease
domain (str) – one of pathway, biological process, cellular component, molecular function, phenotype, perturbation, disease, misc, microRNA targets, transcription factor targets, cell marker, tissue marker
genesetname (str) – shared across different signatures of a specific type e.g. besca_marker, dblabel_marker, Pembro_induced_MC38CD8Tcell, FirstAuthorYearPublication
genes (str) – tab-separated list of genes with/without a coefficient e.g. Vim | 2.4 Bin1 | 2.02 or Vim Bin1
studyID (str | default = None) – study name as in scMongoDB/bescaviz; only when source=internal scseq
analysisID (str | default = None) – analysis name as in scMongoDB/bescaviz; only when source=internal scseq
application (str | default = None) – specify which application will read the geneset e.g. rtbeda_CIT, bescaviz, celltypeviz
celltype (str | default = None) – for cell markers, specify celltype according to dblabel_short convention to facilitate reuse
coef_type (str | default = logFC) – specify what the coefficient corresponds to, e.g. logFC, gini, SAM, score, …
- Returns:
a dictionary with populated fields needed to later export the signature to gmtx
- Return type:
geneset
Example
>>> import besca as bc
>>> User = 'nouser'
>>> Source = 'pbmc3k_processed'
>>> Subtype = 'public'
>>> domain = 'perturbation'
>>> studyID = 'pbmc3k_processed'
>>> analysisID = 'default'
>>> genesetname = 'pbmc3k_processed_cluster0'
>>> setName = 'pbmc3k_processed_cluster0'
>>> desc = 'Genes higher expressed in cluster 0; coefs are log2FC'
>>> adata = bc.datasets.simulated_pbmc3k_processed()
>>> myfc = 1
>>> mypval = 0.05
>>> DEgenes = bc.tl.dge.get_de(adata, 'leiden', demethod='wilcoxon', topnr=5000, logfc=myfc, padj=mypval)
>>> pdout = DEgenes['0'].sort_values('Log2FC', ascending=False)
>>> genes = "\t".join(list(pdout['Name'].astype(str) + " | " + pdout['Log2FC'].round(2).astype(str)))
>>> signature_dict = bc.tl.sig.make_gmtx(setName, desc, User, Source, Subtype, domain, genesetname, genes, studyID, analysisID)
Prefered source names: Literature scseq, Literature, besca, scseqmongodb, internal scseq, pRED, Chugai, gRED, other
Metadata for signature pbmc3k_processed_cluster0 successfully captured
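The genes argument is described above as a tab-separated string whose entries are either a bare gene name or "gene | coefficient". The following is a minimal sketch of how such a string could be assembled from plain Python pairs, without pandas or besca. The helper name format_genes and its handling of None as "no coefficient" are assumptions made for illustration; they are not part of the besca API.

```python
def format_genes(pairs):
    """Join (gene, coefficient) pairs into a tab-separated string.

    Hypothetical helper, not part of besca. A coefficient of None
    produces a bare gene name; otherwise the entry is "gene | coef"
    with the coefficient rounded to two decimals.
    """
    fields = []
    for gene, coef in pairs:
        if coef is None:
            fields.append(str(gene))
        else:
            fields.append(f"{gene} | {round(coef, 2)}")
    return "\t".join(fields)


# With coefficients, matching the "Vim | 2.4 Bin1 | 2.02" form.
genes_with_coef = format_genes([("Vim", 2.4), ("Bin1", 2.02)])

# Without coefficients, matching the "Vim Bin1" form.
genes_only = format_genes([("Vim", None), ("Bin1", None)])

print(repr(genes_with_coef))
print(repr(genes_only))
```

Either resulting string could then be passed as the genes argument of make_gmtx; when coefficients are included, coef_type should describe what they represent.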