make_gmtx

besca.tl.sig.make_gmtx(setName, desc, User, Source, Subtype, domain, genesetname, genes, studyID=None, analysisID=None, application=None, celltype=None, coef_type='logFC')

Construct a gmtx file according to format conventions for import into Gems.

Parameters:

setName (str) – informative set name e.g. Pembro_induced_MC38CD8Tcell, Plasma_mdb, TGFB_Stromal_i
desc (str) – informative and verbose signature description; for cell type signatures use nomenclature; if coef used explain what it represents; link to study if present; e.g. Genes higher expressed in Pembro vs. vehicle in non-naive CD8-positive T cells in MC38 in vivo exp. ID time T2; coefs are log2FC
User (str) – related to signature origin e.g. Public (for literature-derived sets), own user ID for analysis-derived sets, rtsquad, scsquad, gred, other
Source (str) – source of the signature, one of Literature scseq, Literature, besca, scseqmongodb, internal scseq, pRED, Chugai, gRED, other
Subtype (str) – specific subtype e.g. onc, all, healthy, disease
domain (str) – one of pathway, biological process, cellular component, molecular function, phenotype, perturbation, disease, misc, microRNA targets, transcription factor targets, cell marker, tissue marker
genesetname (str) – shared across different signatures of a specific type e.g. besca_marker, dblabel_marker, Pembro_induced_MC38CD8Tcell, FirstAuthorYearPublication
genes (str) – tab-separated list of genes with/without a coefficient e.g. Vim | 2.4 Bin1 | 2.02 or Vim Bin1
studyID (str | default = None) – study name as in scMongoDB/bescaviz; only when source=internal scseq
analysisID (str | default = None) – analysis name as in scMongoDB/bescaviz; only when source=internal scseq
application (str | default = None) – specify which application will read the geneset e.g. rtbeda_CIT, bescaviz, celltypeviz
celltype (str | default = None) – for cell markers, specify celltype according to dblabel_short convention to facilitate reuse
coef_type (str | default = logFC) – specify what the coefficient corresponds to, e.g. logFC, gini, SAM, score, …
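
The genes argument described above is a single tab-separated string in which each entry is either a bare gene name or a gene name followed by " | " and a coefficient. A minimal sketch of building such a string in plain Python follows; format_genes is a hypothetical helper for illustration and is not part of besca.

```python
def format_genes(entries):
    """Join gene entries into one tab-separated string.

    entries: iterable of gene names (str) or (gene, coefficient) pairs.
    Hypothetical helper, not part of besca.
    """
    parts = []
    for entry in entries:
        if isinstance(entry, str):
            # Gene without a coefficient, e.g. "Vim"
            parts.append(entry)
        else:
            # Gene with a coefficient, e.g. "Vim | 2.4"
            gene, coef = entry
            parts.append(f"{gene} | {coef}")
    # Entries are separated by a tab, as the genes parameter requires
    return "\t".join(parts)


with_coefs = format_genes([("Vim", 2.4), ("Bin1", 2.02)])
without_coefs = format_genes(["Vim", "Bin1"])
print(repr(with_coefs))
print(repr(without_coefs))
```

Either string could then be passed as genes; if coefficients are included, coef_type should state what they represent.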

Returns:

a dictionary with populated fields needed to later export the signature to gmtx

Return type:

geneset

Example

>>> import besca as bc
>>> User = 'nouser'
>>> Source = 'pbmc3k_processed'
>>> Subtype = 'public'
>>> domain = 'perturbation'
>>> studyID = 'pbmc3k_processed'
>>> analysisID = 'default'
>>> genesetname = 'pbmc3k_processed_cluster0'
>>> setName = 'pbmc3k_processed_cluster0'
>>> desc = 'Genes higher expressed in cluster 0; coefs are log2FC'
>>> adata = bc.datasets.simulated_pbmc3k_processed()
>>> myfc = 1
>>> mypval = 0.05
>>> DEgenes=bc.tl.dge.get_de(adata,'leiden',demethod='wilcoxon',topnr=5000, logfc=myfc,padj=mypval)
>>> pdout=DEgenes['0'].sort_values('Log2FC', ascending=False)
>>> genes="\t".join(list(pdout['Name'].astype(str) + " | " + pdout['Log2FC'].round(2).astype(str)))
>>> signature_dict = bc.tl.sig.make_gmtx(setName,desc,User,Source,Subtype,domain,genesetname,genes,studyID,analysisID)
Prefered source names: Literature scseq, Literature, besca, scseqmongodb, internal scseq, pRED, Chugai, gRED, other
Metadata for signature pbmc3k_processed_cluster0 successfully captured
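
The first output line above lists the preferred source names, and the Source value used in this example (pbmc3k_processed) is not among them. A minimal sketch of checking Source and domain before building a signature follows. It assumes the value lists on this page are complete; PREFERRED_SOURCES, DOMAINS and check_metadata are hypothetical names for illustration and are not part of besca.

```python
# Value lists copied from the parameter descriptions on this page (assumed complete)
PREFERRED_SOURCES = {
    "Literature scseq", "Literature", "besca", "scseqmongodb",
    "internal scseq", "pRED", "Chugai", "gRED", "other",
}
DOMAINS = {
    "pathway", "biological process", "cellular component",
    "molecular function", "phenotype", "perturbation", "disease", "misc",
    "microRNA targets", "transcription factor targets",
    "cell marker", "tissue marker",
}


def check_metadata(source, domain):
    """Return a list of warning strings; empty if both values are listed.

    Hypothetical helper, not part of besca.
    """
    warnings = []
    if source not in PREFERRED_SOURCES:
        warnings.append(f"Source '{source}' is not a preferred source name")
    if domain not in DOMAINS:
        warnings.append(f"domain '{domain}' is not a listed domain")
    return warnings


# Values from the example above: the domain is listed, the Source is not
issues = check_metadata("pbmc3k_processed", "perturbation")
for issue in issues:
    print(issue)
```

The check only reports warnings, since the example above shows that a signature with a non-preferred Source is still captured.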