BESCAPE (BESCA Proportion Estimator) is a deconvolution module. It uses single-cell annotations from the BESCA workflow to build a Gene Expression Profile (GEP). This GEP serves as the basis vector for deconvoluting bulk RNA samples, i.e. predicting the cell type proportions within each sample.
BESCAPE lets the user specify their own GEP and choose any of the supported deconvolution methods. It thereby decouples the deconvolution algorithm from its underlying GEP (basis vector).
This tutorial presents the workflow for deconvolution, as well as the link to BESCA single-cell annotations.
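To make the idea of a basis vector concrete, the toy sketch below assumes the common linear mixing view of deconvolution: a bulk profile is approximated as the GEP matrix multiplied by the vector of cell type proportions. The matrix values, the two cell types and the use of ordinary least squares are illustrative assumptions only. The methods supported by BESCAPE use their own, more elaborate estimators.

```python
import numpy as np

# Toy GEP (basis matrix): rows are genes, columns are cell types.
# All numbers are invented for illustration.
gep = np.array([
    [10.0, 1.0],   # gene A: high in cell type 1
    [2.0, 8.0],    # gene B: high in cell type 2
    [5.0, 5.0],    # gene C: equally expressed in both
])

# True (normally unknown) cell type proportions of the bulk sample.
true_props = np.array([0.3, 0.7])

# Noise-free bulk profile under the linear mixing assumption.
bulk = gep @ true_props

# Estimate the proportions by ordinary least squares.
est, *_ = np.linalg.lstsq(gep, bulk, rcond=None)
est = est / est.sum()  # renormalise so that the proportions sum to 1

print(np.round(est, 3))
```

Because the bulk profile here is generated without noise, the estimate matches the true proportions up to floating point error. Real data are noisy, which is why dedicated methods add constraints and gene weighting.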
We assume that either Docker or Singularity services have already been installed.
First, initiate the deconvolution predictor object. This requires either a Docker or a Singularity image to run. Both methods are shown below.
To initiate the Bescape deconvolution object with Docker, we set service='docker' and docker_image='bedapub/bescape:version'. Bescape first looks for a local Docker image and, if none is available, pulls the bescape image from DockerHub. This also means that one can build a customised Docker image locally from the BESCAPE source and use it in the Bescape object.
import os
from bescape import Bescape
# docker
# may take some time if the docker image is being built for the first time
deconv = Bescape(service='docker', docker_image='bedapub/bescape:0.5')
If you run into a permission error when running the Docker image, please follow the steps in https://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo to run Docker without sudo.
Namely, add the docker group if it doesn't already exist:
sudo groupadd docker
Add the connected user "$USER" to the docker group. Change the user name to match your preferred user if you do not want to use your current user:
sudo gpasswd -a $USER docker
Then either run newgrp docker or log out and back in to activate the group changes.
When using Singularity, the user specifies the absolute path for the Singularity container file.
If the path is not given, Bescape will attempt to pull the latest Docker image from DockerHub and build a new Singularity container file from it. In this case, the docker_image parameter specifies which image is pulled from DockerHub and converted to a Singularity container.
import os
from bescape import Bescape
# singularity
deconv = Bescape(service='singularity', docker_image='bedapub/bescape:0.5', path_singularity=None)
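Since the two initialisation calls differ only in the service argument, it can be convenient to detect which container runtime is installed before creating the object. The sketch below uses a hypothetical helper, pick_service, which is not part of BESCAPE. It only checks whether an executable named docker or singularity is on the PATH, and preferring Docker when both are present is an arbitrary choice.

```python
import shutil


def pick_service():
    """Return 'docker' or 'singularity' depending on which executable is on PATH.

    Hypothetical helper, not part of BESCAPE. Docker is preferred if both exist.
    Returns None when neither executable is found.
    """
    for service in ("docker", "singularity"):
        if shutil.which(service) is not None:
            return service
    return None


service = pick_service()
if service is None:
    print("Neither docker nor singularity was found on PATH.")
else:
    print(f"Bescape could be initialised with service='{service}'")
```

Finding the executable does not guarantee that the Docker daemon is running or that the current user may access it, so the permission steps above can still apply.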
Once the Bescape object has been initialised, the methods are the same for both docker and singularity. The module distinguishes between two types of basis vectors as input:
(Please note that when a single file is expected, in either the scRNASeq or the bulkRNASeq folder, Bescape simply takes the first file in alphabetical order from the respective folder.)
The correct example file input structure is shown here: https://github.com/bedapub/bescape/tree/master/docs/datasets/bescape
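Because only the first file in alphabetical order is used when a single file is expected, it helps to check in advance which file that would be. The sketch below is a rough local check with a hypothetical helper and invented file names. It assumes plain Python string ordering, and the ordering applied inside the container may differ, for example in how upper and lower case are treated.

```python
import os
import tempfile


def first_file_alphabetically(directory):
    """Return the name of the first regular file in `directory` by sorted order.

    Hypothetical helper, not part of BESCAPE. Returns None for an empty folder.
    """
    files = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    return files[0] if files else None


# Demonstration on a throwaway directory with invented file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b_input.txt", "a_input.txt", "c_input.txt"):
        open(os.path.join(tmp, name), "w").close()
    picked = first_file_alphabetically(tmp)

print(picked)
```

Keeping a single file per folder avoids any ambiguity about which one is picked.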
The user needs to provide:
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()
annot = wd + '/datasets/bescape/gep'
inpt = wd + '/datasets/bescape/input'
output = wd + '/datasets/bescape/output'
print(output)
# deconvolute using the 'bescape' method - GEP based basis vector
deconv.deconvolute_gep(dir_annot=annot,
                       dir_input=inpt,
                       dir_output=output,
                       method='bescape')
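Since the directories must be given as absolute paths, a small check before calling the deconvolution methods can catch path mistakes early. The sketch below uses a hypothetical helper, check_dirs, which is not part of BESCAPE. It assumes that the annotation and input folders must already exist and that creating the output folder beforehand is harmless. It applies only when dir_annot is a folder path, not a keyword such as 'epic'.

```python
import os
import tempfile


def check_dirs(dir_annot, dir_input, dir_output):
    """Return absolute paths for the three folders.

    Hypothetical helper, not part of BESCAPE. Raises FileNotFoundError if an
    input folder is missing and creates the output folder if needed.
    """
    paths = {}
    for label, path in (("dir_annot", dir_annot), ("dir_input", dir_input)):
        abs_path = os.path.abspath(path)
        if not os.path.isdir(abs_path):
            raise FileNotFoundError(f"{label} does not exist: {abs_path}")
        paths[label] = abs_path
    out = os.path.abspath(dir_output)
    os.makedirs(out, exist_ok=True)
    paths["dir_output"] = out
    return paths


# Demonstration on a throwaway folder layout.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "gep"))
    os.makedirs(os.path.join(tmp, "input"))
    paths = check_dirs(os.path.join(tmp, "gep"),
                       os.path.join(tmp, "input"),
                       os.path.join(tmp, "output"))

print(sorted(paths))
```

The returned dictionary could then be unpacked into the call, for example deconv.deconvolute_gep(**paths, method='bescape').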
As bulk input, EPIC takes an ExpressionSet with the @assayData slot filled with the gene expression counts of each bulk sample. The counts should be given in TPM, RPKM or FPKM when using the prebuilt reference profiles.
If we set dir_annot='epic', EPIC will provide a prebuilt reference profile that can predict: B cells, CAFs, CD4+ T cells, CD8+ T cells, NK cells, and Macrophages.
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()
annot = wd + '/datasets/epic/gep'
inpt = wd + '/datasets/epic/input'
output = wd + '/datasets/epic/output'
# deconvolute using EPIC - prebuilt reference profile
deconv.deconvolute_gep(dir_annot='epic',
                       dir_input=inpt,
                       dir_output=output,
                       method='epic')
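EPIC's prebuilt reference profiles expect normalised values such as TPM, so raw counts need converting before they are stored in the ExpressionSet. The sketch below shows the standard counts-to-TPM calculation on an invented genes x samples matrix. The function name, the count values and the gene lengths are assumptions for illustration. In practice the lengths would come from the annotation used for quantification.

```python
import numpy as np


def counts_to_tpm(counts, gene_lengths_bp):
    """Convert a genes x samples count matrix to TPM.

    Illustrative helper, not part of BESCAPE or EPIC.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1000.0
    # Reads per kilobase: normalise each gene by its length.
    rate = counts / lengths_kb[:, None]
    # Scale each sample (column) so that its values sum to one million.
    return rate / rate.sum(axis=0, keepdims=True) * 1e6


# Invented example: 3 genes (rows) x 2 samples (columns).
counts = np.array([[100, 200],
                   [300, 100],
                   [600, 700]])
lengths = np.array([1000, 2000, 3000])  # gene lengths in base pairs

tpm = counts_to_tpm(counts, lengths)
print(tpm.sum(axis=0))
```

Each column of the result sums to one million, which is a quick sanity check that the values are on the TPM scale.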
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()
annot = wd + '/datasets/music/gep'
inpt = wd + '/datasets/music/input'
output = wd + '/datasets/music/output'
# deconvolute using MuSiC - sc based basis vector
deconv.deconvolute_sc(dir_annot=annot,
                      dir_input=inpt,
                      dir_output=output,
                      method='music')
Using SCDC requires additional parameters:
celltype_var - variable name containing the cell type annotation in @phenoData of the ExpressionSet
celltype_sel - cell types of interest to estimate
sample_var - variable name in @phenoData identifying the sample name
wd = os.getcwd()
dir_annot = wd + '/datasets/scdc/gep/'
dir_input = wd + '/datasets/scdc/input'
dir_output = wd + '/datasets/scdc/output'
deconv.deconvolute_sc(dir_annot=dir_annot,
                      dir_input=dir_input,
                      dir_output=dir_output,
                      method='scdc',
                      celltype_var='cluster',
                      celltype_sel=["alpha", "beta", "delta", "gamma", "acinar", "ductal"],
                      sample_var='sample')