BESCAPE - tutorial on deconvolution of bulk RNA using single-cell annotations

BESCAPE (BESCA Proportion Estimator) is a deconvolution module. It uses single-cell annotations from the BESCA workflow to build a Gene Expression Profile (GEP). This GEP serves as the basis vector for deconvoluting bulk RNA samples, i.e. predicting cell type proportions within a sample.

BESCAPE lets the user specify their own GEP and choose any of the supported deconvolution methods. It thus decouples the deconvolution algorithm from its underlying GEP (basis vector).
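To make the idea of a GEP acting as a basis vector concrete, here is a minimal toy sketch in pure Python. It is not BESCAPE's algorithm: it swaps in ordinary least squares for two cell types, and the gene values, cell type names and function names are all invented for illustration.

```python
# Toy GEP: one row per gene, one column per cell type (values are invented).
CELL_TYPES = ["T cell", "B cell"]
GEP = {
    "CD3E": [10.0, 0.5],
    "CD19": [0.2, 8.0],
    "ACTB": [5.0, 5.0],
}


def mix(gep, proportions):
    """Forward model: bulk expression = GEP weighted by cell type proportions."""
    return {gene: sum(p * x for p, x in zip(proportions, row))
            for gene, row in gep.items()}


def deconvolute_two_types(gep, bulk):
    """Estimate two proportions by ordinary least squares (normal equations)."""
    a11 = sum(row[0] * row[0] for row in gep.values())
    a12 = sum(row[0] * row[1] for row in gep.values())
    a22 = sum(row[1] * row[1] for row in gep.values())
    b1 = sum(gep[gene][0] * bulk[gene] for gene in gep)
    b2 = sum(gep[gene][1] * bulk[gene] for gene in gep)
    det = a11 * a22 - a12 * a12
    raw = [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]
    # Clip negative weights and rescale so the proportions sum to 1.
    clipped = [max(w, 0.0) for w in raw]
    total = sum(clipped)
    return [w / total for w in clipped]


# Simulate a bulk sample with known proportions, then recover them.
bulk_sample = mix(GEP, [0.7, 0.3])
estimated = deconvolute_two_types(GEP, bulk_sample)
print(dict(zip(CELL_TYPES, (round(p, 3) for p in estimated))))
```

Because the algorithm only sees the GEP as a matrix, the same bulk sample can be deconvoluted against a different GEP without touching the solver, which is the decoupling BESCAPE aims for.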

This tutorial presents the workflow for deconvolution, as well as the link to BESCA single-cell annotations.

We assume that either Docker or Singularity has already been installed.
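If you are unsure which of the two is installed, a quick check along the following lines may help. This is a sketch using only the Python standard library; `detect_service` is a hypothetical helper, not part of BESCAPE, and it only checks that an executable is on the PATH, not that the daemon is running.

```python
import shutil


def detect_service(candidates=("docker", "singularity")):
    """Return the first container runtime found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


service = detect_service()
print(service or "neither docker nor singularity found on PATH")
```

The returned string matches the values used for the `service` argument in this tutorial.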

Initialising the predictor object

Initiate the deconvolution predictor object. This requires either a Docker or a Singularity image. Both options are shown below.

1. Docker

To initiate the Bescape deconvolution object, we need to set service='docker' and docker_image='bedapub/bescape:version'. Bescape first looks for a local Docker image and, if none is available, pulls the bescape image from DockerHub. This also means that one can build a customised Docker image locally from the BESCAPE source and use it in the Bescape object.

In [ ]:
import os
from bescape import Bescape

# docker
# may take some time if the docker image is being built for the first time
deconv = Bescape(service='docker', docker_image='bedapub/bescape:0.5')

Troubleshooting Docker permission error

If you run into a permission error when running the Docker image, follow the steps in https://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo to run Docker without sudo.

Namely, add the docker group if it doesn't already exist:

sudo groupadd docker

Add the connected user "$USER" to the docker group. Change the user name to match your preferred user if you do not want to use your current user:

sudo gpasswd -a $USER docker

Either run newgrp docker or log out and back in to activate the changes to groups.

2. Singularity

When using Singularity, the user specifies the absolute path to the Singularity container file.

If the path is not given, Bescape will attempt to pull the latest Docker image from DockerHub and build a new copy of a Singularity container file. In this case, the docker_image parameter specifies which image is pulled from DockerHub and converted to a Singularity container.

In [ ]:
import os
from bescape import Bescape

# singularity
deconv = Bescape(service='singularity', docker_image='bedapub/bescape:0.5', path_singularity=None)
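Since path_singularity is expected to be an absolute path, it can help to normalise it before passing it on. The sketch below is one way to do that with the standard library; `resolve_container_path` and the file name `containers/bescape.sif` are hypothetical and only serve as an illustration.

```python
import os


def resolve_container_path(path):
    """Return an absolute path to an existing container file, else None."""
    if path is None:
        return None
    absolute = os.path.abspath(os.path.expanduser(path))
    # Only hand over paths that point at a real file.
    return absolute if os.path.isfile(absolute) else None


# Hypothetical location of a previously built container file.
path_singularity = resolve_container_path("containers/bescape.sif")
print(path_singularity)
```

When the helper returns None, passing path_singularity=None falls back to the behaviour described above, where the image named in docker_image is pulled and converted.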

Performing Deconvolution

Once the Bescape object has been initialised, the methods are the same for both Docker and Singularity. The module distinguishes between two types of basis vectors as input: a Gene Expression Profile (GEP) and a single-cell annotation AnnData object.
(Please note that when a single file is expected, for either the scRNA-seq annotation or the bulk RNA-seq input, Bescape simply takes the first file in alphabetical order from the respective folder.)
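To see which file would be picked from a folder holding several candidates, the selection rule can be imitated as follows. This is a sketch that assumes a plain, case-sensitive alphabetical sort over regular files; how BESCAPE treats hidden files or mixed case is not stated here, and `first_file` is a hypothetical helper.

```python
import os
import tempfile


def first_file(folder):
    """Return the alphabetically first regular file in folder, or None."""
    files = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    return files[0] if files else None


# Demonstrate on a throwaway folder with two made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample_b.csv", "sample_a.csv"):
        open(os.path.join(tmp, name), "w").close()
    picked = first_file(tmp)

print(picked)
```

Keeping exactly one file per folder avoids any ambiguity about which one is used.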

Input file structure

The correct example file input structure is shown here: https://github.com/bedapub/bescape/tree/master/docs/datasets/bescape

The user needs to provide:

  1. Absolute path to the input FOLDER containing the input.csv file and the bulk.csv file (rows= bulk gene expression, columns=samples)
  2. Absolute path to the gep FOLDER containing the GEP file to be used as a basis vector for deconvolution
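The folder layout can be checked before a run with a few lines of Python. The sketch below assumes the `gep`, `input` and `output` sub-folder names used in the example cells of this tutorial; `bescape_dirs` is a hypothetical helper and not part of the BESCAPE API.

```python
from pathlib import Path


def bescape_dirs(root):
    """Build absolute folder paths and report which input folders are missing."""
    root = Path(root).resolve()
    dirs = {
        "dir_annot": root / "gep",
        "dir_input": root / "input",
        "dir_output": root / "output",
    }
    # The output folder is not treated as a required input here.
    missing = [
        str(path) for key, path in dirs.items()
        if key != "dir_output" and not path.is_dir()
    ]
    return {key: str(path) for key, path in dirs.items()}, missing


paths, missing = bescape_dirs("datasets/bescape")
print(paths)
print("missing folders:", missing)
```

The keys of `paths` mirror the argument names used in the deconvolution calls of this tutorial, so the dictionary can be unpacked into them once `missing` is empty.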

1. Gene Expression Profile (GEP)

  • generated from single-cell annotations using BESCA.export.generate_gep function
  • currently supported packages:
    1. bescape - in-house method based on nu-SVR (CIBERSORT)
    2. EPIC
  • implemented in the Bescape.deconvolute_gep( ) method

1.1. method = Bescape

In [ ]:
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()
annot = wd + '/datasets/bescape/gep'
inpt = wd + '/datasets/bescape/input'
output = wd + '/datasets/bescape/output'

print(output)
# deconvolute using the bescape method - GEP based basis vector
deconv.deconvolute_gep(dir_annot= annot, 
                      dir_input= inpt,
                      dir_output= output, 
                      method='bescape')

1.2. method = EPIC

As bulk input, EPIC takes an ExpressionSet with the @assayData slot filled with the gene expression counts from each bulk sample. The counts should be given in TPM, RPKM or FPKM when using the prebuilt reference profiles.

If we set dir_annot='epic', EPIC will provide a prebuilt reference profile that can predict: B cells, CAFs, CD4+ T cells, CD8+ T cells, NK cells, and Macrophages.
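If the bulk data is only available as raw counts, it needs to be normalised first. The sketch below shows the standard TPM calculation in pure Python; the gene names, counts and lengths are made up, and `counts_to_tpm` is a hypothetical helper rather than something BESCAPE or EPIC provides.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM given gene lengths in base pairs."""
    # Reads per kilobase: normalise each count by gene length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rates.values())
    # Scale so that the values of one sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Made-up counts and gene lengths for a single bulk sample.
counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
lengths_bp = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}

tpm = counts_to_tpm(counts, lengths_bp)
print(tpm)
```

In practice this would be applied per sample, i.e. per column of the bulk expression matrix.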

In [ ]:
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()
annot = wd + '/datasets/epic/gep'
inpt = wd + '/datasets/epic/input'
output = wd + '/datasets/epic/output'

# deconvolute using EPIC - prebuilt reference profile
deconv.deconvolute_gep(dir_annot= 'epic', 
                      dir_input= inpt,
                      dir_output= output, 
                      method='epic')

2. Single-cell annotation AnnData object

  • should contain single-cell annotations of multiple samples from which the deconvolution method generates its own GEP
  • currently supported packages:
    1. MuSiC
    2. SCDC
  • The packages above are written in R. Thus, we need to convert the AnnData objects to R ExpressionSet objects. This has been semi-automated in the following notebook: Converting AnnData to Eset
  • implemented in the Bescape.deconvolute_sc( ) method

2.1. MuSiC

In [ ]:
# Important to specify ABSOLUTE directory paths
wd = os.getcwd()
annot = wd + '/datasets/music/gep'
inpt = wd + '/datasets/music/input'
output = wd + '/datasets/music/output'

# deconvolute using MuSiC - sc based basis vector
deconv.deconvolute_sc(dir_annot= annot, 
                      dir_input= inpt,
                      dir_output= output, 
                      method='music')

2.2. SCDC

Using SCDC requires additional parameters:

  • celltype_var - variable name containing the cell type annotation in @phenoData of the eset
  • celltype_sel - cell types of interest to estimate
  • sample_var - variable name in @phenoData identifying the sample name
In [ ]:
wd = os.getcwd()
dir_annot = wd + '/datasets/scdc/gep/'
dir_input = wd + '/datasets/scdc/input'
dir_output = wd + '/datasets/scdc/output'

deconv.deconvolute_sc(dir_annot=dir_annot, 
                      dir_input=dir_input,
                      dir_output=dir_output, 
                      method='scdc', 
                      celltype_var='cluster', 
                      celltype_sel=["alpha","beta","delta","gamma","acinar","ductal"], 
                      sample_var='sample')
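Before starting a run, it can be worth confirming that every entry in celltype_sel actually occurs in the annotation column named by celltype_var. The following sketch does this on a plain list of labels; the labels are invented, and `missing_cell_types` is a hypothetical helper that assumes the annotation column has already been exported to Python.

```python
def missing_cell_types(annotated, selected):
    """Return the selected cell types that never occur in the annotation."""
    available = set(annotated)
    return [cell_type for cell_type in selected if cell_type not in available]


# Invented per-cell labels standing in for the 'cluster' column.
cluster_labels = ["alpha", "beta", "beta", "delta", "acinar", "ductal"]
celltype_sel = ["alpha", "beta", "delta", "gamma", "acinar", "ductal"]

absent = missing_cell_types(cluster_labels, celltype_sel)
print("not found in annotation:", absent)
```

An empty list means every selected cell type is present in the annotation.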