adding new functions to besca¶

We look forward to having all of your useful python functions/scripts for scRNA seq analysis included in besca. Since most of the documentation for this package is generated automatically this documentation will walk you through properly annotating your function for inclusion in besca. You will also find information on how to include example output from your function or additional text.

finding the right place for your function within besca¶

The besca package has been divided into three main submodules and additional supporting modules. The core subpackages contain useful functions for scRNA seq analysis, the import/export submodules contain functions to read and write data using our FAIR file formats, and the standard workflow subpackage only contains functions that have been specifically optimized for use in our standard processing pipeline.

Depending on what your function does you will need to include it in the correct subarea of besca. The graphic outline of the besca package above gives you a rough idea of where it might make sense to include your package. If in doubt, discuss with your colleagues.

how to include your function in your chosen location¶

make a new python file (ending in .py) which carries a descriptive name describing what the functions contained within do. Please ensure that the name of the file starts with an underscore so that it is not automatically imported into the module (so for example _my_functions.py). Alternatively, you can also append your function to an existing file if this makes more sense (see here).
Write and document your function (see sections below for more details). Please try and adhere to python best coding practices for writing a function and also ensure that your function ends with a return() statement.
Import the function into the __init__.py file of the subpackage under the name you would like it to show up under in besca. This will make your function available within besca.
Add the function to the __all__ statement in the __init__.py file so that its documentation will automatically be added.

__init__.py of plotting module¶

from besca.pl._filter_threshold_plots import (
    kp_genes,
    kp_counts,
    kp_cells,
    max_counts,
    max_genes,
    max_mito,
)
from besca.pl._split_gene_expression import gene_expr_split, gene_expr_split_stacked
from besca.pl._celltype_quantification import (
    celllabel_quant_boxplot,
    celllabel_quant_stackedbar,
)
from besca.pl._qc_plots import (
    dropouts,
    librarysize_overview,
    detected_genes,
    library_size,
    transcript_capture_efficiency,
    top_genes_counts,
)
from besca.pl._general import stacked_split_violin, box_per_ind, flex_dotplot
from besca.pl._dot_heatmap import dot_heatmap, dot_heatmap_split, dot_heatmap_split_greyscale
from besca.pl._update_palette import update_qualitative_palette
from besca.pl._nomenclature_network import nomenclature_network
from besca.pl._riverplot import riverplot_2categories

__all__ = [
    "kp_genes",
    "kp_counts",
    "kp_cells",
    "max_counts",
    "max_genes",
    "max_mito",
    "dropouts",
    "detected_genes",
    "library_size",
    "librarysize_overview",
    "transcript_capture_efficiency",
    "top_genes_counts",
    "gene_expr_split",
    "gene_expr_split_stacked",
    "box_per_ind",
    "stacked_split_violin",
    "celllabel_quant_boxplot",
    "celllabel_quant_stackedbar",
    "dot_heatmap",
    "dot_heatmap_split",
    "dot_heatmap_split_greyscale",
    "update_qualitative_palette",
    "nomenclature_network",
    "riverplot_2categories",
    "flex_dotplot",
]

Note

in case you are making a new submodule, please use the existing submodules e.g. the rc submodule of the tl package as an example for the correct structure

documenting your function¶

As with all shared resources, documenting your work is essential. Please always ensure to document any functions that you add to besca, so that others understand what the function does and how they can use it. Ideally, if applicable (e.g. by plotting functions) you would also include a simple example of your function that demonstrates how it works. Also please ensure to document the code you add to besca as much as possible so that others can understand your work and help in fixing any bugs that might crop up. This section will give you a brief overview of how to include function documentation in besca. For the actual contents of the documentation please use best common practices.

automatically generated documentation using DocStrings¶

Most of the package documentation is generated automatically using DocStrings that are included in the source code. This makes the documentation process easier since annotations that are already included in the source code can easily be used as a basis for the documentation. Also this keeps everything nicely in one place.

Here is the source code for an example function with the relevant DocStrings

def function_name(
    param1="default_value1", param2="default_value2", param3="default_value3"
):
    """one-line function description that shows up in summaries.

    more extensive multi line function description explaining exactly what the function
    does and is intended for examples for code execution of the function can also be
    provided here

    Parameters
    ----------
    param1: `type` | default = default_value1
        brief description of what param1 controls and to what it should be set
    param2: `type` | default = default_value2
        brief description of what param1 controls and to what it should be set
    param3: `type` | default = default_value3
        brief description of what param1 controls and to what it should be set

    Returns
    -------
    Type
        Information on what the function returns

    Example
    -------

    >>> #insert example code here
    >>> 1 + 1
    2
    
    # this code is only displayed not executed

    """

    # function body
    # do something here

This will result in an automatically generated documentation that looks like this:

besca.examples.example_function.function_name(param1='default_value1', param2='default_value2', param3='default_value3')[source]¶

one-line function description that shows up in summaries.

more extensive multi line function description explaining exactly what the function does and is intended for examples for code execution of the function can also be provided here

Parameters:

param1 (type | default = default_value1) – brief description of what param1 controls and to what it should be set
param2 (type | default = default_value2) – brief description of what param1 controls and to what it should be set
param3 (type | default = default_value3) – brief description of what param1 controls and to what it should be set

Returns:

Information on what the function returns

Return type:

Type

Example

>>> #insert example code here
>>> 1 + 1
2

# this code is only displayed not executed

The code that is displayed under the heading “Example” will only be displayed as code (with correct syntax highlighting), but it will not be executed. To include codeoutput as an example please see including example code output in documentation.

Note

reStructured Text is white space sensitive and highly dependent on correct formating. Please especially pay attention to the following:

always use spaces instead of tabs to indent (in most text editors this can be set as the default)

use unix end of line formating not windows

ensure that you have a blank line at the end of the DocString and a blank line after each paragraph (otherwise the displayed text will be indented)

For more information on DocStrings please refer here. We use the extension numpydoc to generate our docstrings since they are also nicely readable in their raw format.

You can find a primer on using reStructured text here

including example code output in documentation¶

It is very simple to include an example plot in the function documentation. Below the Example header in the docstring you can add the plot directive as outlined below followed by the code needed to generate the plot.

"""
...

Example
-------

Description of your example.

>>> # this is code that will be displayed but not executed
>>> # it should be a duplicate of the code used to generate the plot 
>>> #  people will not be able to see how you generated the plot
>>> ## plotting code 1
>>> ## plotting code 2

.. plot:: 

    >>> # this is code that will be displayed but not executed
    >>> # it should be a duplicate of the code used to generate the plot 
    >>> #  people will not be able to see how you generated the plot
    >>> ## plotting code 1
    >>> ## plotting code 2
    
"""

generating more in-depth code examples¶

Besca’s documentation includes a code gallery generated using the sphinx gallery extension. This extension gives you the possibility to include longer code examples in besca that can be downloaded by the user as a jupyter notebook. This is the ideal place to document a set of functions you have added to besca that were intended to perform a certain workflow. It is also a good place to show plotting functions.

All of the code that is added to the gallery will be executed each time besca’s documentation is built. This means that it is essential that the code is functional without any errors. This also makes these workflows a good sanity check for new versions of besca since any arising errors will come up during the build of the documentation.

The gallery has been subdivided into 4 sections:

plotting
preprocessing
tools
workflows

Sections 1-3 are intended to more extensively document functions contained in these besca submodules. The fourth section workflows is intended for longer tutorial style examples outlining a certain process within single-cell analysis using besca.

All of the code necessary to generate gallery examples is contained within /besca/besca/examples/gallery_examples. Each subfolder of this folder that contains a README.txt denotes a subsection of the gallery. The text contained within the README.txt will be rendered in the documentation above any examples contained within that folder.

creating a gallery example¶

create a new python document called plot_NAME.py in the correct subfolder of /besca/besca/examples/gallery_examples. It is essential that this document begins with plot as otherwise the code contained will not be executed.
Begin your document with a DocString sections outlining what the example will illustrate (the DocString begings with """ and ends with """). You should include a title which is underlined with equal signs (this will ensure that your example turns up in the gallery table of contents) and a brief description.
write your code as you would in a regular jupyter notebook. You can include additional text explaining steps or sub-headers by using the syntax outlined in the example gallery script.

Your example will now be included in the next build of the documentation. Plots and standardOut output will be shown at the top of the document.

example gallery script

"""
Title of example
================

Here goes your description of the example
"""

# start writing some code

###############################################################################
# sub heading
# -----------
#
# Here you add additional text you want to include
# there is no need to finish this section with any specific syntax
# just leave a line empty after this comment

# more code

###############################################################################
# this is an additional text comment without a heading
# this is more of that comment

# more code

best coding practices¶

error messages¶

If you wish a function to exit because a condition is not fullfilled and throw an error message please use the sys.exit() function. If you pass a 0 to the function then the system will interpret the function as having ended successfully. If you pass anything else, e.g. sys.exit('error message') it will interpret the function as having ended unsuccessfully. The text passed to the function will be returned as an error statement. Using this convention ensures that the system notifies the user of an occured error (if you simply use a print statement the user might overlook it) and stops the jupyter notebook from continueing running. In general it is good practice to include several checks in your function to ensure that the output is as it is intended to be.