adding new functions to besca

We look forward to including all of your useful Python functions and scripts for scRNA-seq analysis in besca. Since most of the documentation for this package is generated automatically, this guide walks you through annotating your function properly for inclusion in besca. You will also find information on how to include example output from your function or additional text.

finding the right place for your function within besca

The besca package is divided into three main groups of subpackages plus additional supporting modules. The core subpackages contain useful functions for scRNA-seq analysis, the import/export subpackages contain functions to read and write data using our FAIR file formats, and the standard workflow subpackage contains only functions that have been specifically optimized for use in our standard processing pipeline.

[Figure: graphical outline of the besca package (_images/besca_outline.jpg)]

Depending on what your function does, you will need to include it in the correct subarea of besca. The graphical outline of the besca package above gives you a rough idea of where it might make sense to include your function. If in doubt, discuss with your colleagues.

how to include your function in your chosen location

  1. Make a new Python file (ending in .py) with a descriptive name that reflects what the functions it contains do. Please ensure that the file name starts with an underscore so that it is not automatically imported into the module (for example, _my_functions.py). Alternatively, you can append your function to an existing file if this makes more sense (see here).

  2. Write and document your function (see sections below for more details). Please try to adhere to Python best coding practices and ensure that your function ends with a return statement.

  3. Import the function into the __init__.py file of the subpackage under the name it should have in besca. This makes your function available within besca.

  4. Add the function to the __all__ statement in the __init__.py file so that its documentation will automatically be added.

__init__.py of plotting module
from besca.pl._filter_threshold_plots import (
    kp_genes,
    kp_counts,
    kp_cells,
    max_counts,
    max_genes,
    max_mito,
)
from besca.pl._split_gene_expression import gene_expr_split, gene_expr_split_stacked
from besca.pl._celltype_quantification import (
    celllabel_quant_boxplot,
    celllabel_quant_stackedbar,
)
from besca.pl._qc_plots import (
    dropouts,
    librarysize_overview,
    detected_genes,
    library_size,
    transcript_capture_efficiency,
    top_genes_counts,
)
from besca.pl._general import stacked_split_violin, box_per_ind, flex_dotplot
from besca.pl._dot_heatmap import dot_heatmap, dot_heatmap_split, dot_heatmap_split_greyscale
from besca.pl._update_palette import update_qualitative_palette
from besca.pl._nomenclature_network import nomenclature_network
from besca.pl._riverplot import riverplot_2categories

__all__ = [
    "kp_genes",
    "kp_counts",
    "kp_cells",
    "max_counts",
    "max_genes",
    "max_mito",
    "dropouts",
    "detected_genes",
    "library_size",
    "librarysize_overview",
    "transcript_capture_efficiency",
    "top_genes_counts",
    "gene_expr_split",
    "gene_expr_split_stacked",
    "box_per_ind",
    "stacked_split_violin",
    "celllabel_quant_boxplot",
    "celllabel_quant_stackedbar",
    "dot_heatmap",
    "dot_heatmap_split",
    "dot_heatmap_split_greyscale",
    "update_qualitative_palette",
    "nomenclature_network",
    "riverplot_2categories",
    "flex_dotplot",
]
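
The four steps above can be tried out on a throwaway package before touching besca itself. The following is a minimal sketch, not besca code: the package name demo_pkg, the file _my_functions.py and the function count_cells are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk; "demo_pkg" and "count_cells" are
# hypothetical names used only for this illustration.
pkg_root = Path(tempfile.mkdtemp())
pkg_dir = pkg_root / "demo_pkg"
pkg_dir.mkdir()

# Steps 1 and 2: a private module (leading underscore) holding the function.
(pkg_dir / "_my_functions.py").write_text(textwrap.dedent('''
    def count_cells(barcodes):
        """Return the number of unique barcodes."""
        return len(set(barcodes))
'''))

# Steps 3 and 4: import the function in __init__.py and list it in __all__.
(pkg_dir / "__init__.py").write_text(textwrap.dedent('''
    from demo_pkg._my_functions import count_cells

    __all__ = ["count_cells"]
'''))

# Make the temporary directory importable and load the package.
sys.path.insert(0, str(pkg_root))
demo_pkg = importlib.import_module("demo_pkg")

print(demo_pkg.__all__)
print(demo_pkg.count_cells(["AAAC", "AAAG", "AAAC"]))
```

The function is reachable as demo_pkg.count_cells even though it lives in a private file, which is the same effect the import and __all__ entries have in a besca subpackage.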

Note

If you are creating a new submodule, please use an existing submodule, e.g. the rc submodule of the tl package, as an example of the correct structure.

documenting your function

As with all shared resources, documenting your work is essential. Please always document any functions that you add to besca so that others understand what the function does and how they can use it. Where applicable (e.g. for plotting functions), also include a simple example that demonstrates how your function works. Please also comment the code you add to besca as much as possible so that others can understand your work and help fix any bugs that might crop up. This section gives you a brief overview of how to include function documentation in besca. For the actual contents of the documentation, please follow common best practices.

automatically generated documentation using DocStrings

Most of the package documentation is generated automatically from docstrings that are included in the source code. This makes the documentation process easier, since annotations that are already part of the source code serve as the basis for the documentation. It also keeps everything in one place.

Here is the source code for an example function with the relevant docstrings:

def function_name(
    param1="default_value1", param2="default_value2", param3="default_value3"
):
    """one-line function description that shows up in summaries.

    more extensive multi-line function description explaining exactly what the function
    does and what it is intended for. Examples for code execution of the function can
    also be provided here.

    Parameters
    ----------
    param1: `type` | default = default_value1
        brief description of what param1 controls and to what it should be set
    param2: `type` | default = default_value2
        brief description of what param2 controls and to what it should be set
    param3: `type` | default = default_value3
        brief description of what param3 controls and to what it should be set

    Returns
    -------
    Type
        Information on what the function returns

    Example
    -------

    >>> #insert example code here
    >>> 1 + 1
    2
    
    # this code is only displayed not executed

    """

    # function body
    # do something here

This will result in an automatically generated documentation that looks like this:

besca.examples.example_function.function_name(param1='default_value1', param2='default_value2', param3='default_value3')[source]

one-line function description that shows up in summaries.

more extensive multi-line function description explaining exactly what the function does and what it is intended for. Examples for code execution of the function can also be provided here.

Parameters:
  • param1 (type | default = default_value1) – brief description of what param1 controls and to what it should be set

  • param2 (type | default = default_value2) – brief description of what param2 controls and to what it should be set

  • param3 (type | default = default_value3) – brief description of what param3 controls and to what it should be set

Returns:

Information on what the function returns

Return type:

Type

Example

>>> #insert example code here
>>> 1 + 1
2

# this code is only displayed not executed

The code under the heading “Example” is only displayed as code (with correct syntax highlighting); it is not executed when the documentation is built. To include code output as an example, please see including example code output in documentation.
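
Because the documentation build does not run the Example code, it can be worth checking it yourself before committing. The sketch below shows one possible way to do that with Python's standard doctest module; it is not part of besca's build. The function scale_counts is a hypothetical helper written only to show a filled-in docstring in the layout described above.

```python
import doctest


def scale_counts(counts, target_sum=10000):
    """Scale a list of counts so that they sum to target_sum.

    Hypothetical helper used only to illustrate the docstring layout.

    Parameters
    ----------
    counts: `list` of `float`
        raw counts of one cell
    target_sum: `float` | default = 10000
        value the scaled counts should sum to

    Returns
    -------
    list
        scaled counts in the same order as the input

    Example
    -------

    >>> scale_counts([1, 3], target_sum=8)
    [2.0, 6.0]

    """
    total = sum(counts)
    return [c * target_sum / total for c in counts]


# Collect the docstring examples of this one function and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(scale_counts, globs={"scale_counts": scale_counts}):
    runner.run(test)

# summarize() reports how many examples were attempted and how many failed.
results = runner.summarize(verbose=False)
print(results)
```

A failed count of zero means the displayed example and the actual behaviour of the function still agree.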

Note

reStructuredText is whitespace sensitive and highly dependent on correct formatting. Please pay particular attention to the following:

  • always use spaces instead of tabs to indent (in most text editors this can be set as the default)

  • use Unix line endings, not Windows line endings

  • ensure that you have a blank line at the end of the docstring and a blank line after each paragraph (otherwise the displayed text will be indented)
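
The first two points in the list above are easy to check automatically. The following is a minimal sketch of such a check, assuming the file content is available as bytes; the function name find_whitespace_problems is hypothetical and not part of besca.

```python
def find_whitespace_problems(source: bytes) -> list[str]:
    """Return human-readable descriptions of tab or CRLF problems."""
    problems = []
    for number, line in enumerate(source.split(b"\n"), start=1):
        # A carriage return left at the end of a line means CRLF endings.
        if line.endswith(b"\r"):
            problems.append(f"line {number}: Windows line ending (CRLF)")
        # Only the leading whitespace of the line is inspected for tabs.
        indent = line[: len(line) - len(line.lstrip())]
        if b"\t" in indent:
            problems.append(f"line {number}: tab used for indentation")
    return problems


# Made-up sample: line 1 ends with CRLF, line 2 is indented with a tab.
sample = b"def f():\r\n\treturn 1\n"
for problem in find_whitespace_problems(sample):
    print(problem)
```

To check a real file, the bytes can be read with pathlib.Path("path/to/file.py").read_bytes() and passed to the same function.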

For more information on docstrings, please refer here. We use the numpydoc extension to render our docstrings, since docstrings in this style are also nicely readable in their raw format.

You can find a primer on using reStructuredText here.

including example code output in documentation

It is very simple to include an example plot in the function documentation. Below the Example header in the docstring you can add the plot directive as outlined below followed by the code needed to generate the plot.

"""
...

Example
-------

Description of your example.

>>> # this is code that will be displayed but not executed
>>> # it should be a duplicate of the code used to generate the plot,
>>> # otherwise people will not be able to see how you generated the plot
>>> ## plotting code 1
>>> ## plotting code 2

.. plot:: 

    >>> # this is the code that is executed to generate the plot
    >>> # keep it identical to the code displayed above
    >>> ## plotting code 1
    >>> ## plotting code 2
    
"""

generating more in-depth code examples

Besca’s documentation includes a code gallery generated with the Sphinx-Gallery extension. This extension lets you include longer code examples in besca that users can download as a Jupyter notebook. It is the ideal place to document a set of functions you have added to besca that together perform a certain workflow. It is also a good place to show plotting functions.

All of the code that is added to the gallery will be executed each time besca’s documentation is built. This means that it is essential that the code is functional without any errors. This also makes these workflows a good sanity check for new versions of besca since any arising errors will come up during the build of the documentation.

The gallery has been subdivided into 4 sections:

  1. plotting

  2. preprocessing

  3. tools

  4. workflows

Sections 1-3 are intended to document the functions contained in the corresponding besca submodules more extensively. The fourth section, workflows, is intended for longer tutorial-style examples outlining a certain process within single-cell analysis using besca.

All of the code necessary to generate gallery examples is contained within /besca/besca/examples/gallery_examples. Each subfolder of this folder that contains a README.txt denotes a subsection of the gallery. The text contained within the README.txt will be rendered in the documentation above any examples contained within that folder.
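
A gallery example is an ordinary Python script. The sketch below follows the general Sphinx-Gallery layout: a module docstring whose first lines form a reStructuredText title, followed by code cells separated by "# %%" comment blocks. The title, data and computation are made up for illustration, and any file-naming requirements depend on how the extension is configured in besca, which is not covered here.

```python
"""
Summarising counts per cell
===========================

A hypothetical gallery example: the title above is underlined with ``=`` and
this paragraph becomes the introduction shown on the rendered gallery page.
"""

# %%
# Comment blocks like this one are rendered as prose between code cells.
# The data below are made up for illustration.
import statistics

counts_per_cell = [1200, 950, 3100, 40, 2210]

# %%
# Every statement in the file runs during the documentation build, so the
# example must finish without raising an error.
median_counts = statistics.median(counts_per_cell)
print(f"median counts per cell: {median_counts}")
```

Keeping such examples small and fast helps, since they run on every documentation build.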

best coding practices

error messages

If you want a function to exit and report an error because a condition is not fulfilled, please use the sys.exit() function. If you pass 0 to it, the system interprets the program as having ended successfully. If you pass anything else, e.g. sys.exit('error message'), it interprets the program as having ended unsuccessfully, and the text passed to the function is reported as an error message. Using this convention ensures that the user is notified of the error (a simple print statement might be overlooked) and stops the Jupyter notebook from continuing to run. In general, it is good practice to include several checks in your function to ensure that the output is as intended.
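
As a minimal sketch of this convention, the function below validates its input and calls sys.exit() with a message when the check fails. The function name filter_min_counts and its parameters are hypothetical and serve only to illustrate the pattern.

```python
import sys


def filter_min_counts(counts, min_counts=0):
    """Keep only the counts that reach min_counts (hypothetical helper)."""
    # Input check: stop with an error message instead of printing a warning.
    if min_counts < 0:
        sys.exit("min_counts must be non-negative")
    return [c for c in counts if c >= min_counts]


print(filter_min_counts([1, 5, 10], min_counts=5))

# sys.exit() raises SystemExit; catching it here only serves to show that the
# message passed to sys.exit() is carried along as the exit code.
try:
    filter_min_counts([1, 5, 10], min_counts=-1)
except SystemExit as error:
    print(f"function exited with: {error.code}")
```

In normal use the SystemExit would not be caught, so execution stops at the failed check.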