Using MultiQC within scripts
Even though the primary way to run MultiQC is as a command line, it can also be imported like a Python module in order to build the report interactively, such as in custom Python scripts or in a Jupyter notebook environment (See an example notebook).
MultiQC provides a set of commands to iteratively parse logs and add sections to a report. All of them are available via importing MultiQC as a module:
import multiqc
Parse logs
Find files that MultiQC recognizes in analysis_dir and parse them, without generating a report.
Data can be accessed with other methods: list_modules, list_plots, etc.
def parse_logs(*analysis_dir, **kwargs)
Parameters:
analysis_dir: Path(s) to search for files to parseverbose: Print more information to the consolefile_list: Supply a file containing a list of file paths to be searched, one per rowprepend_dirs: Prepend directory to sample namesdirs_depth: Prepend n directories to sample names. Negative number to take from start of pathfn_clean_sample_names: Do not clean the sample names (leave as full file name)require_logs: Require all explicitly requested modules to have log files. If not, MultiQC will exit with an erroruse_filename_as_sample_name: Use the log filename as the sample namestrict: Don't catch exceptions, run additional code checks to help developmentquiet: Only show log warningsno_ansi: Disable coloured log outputprofile_runtime: Add analysis of how long MultiQC takes to run to the reportno_version_check: Disable checking the latest MultiQC version on the serverignore: Ignore analysis filesignore_samples: Ignore sample namesrun_modules: Use only this module. Can specify multiple timesexclude_modules: Do not use this module. Can specify multiple timesconfig_files: Specific config file to load, after those in MultiQC dir / home dir / working dirmodule_order: Names of modules in order of precedence to show in reportextra_fn_clean_exts: Extra file extensions to clean from sample namesextra_fn_clean_trim: Extra strings to clean from sample namespreserve_module_raw_data: Preserve raw data from modules in the report - besides plots. Useful to use later interactively. Defaults toTrue. Set toFalseto save memory.
Examples
Parse logs found in the data directory.
multiqc.parse_logs('data')
Parse logs found in the data/fastp directory, the data/SAMPLE1.cutadapt.log file,
and a data_mqc.tsv MultiQC custom content file.
multiqc.parse_logs('data/fastp', 'data/SAMPLE1.cutadapt.log', "data_mqc.tsv")
Parse logs found in the data directory for only the specified modules, and use
and additional pattern to clean sample names.
multiqc.parse_logs(
'data',
run_modules=["fastp", "spades", "quast", "pangolin"],
extra_fn_clean_exts=[".unclassified"],
)
Parse logs found in the data directory and run FastQC module twice for two sets of files - raw and trimmed reads - according to the provided path pattern (see Order of modules for details).
multiqc.parse_logs(
'data',
module_order=[
dict(
fastqc=dict(
name="FastQC (trimmed)",
anchor="fastqc_trimmed",
path_filters=["*_1_trimmed_fastqc.zip"],
)
),
dict(
quast=dict(
name="FastQC (raw)",
anchor="fastqc_raw",
path_filters=["*_1_fastqc.zip"],
)
),
],
)
MultiQC v1.29 and higher generates BETA-multiqc.parquet file in multiqc_data output directory. You can pass that file to parse_logs, and it will load that previous MultiQC run into memory.
Example:
multiqc.parse_logs('multiqc_data/BETA-multiqc.parquet')
List what's loaded
Return list of the modules that have been loaded, ordered according to config:
def list_modules() ‑> list[str]
Return dict of plot names that have been loaded, indexed by module name and section:
def list_plots() ‑> dict[str, list[str | dict[str, str]]]]
Example:
multiqc.list_plots()
{'fastp': ['Filtered Reads',
'Insert Sizes',
{'Sequence Quality': ['Read 1: Before filtering',
'Read 1: After filtering',
'Read 2: Before filtering',
'Read 2: After filtering']},
{'GC Content': ['Read 1: Before filtering',
'Read 1: After filtering',
'Read 2: Before filtering',
'Read 2: After filtering']},
{'N content': ['Read 1: Before filtering',
'Read 1: After filtering',
'Read 2: Before filtering',
'Read 2: After filtering']}]}
Return list of clean sample names that have loaded data:
def list_samples() ‑> list[str]
Example:
multiqc.list_samples()
['SAMPLE1_PE', 'SAMPLE2_PE']
Return list of found log files corresponding to the loaded data:
def list_data_sources() ‑> list[str]
Example:
multiqc.list_data_sources()
['data/SAMPLE1_PE.fastp.json', 'data/SAMPLE2_PE.fastp.json']
Access loaded data
There are several methods to access the data loaded by parse_logs.
Return parsed module data, indexed (if available) by data key, then by sample. Module is either the module name, or the anchor:
def get_module_data(module: str = None, sample: str = None, key: str = None) ‑> dict
The function takes data from report.saved_raw_data, which populated by self.write_data_file() calls in individual modules.
This data is not necessarily normalized, e.g. numbers can be strings or numbers, depending on the individual module behaviour.
Example:
> multiqc.get_module_data(module="fastp", sample="SAMPLE1_PE")["summary"]
{'fastp_version': '0.23.2',
'sequencing': 'paired end (301 cycles + 301 cycles)',
'before_filtering': {'total_reads': 55442,
'total_bases': 16571632,
'q20_bases': 16267224,
'q30_bases': 15853021,
'gc_content': 0.38526},
'after_filtering': {'total_reads': 48270,
'total_bases': 14363465,
'q20_bases': 14323363,
'q30_bases': 14199841,
'gc_content': 0.383991}}
Similarly, return parsed general stats data, indexed by sample, then by data key. If sample is specified, return only data for that sample.
def get_general_stats_data(sample: str = None) ‑> dict
Adding custom content
You can also custom section to the report by subclassing from multiqc.BaseMultiqcModule. This can be used to add a custom table or other content.
Example
Create a table (see plotting for more detail) and add it to the report.
import multiqc
from multiqc.plots import table
plot = table.plot(
data=...,
headers=...,
pconfig={
"id": "my_metrics_table",
"title": "My metrics",
},
)
module = multiqc.BaseMultiqcModule(
name="my-module",
anchor="custom_data",
)
module.add_section(
plot=plot,
name="My metrics",
anchor="my_metrics_section",
description=...,
)
multiqc.report.modules.append(module)
Get plot object
Get a plot object for a specific module and section. For list of available plots, use multiqc.list_plots.
def get_plot(module: str, section: str) -> Plot
Examples
Get plot object for the "GC Content" plot in the "fastp" module.
plot = multiqc.get_plot("fastp", "GC Content")
Get plot object for the "Number of Contigs" plot in the "QUAST" module.
plot = multiqc.get_plot("QUAST", "Number of Contigs")
Show plot
Prepare plot to be shown in the notebook cell.
class Plot:
def show(self, dataset_id: int | str = 0, flat=False, **kwargs)
Parameters:
dataset_id: Dataset label, in case if plot has several tabsflat: Show plot as static images without any interactivitykwargs: Additional arguments passed to the plot
Examples
Create a bar graph and show it in the notebook cell:
from multiqc.plots import bargraph
plot = bargraph.plot(...)
display(plot.show(violin=True))
Get "fastp GC Content" plot and show it in the notebook cell. Since it has multiple
tabs, we can select which tab to show with the dataset_id option (defaults to the first tab):
plot = multiqc.get_plot("fastp", "GC Content")
display(plot.show(dataset_id="Read 2: Before filtering"))
Shows Samtools alignment stats as a violin plot. Use flat image without interactivity.
plot = multiqc.get_plot("Flagstat", "Alignment stats")
display(plot.show("Read counts", violin=True, flat=True))
Calling the notebook's built-in display function is optional when the show call is the last line in your cell.
Save plot to file
Similarly, you can save plot to a file instead of showing it in a notebook.
class Plot:
def save(self, filename, dataset_id: int | str = 0, **kwargs)
Parameters:
filename: Path to save the plotdataset_id: Dataset label, in case if plot has several tabskwargs: Additional arguments passed to the plot
Examples:
Save the "Number of Contigs" plot for the QUAST module to a file.
plot = multiqc.get_plot("QUAST", "Number of Contigs")
plot.save("quast_contigs.html")
Save the GC Content plot for the dataset labeled "Read 2: Before filtering" to a file, make it flat.
plot = multiqc.get_plot("fastp", "GC Content")
plot.save(
"fastp_gc_content.png",
dataset_id="Read 2: Before filtering",
)
Save Samtools alignment stats as a violin plot to a file.
plot = multiqc.get_plot("Flagstat", "Alignment stats")
plot.save(
"flagstat_alignment_stats.html",
dataset="Read counts",
violin=True,
)
Writing report
Render HTML from parsed module data, and write a report along with auxiliary data files to disk.
def write_report(**kwargs)
Parameters:
title: Report title. Printed as page header, used for filename if not otherwise specifiedreport_comment: Custom comment, will be printed at the top of the reporttemplate: Report template to useoutput_dir: Create report in the specified output directoryfilename: Report filename. Use 'stdout' to print to standard outmake_data_dir: Force the parsed data directory to be createddata_format: Output parsed data in a different formatzip_data_dir: Compress the data directoryforce: Overwrite existing report and data directorymake_report: Generate the report HTML. Defaults toTrue, set toFalseto only export data and plotsexport_plots: Export plots as static images in addition to the reportplots_force_flat: Use only flat plots (static images)plots_force_interactive: Use only interactive plots (in-browser Javascript)strict: Don't catch exceptions, run additional code checks to help developmentdevelopment: Development mode. Do not compress and minimise JS, export uncompressed plot datamake_pdf: Create PDF report. Requires Pandoc to be installedno_megaqc_upload: Don't upload generated report to MegaQC, even if MegaQC options are foundquiet: Only show log warningsverbose: Print more information to the consoleno_ansi: Disable coloured log outputprofile_runtime: Add analysis of how long MultiQC takes to run to the reportno_version_check: Disable checking the latest MultiQC version on the serverrun_modules: Use only these modulesexclude_modules: Do not use these modulesconfig_files: Specific config file to load, after those in MultiQC dir / home dir / working dircustom_css_files: Custom CSS files to include in the reportmodule_order: Names of modules in order of precedence to show in report
Example:
multiqc.write_report(
force=True,
output_dir="my_multiqc_report",
title="My Report",
filename="report.html",
)
Load config from file
Load config on top of the current config from a MultiQC config file.
def load_config(config_file: str | Path)
Reset session
Reset the report to start fresh. Drops all previously parsed and loaded data.
def reset()