AMR Prediction

martinghunt edited this page Apr 23, 2024 · 12 revisions

This page describes how to use mykrobe predict to make AMR predictions for the species supported by mykrobe. At the time of writing, these are Mycobacterium tuberculosis, Staphylococcus aureus, Shigella sonnei, Salmonella enterica serotype Paratyphi B, and Salmonella Typhi. You can see the available panels by running:

mykrobe panels describe

If no panels are installed, then run:

mykrobe panels update_metadata
mykrobe panels update_species all

When running on Shigella sonnei samples, the output should be post-processed as described at https://github.com/katholt/sonneityping. Details of the S. sonnei genotyping scheme are available in the paper Hawkey et al., 2021, Nature Communications.

Data files

Please note that the first time you run mykrobe predict on each species, new files are created in mykrobe/data/skeletons/. If you are running more than one sample in parallel, first run a single sample on its own to generate those files. Otherwise, some of the parallel runs will crash.
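One way to follow this advice in a batch script is to process the first sample on its own and only then start the remaining samples concurrently. The following is a minimal sketch, not part of mykrobe: the function name predict_batch and the <sample>.fq.gz / <sample>.csv file naming are assumptions made for illustration.

```shell
# Sketch only: predict_batch and the <sample>.fq.gz naming are hypothetical.
# Usage: predict_batch SPECIES SAMPLE [SAMPLE ...]
predict_batch() {
  if [ "$#" -lt 2 ]; then
    echo "usage: predict_batch SPECIES SAMPLE [SAMPLE ...]" >&2
    return 2
  fi
  species=$1
  first=$2
  shift 2

  # Run one sample alone so the skeleton files for this species get created.
  mykrobe predict -S "$species" -s "$first" -i "$first.fq.gz" -o "$first.csv" || return 1

  # The remaining samples are started in the background, all at once.
  for sample in "$@"; do
    mykrobe predict -S "$species" -s "$sample" -i "$sample.fq.gz" -o "$sample.csv" &
  done

  # Block until every background run has finished.
  wait
}

# Example call (commented out so that loading this file runs nothing):
# predict_batch tb sample1 sample2 sample3
```

This sketch starts all remaining samples at once, so for large batches you would probably want to cap the number of concurrent jobs.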

Examples of basic usage

Run on a Mycobacterium tuberculosis sample with one FASTQ file as input, writing the results to a comma-delimited file:

mykrobe predict --species tb --sample sample_name --seq reads.fq --output out.csv

Replace sample_name with the name of your sample; whatever is used here will appear in the output. To run on Staphylococcus aureus or Shigella sonnei samples, replace tb with staph or sonnei respectively.

As above, but the input is two gzipped FASTQ files:

mykrobe predict -S tb -s sample_name -i reads_1.fq.gz reads_2.fq.gz -o out.csv

As above, but the input is a BAM file:

mykrobe predict -S tb -s sample_name -i reads.bam -o out.csv

The default output format is CSV, which contains the essential information on the lineage of the sample (M. tuberculosis and S. sonnei only) and the AMR calls. For more detailed output, make a JSON file instead:

mykrobe predict -S tb -s sample_name -i reads.fq --format json -o out.json

Make both a JSON and a CSV file, called out.json and out.csv:

mykrobe predict -S tb -s sample_name -i reads.fq --format json_and_csv -o out
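In the examples above, the value given to -o is a complete file name for the csv and json formats, but a prefix for json_and_csv. A wrapper script can hide that difference. The sketch below assumes exactly that naming behaviour; predict_with_format is a hypothetical helper, not a mykrobe command.

```shell
# Hypothetical helper: pick the -o value that matches the chosen format.
# Usage: predict_with_format FORMAT BASENAME [other mykrobe predict options ...]
predict_with_format() {
  if [ "$#" -lt 2 ]; then
    echo "usage: predict_with_format FORMAT BASENAME [options ...]" >&2
    return 2
  fi
  fmt=$1
  base=$2
  shift 2

  case "$fmt" in
    csv|json)     out="$base.$fmt" ;;  # -o is the full file name
    json_and_csv) out="$base" ;;       # -o is a prefix for the .json and .csv files
    *)            echo "unknown format: $fmt" >&2; return 1 ;;
  esac

  mykrobe predict "$@" --format "$fmt" -o "$out"
}

# Example call (commented out so that loading this file runs nothing):
# predict_with_format json_and_csv out -S tb -s sample_name -i reads.fq
```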

Important options

Nanopore data

By default, mykrobe assumes that the input reads are Illumina. If you have nanopore data instead, use the --ont option. Example:

mykrobe predict -S tb -s sample_name --ont -i nanopore_reads.fq -o out.csv
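According to the help text under Full usage, --ont sets -e 0.08 --ploidy haploid. When a pipeline handles both Illumina and nanopore samples, the flag can be chosen per sample. The following is a minimal sketch of that idea; the wrapper name predict_reads, the technology labels illumina and ont, and the <sample>.csv output naming are assumptions for illustration.

```shell
# Hypothetical wrapper: add --ont only when the reads are nanopore.
# Usage: predict_reads TECH SPECIES SAMPLE READS [READS ...]
predict_reads() {
  if [ "$#" -lt 4 ]; then
    echo "usage: predict_reads TECH SPECIES SAMPLE READS [READS ...]" >&2
    return 2
  fi
  tech=$1
  species=$2
  sample=$3
  shift 3   # the remaining arguments are the read files

  case "$tech" in
    illumina) mykrobe predict -S "$species" -s "$sample" -i "$@" -o "$sample.csv" ;;
    ont)      mykrobe predict -S "$species" -s "$sample" --ont -i "$@" -o "$sample.csv" ;;
    *)        echo "unknown technology: $tech" >&2; return 1 ;;
  esac
}

# Example calls (commented out so that loading this file runs nothing):
# predict_reads illumina tb sample_a reads_1.fq.gz reads_2.fq.gz
# predict_reads ont tb sample_b nanopore_reads.fq
```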

Minor resistance calls

By default, if a significant minority of the reads at a variant site have the variant (enough to trigger a heterozygous call), this triggers a resistance call, reported as a lowercase "r" in the output. An uppercase "R" is used for a normal resistance call, where the majority of reads have the variant (homozygous call). Use the option --ignore_minor_calls to ignore these minor calls when predicting resistance. Example:

mykrobe predict -S tb -s sample_name --ignore_minor_calls -i reads.fq -o out.csv

Getting all call information

The default behaviour is to report detailed call information only for non-reference calls (which therefore cause a resistance call). For debugging or other in-depth analysis, it can be useful to see all calls by using the --report_all_calls option. If the output is in JSON format, this adds information for every call in the panel to the output. Example:

mykrobe predict -S tb -s sample_name --report_all_calls -i reads.fq --format json -o out.json

Other options

The other options are more advanced, and we do not recommend using them unless you know what you are doing.

Output

Please see the AMR prediction output page for a description of the output.

Full usage

$ mykrobe predict --help
usage: mykrobe predict [-h] -s SAMPLE [-k kmer] [--tmp TMP] [--keep_tmp] [--skeleton_dir SKELETON_DIR] [-t THREADS] [-m MEMORY] [--expected_depth EXPECTED_DEPTH] [-1 seq [seq ...]]
                       [-c ctx] [-f] [--ont] [--guess_sequence_method] [--ignore_minor_calls] [--ignore_filtered IGNORE_FILTERED] [--model model] [--ploidy ploidy]
                       [--filters FILTERS [FILTERS ...]] [-A] [-e EXPECTED_ERROR_RATE] [--min_variant_conf MIN_VARIANT_CONF] [--min_gene_conf MIN_GENE_CONF]
                       [-D MIN_PROPORTION_EXPECTED_DEPTH] [--min_gene_percent_covg_threshold MIN_GENE_PERCENT_COVG_THRESHOLD] [-o OUTPUT] [--panels_dir DIRNAME] [-q] [-d] -S species
                       [--panel panel] [-P FILENAME] [-R FILENAME] [-L FILENAME] [--min_depth min_depth] [--conf_percent_cutoff conf_percent_cutoff] [-O {json,csv,json_and_csv}]

optional arguments:
  -h, --help            show this help message and exit
  -s SAMPLE, --sample SAMPLE
                        Sample identifier [REQUIRED]
  -k kmer, --kmer kmer  K-mer length (default: 21)
  --tmp TMP             Directory to write temporary files to
  --keep_tmp            Don't remove temporary files
  --skeleton_dir SKELETON_DIR
                        Directory for skeleton binaries
  -t THREADS, --threads THREADS
                        Number of threads to use
  -m MEMORY, --memory MEMORY
                        Memory to allocate for graph constuction (default: 1GB)
  --expected_depth EXPECTED_DEPTH
                        Expected depth
  -1 seq [seq ...], -i seq [seq ...], --seq seq [seq ...]
                        Sequence files (fasta,fastq,bam)
  -c ctx, --ctx ctx     Cortex graph binary
  -f, --force           Force override any skeleton files
  --ont                 Set defaults for ONT data. Sets `-e 0.08 --ploidy haploid`
  --guess_sequence_method
                        Guess if ONT or Illumia based on error rate. If error rate is > 10%, ploidy is set to haploid and a confidence threshold is used
  --ignore_minor_calls  Ignore minor calls when running resistance prediction
  --ignore_filtered IGNORE_FILTERED
                        Don't include filtered genotypes
  --model model         Genotype model used. Options kmer_count, median_depth (default: kmer_count)
  --ploidy ploidy       Use a diploid (includes 0/1 calls) or haploid genotyping model (default: diploid)
  --filters FILTERS [FILTERS ...]
                        Don't include specific filtered genotypes (default: ['MISSING_WT', 'LOW_PERCENT_COVERAGE', 'LOW_GT_CONF', 'LOW_TOTAL_DEPTH'])
  -A, --report_all_calls
                        Report all calls
  -e EXPECTED_ERROR_RATE, --expected_error_rate EXPECTED_ERROR_RATE
                        Expected sequencing error rate (default: 0.050)
  --min_variant_conf MIN_VARIANT_CONF
                        Minimum genotype confidence for variant genotyping (default: 150)
  --min_gene_conf MIN_GENE_CONF
                        Minimum genotype confidence for gene genotyping (default: 1)
  -D MIN_PROPORTION_EXPECTED_DEPTH, --min_proportion_expected_depth MIN_PROPORTION_EXPECTED_DEPTH
                        Minimum depth required on the sum of both alleles (default: 0.30)
  --min_gene_percent_covg_threshold MIN_GENE_PERCENT_COVG_THRESHOLD
                        All genes alleles found above this percent coverage will be reported (default: 100 (only best alleles reported))
  -o OUTPUT, --output OUTPUT
                        File path to save output file as. Default is to stdout
  --panels_dir DIRNAME  Name of directory that contains panel data (default: /Users/michaelhall/Projects/mykrobe/src/mykrobe/data)
  -q, --quiet           Only output warnings/errors to stderr
  -d, --debug           Output debugging information to stderr
  -S species, --species species
                        Species name, or 'custom' to use custom data, in which case --custom_probe_set_path is required. Run `mykrobe panels describe` to see list of options [REQUIRED]
  --panel panel         Name of panel to use. Ignored if species is 'custom'. Run `mykrobe panels describe` to see list of options
  -P FILENAME, --custom_probe_set_path FILENAME
                        Required if species is 'custom'. Ignored otherwise. File path to fasta file from `mykrobe make-probes`.
  -R FILENAME, --custom_variant_to_resistance_json FILENAME
                        For use with `--panel custom`. Ignored otherwise. File path to JSON with key,value pairs of variant names and induced drug resistance.
  -L FILENAME, --custom_lineage_json FILENAME
                        For use with `--panel custom`. Ignored otherwise. File path to JSON made by --lineage option of make-probes
  --min_depth min_depth
                        Minimum depth (default: 1)
  --conf_percent_cutoff conf_percent_cutoff
                        Number between 0 and 100. Determines --min_variant_conf, by simulating variants and choosing the cutoff that would keep x% of the variants (default: 100)
  -O {json,csv,json_and_csv}, --format {json,csv,json_and_csv}
                        Choose output format (default: csv)