Tyler Fair edited this page Oct 14, 2019 · 3 revisions

The discover module

  • fasta (required) - the input fasta file you'd like to discover target sequences in
  • database (required) - the database of off-target sequences for the genome of interest
  • output (required) - the output file
  • positionOutput (optional, default to false) - whether to output positional information along with the off-target sequences. Enabling this can produce very large output files.
  • forceLinear (optional, default to false) - forces FlashFry to perform a linear traversal of the database instead of a precomputed bin traversal. This is only useful when you have a large number of guides (>10,000), in which case it saves the time FlashFry would otherwise spend working out that it needs a linear traversal anyway.
  • maxMismatch (optional, default to 4) - the mismatch threshold to consider for off-target discovery
  • flankingSequence (optional, default to 10) - how much sequence context to preserve upstream and downstream of the target. This sequence context is used by on-target metrics.
  • maximumOffTargets (optional, default to 2000) - the number of off-targets to store before marking a candidate guide with the "OVERFLOW" tag. Lower values speed up the search and keep memory requirements low; higher values do the opposite. I'd recommend keeping this at the default for initial searches, and only raising it if you don't get a rich enough candidate list or you're doing this for methods development. Note that with the default maxMismatch value of 4, the majority of detected off-targets will contain 4 mismatches, and a significant number of candidate guides will have ≥2000 off-targets and will consequently be tagged "OVERFLOW".
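
As a rough illustration of how the options above fit together, here is a minimal Python sketch that assembles a discover invocation as an argument list. It rests on several assumptions that are not stated on this page: that each option is passed as `--name value` after a `discover` subcommand, that positionOutput and forceLinear are bare switches, and that the tool is launched with `java -jar`. The jar name `FlashFry.jar`, the helper `build_discover_command`, and the file names in the usage note are hypothetical; check them against your own installation before running anything.

```python
from typing import List, Optional


def build_discover_command(
    fasta: str,
    database: str,
    output: str,
    max_mismatch: Optional[int] = None,
    flanking_sequence: Optional[int] = None,
    maximum_off_targets: Optional[int] = None,
    position_output: bool = False,
    force_linear: bool = False,
    jar_path: str = "FlashFry.jar",  # hypothetical jar name
) -> List[str]:
    """Build an argument list for the discover module (assumed CLI layout)."""
    # The three required options always appear.
    cmd = [
        "java", "-jar", jar_path, "discover",
        "--fasta", fasta,
        "--database", database,
        "--output", output,
    ]
    # Optional numeric settings are added only when given, so the tool's
    # own defaults (4, 10 and 2000) apply otherwise.
    numeric_options = (
        ("maxMismatch", max_mismatch),
        ("flankingSequence", flanking_sequence),
        ("maximumOffTargets", maximum_off_targets),
    )
    for name, value in numeric_options:
        if value is not None:
            cmd += [f"--{name}", str(value)]
    # Assumption: boolean options are bare switches with no value.
    if position_output:
        cmd.append("--positionOutput")
    if force_linear:
        cmd.append("--forceLinear")
    return cmd
```

For example, `build_discover_command("guides.fasta", "genome_db", "guides.output", max_mismatch=3)` returns a list that can be handed to `subprocess.run`. Lowering maxMismatch in this way is one option for reducing the number of guides that hit the maximumOffTargets limit.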