Updated README.md with a FAQ section
nsapoval authored Sep 13, 2024
1 parent f9e4f13 commit 4f26206
Showing 1 changed file with 29 additions and 0 deletions.
29 changes: 29 additions & 0 deletions README.md
@@ -33,6 +33,35 @@ lemur -i examples/example-data/example.fastq \

The output in the `example-output` folder will consist of a raw `relative_abundance.tsv` file with taxonomic IDs, lineage information, and inferred relative abundance (`F` column). There will also be a `relative_abundance-[rank].tsv` file, where the rank is specified by the `-r/--rank` flag (e.g. in the above example it will be `species`). The `*P_rgs_df*` files capture the individual inferred probabilities of a given read coming from a particular taxon.

---

### FAQ

**Issue:** I run my analysis on a long-read metagenome, but it crashes with the following error:
```
Traceback (most recent call last):
  File "/Users/nsapoval/miniconda3/envs/lemur-test-env/bin/lemur", line 901, in <module>
    main()
  File "/Users/nsapoval/miniconda3/envs/lemur-test-env/bin/lemur", line 887, in main
    run.EM_complete()
  File "/Users/nsapoval/miniconda3/envs/lemur-test-env/bin/lemur", line 672, in EM_complete
    self.low_abundance_threshold = 1. / n_reads
                                   ~~~^~~~~~~~~
ZeroDivisionError: float division by zero
```

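**Solutions:** Most likely this happens because of the filtering step, which by default removes all alignments shorter than 75% of the corresponding marker gene length (see the `--min-aln-len-ratio` flag description in the section below). If no alignments survive the filter, the read count is zero and the division shown in the traceback fails.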

1. Produce a histogram of read lengths in your FASTQ file. If a significant portion of the reads is shorter than 400-500 bp, it is very likely that the above filter removes all alignments.
2. In the output folder, you can find a file called `P_rgs_df_raw.tsv`. It contains raw information about the alignments prior to the above filter. Check the `aln_len` column of this file. If all values are below 200-300 bp, there are no long alignments to marker genes.
3. If either of the above holds true, the analysis results might be unreliable. However, if you wish to proceed, you can add the `--min-aln-len-ratio 0.10` flag to the run, which retains all alignments whose length is at least 10% of the target marker gene length.
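
The first two checks can be scripted. The snippet below is a minimal sketch and not part of lemur: the helper names are hypothetical, it assumes an uncompressed FASTQ with four lines per record, and it assumes `P_rgs_df_raw.tsv` is tab-separated with a header row containing the `aln_len` column. The in-memory sample data (including the `read_id` column name) is made up for illustration.

```python
import csv
import io


def fastq_read_lengths(handle):
    """Yield read lengths from an uncompressed, 4-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())


def fraction_below(lengths, cutoff):
    """Return the fraction of values strictly below `cutoff` (0.0 if empty)."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < cutoff) / len(lengths)


def aln_lengths(handle, column="aln_len"):
    """Read one numeric column from a tab-separated file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [int(float(row[column])) for row in reader]


# Tiny in-memory stand-ins for the real files (illustrative data only).
sample_fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n"
sample_tsv = "read_id\taln_len\nr1\t120\nr2\t250\n"

read_lens = list(fastq_read_lengths(io.StringIO(sample_fastq)))
aln_lens = aln_lengths(io.StringIO(sample_tsv))

print("share of reads shorter than 5 bp:", fraction_below(read_lens, 5))
print("longest raw alignment:", max(aln_lens))
```

For real data, replace the `io.StringIO` objects with `open("your_reads.fastq")` and `open("example-output/P_rgs_df_raw.tsv")`, and use cutoffs in the ranges mentioned above (for example 500 for read lengths and 300 for `aln_len`).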

---

If you discover any additional issues while running the tool, please use the [GitHub Issues](https://github.com/treangenlab/lemur/issues) interface to report them. Common issues and solutions will be added to this FAQ.

---

### Parameter descriptions

Main arguments: