Bioinformatics Pipeline Inoperative: Package Dependencies Updated? #13
Thanks for checking out our workflow. I'm sorry it's not working for either of you. I've only just begun to look into your issues, but there appear to be at least two different problems. One is a dependency problem caused by continuing development on snakemake. I will try to get a more current set of dependencies posted this week. It looks like @mubashirhanif found a workaround with virtualenv and then ran into a separate issue where their run failed to find any concatemers. @mubashirhanif, did you use the test data for this or your own data? |
@jmeppley I used the test data. But it seems like yet another dependency issue, relating to seqkit/minimap2. |
@ttschulze, I updated the environment definition files, but I don't seem to have write access to this repository anymore. I'm no longer actively collaborating with ONT, so I guess my privileges expired. The pull request is therefore not yet merged. In the meantime, you can try cloning my version of the repo: https://github.com/jmeppley/DTR-phage-pipeline I had to fall back to an older version of snakemake, which is a little flaky. Sometimes you have to run the workflow twice for it to get all the way through. I would like to update it to the latest snakemake, but that's going to take more time. @mubashirhanif, I can't reproduce your error. Can you try the updated conda environments and let me know if that changes anything? Update: I was able to get it working on snakemake 8. It seems a bit more stable. (This is still only on my fork.) |
Hi @jmeppley, I am working with @ttschulze on this. I tried to clone your version of the repo... This did get us further than before, but it eventually returned this error: Were you able to run it successfully? |
@dnev1551, OK, that's progress, but I'm sorry it's not working. It looks like the conda installer for medaka is not pulling in the correct dependencies for you. Can you give me a little more information?
I'm particularly interested in the versions of numpy, h5py, and python, but it would be best to just cut and paste the whole list. |
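For anyone following along, here is a minimal sketch of one way to report those versions from inside the activated environment. It is not part of the pipeline; the helper name `package_versions` is made up for illustration, and it only uses the standard library (`importlib.metadata`, Python 3.8+).

```python
import platform
from importlib import metadata


def package_versions(names):
    """Return {name: version or None} for each distribution name."""
    # Always include the interpreter version, since it constrains the others.
    versions = {"python": platform.python_version()}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in package_versions(["numpy", "h5py"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the environment's own `python` so the report reflects that environment rather than the base install.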
@jmeppley, let me know if you need any other info. Thank you!
|
That's odd. The versions of h5py and numpy in that environment should be compatible. I just tested on my system. What version of conda are you using? What's the output of this:
Try replacing the contents of
then try running snakemake again. |
@jmeppley See below responses. That change to One issue that I could see in many instances when executing the code: Are there any potential solutions? There is an image of a few instances at the end of this post.
Before attempting any fixes, I installed Anaconda3-2024.02-1-Linux-x86_64.sh However, when I asked for
The output was:
This allowed the execution of the pipeline without erroring out! However, as mentioned above, it is skipping kaiju. Thank you for your time and help; we really appreciate it!! |
The workflow can run kaiju on your reads to estimate the composition of your sample. It's purely informational, but it can be useful. However, the kaiju database is too big to distribute with the code, so it's not part of the test. To configure it, you'll have to download a Kaiju database (there should be a link in the README) and update
There is something funky with your conda. The numpy module should have been imported from within the conda environment ( The updated medaka.yaml works because it's using a suite of package versions that happen to work fine with the numpy you have installed in your base environment. Normally, this wouldn't be necessary and the original package specifications would have worked. It's possible you have two competing conda installations configured in your shell. If you are using bash, check your ~/.bashrc and ~/.bash_profile files. There should only be one block of code setting up conda. (I use miniconda, though, and I'm not as familiar with Anaconda.) It might be worth deleting everything conda-related in your bash config files and trying to install miniconda from scratch. You could also reach out to the conda team for help. |
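A quick way to check the symptom described above is to ask Python where it would import numpy from and compare that with the active environment's prefix. This is only a sketch under the assumption that the script is run with the environment's own interpreter; `is_inside` and `module_location` are illustrative names, not part of the pipeline or of conda.

```python
import importlib.util
import sys
from pathlib import Path


def is_inside(path, prefix):
    """True if `path` lies under directory `prefix` (pure path comparison)."""
    try:
        Path(path).relative_to(Path(prefix))
        return True
    except ValueError:
        return False


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    origin = module_location("numpy")
    print("interpreter:", sys.executable)
    print("environment prefix:", sys.prefix)
    print("numpy imported from:", origin)
    if origin is not None and not is_inside(origin, sys.prefix):
        print("warning: numpy comes from outside the active environment")
```

If the warning appears, the module is being picked up from somewhere else (for example a base install or a user-level site-packages directory), which would be consistent with two competing setups.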
@jmeppley Okay, I think I understand where that competing conda issue arose from. Thank you so much for your time and help! We really appreciate it. I am wondering if a possible error might arise as we used the newer base caller, Dorado, as opposed to the outdated Guppy. I will reach out if we run into trouble, but again, we REALLY appreciate all of your time and help!!! |
Hello @jmeppley, we ran an entire ONT sequencing run, consisting of just one specific novel phage, on a MinION R9.4.1 flow cell using the LSK-109 kit, which you had used in your publication (obviously this was overkill, but it was our first time doing gDNA with the MinION). We previously sequenced this novel phage using Illumina short-read sequencing and analyzed that data (which gave convincing evidence of the presence of concatemers and DTRs, and a genome size of ~97 kb). If I am understanding correctly, this pipeline is mostly for metagenomic analysis (we had previously read your publication and all the detailed supplementary info/methods as well, which is why we chose to start with this pipeline for the ONT long reads). Very informative and impressive publication! However, I thought it would still work when running a single isolated phage. I was able to run the entire pipeline (including Kaiju), but am wondering:
Thank you! Sorry for all of the questions, just want to make sure I am using the pipeline correctly, so I can interpret the results accurately! |
Hi Andrew, Thanks for your interest in the pipeline. I would highly recommend just running the output of your sequencing through the Flye assembler and working with what comes out of that. I don't think our pipeline makes a ton of sense for working with a single phage and it will likely upset certain assumptions made in the workflow. If you're looking for the genome and annotation of genes, etc, a simple de novo assembly of the sequencing data should be sufficient. Once you have an assembly, our pipeline might be of help in just suggesting what steps you might want to run manually on your assembly (like Prodigal, etc). Hope this helps, |
@jbeaulaurier Okay, thank you for clarifying. I appreciate it! |
@jmeppley Thank you so much for your assistance with the technical troubleshooting; we were able to make it all the way through. I was hoping to ask a follow-up about the DTR sequences. In our case, the outputs do support the presence of a DTR in the polished genomes. I am curious whether there is a method in the pipeline to obtain the actual DTR sequence (basically, to isolate it so we can examine the sequence). I am expecting a 566 bp DTR, and I see the outputs that bin the reads containing the DTR, but I can't find the DTR sequence by itself for mapping, analysis, etc. I would very much appreciate any thoughts you have here, in case I'm simply missing it. Example output for the DTR alignment. It seems to support a fixed DTR well, but we are curious whether anything looks out of the ordinary to you: |
No, I'm sorry. If I recall correctly, we do not extract the DTR sequence. The pipeline just runs minimap2 and inspects the PAF output (which does not include sequences, only alignment locations). |
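Since PAF records carry coordinates only, the sequence has to be sliced out of the genome separately. The following is a rough sketch of that idea, not something the pipeline does: it assumes you already have the polished sequence loaded into a dictionary and a PAF line describing the repeat alignment. The names `PafHit`, `parse_paf_line`, and `aligned_query_sequence` are hypothetical. It relies only on the standard PAF column layout, where query start is 0-based inclusive and query end is exclusive.

```python
from typing import NamedTuple


class PafHit(NamedTuple):
    query: str
    query_start: int  # 0-based, inclusive
    query_end: int    # 0-based, exclusive
    strand: str
    target: str
    target_start: int
    target_end: int


def parse_paf_line(line):
    """Parse the mandatory PAF columns needed to locate an alignment."""
    f = line.rstrip("\n").split("\t")
    # Columns: 0 qname, 1 qlen, 2 qstart, 3 qend, 4 strand,
    #          5 tname, 6 tlen, 7 tstart, 8 tend, 9+ match stats.
    return PafHit(f[0], int(f[2]), int(f[3]), f[4], f[5], int(f[7]), int(f[8]))


def aligned_query_sequence(hit, sequences):
    """Slice the aligned query region out of a {name: sequence} mapping."""
    return sequences[hit.query][hit.query_start:hit.query_end]
```

With a real alignment, the returned slice would be the candidate repeat on the query side; it is worth comparing it against the matching target-side region before trusting it as the DTR.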
Hi all,
We are trying to get this pipeline working with the included control dataset so we can use it to characterize a variety of novel bacteriophages that we think may be circularly permuted with terminal repeats.
However, after cloning the pipeline from git and following the tutorial to test the included control dataset we are running into errors indicating the pipeline may be broken.
Our first error:
snakemake: error: unrecognized arguments: -r
This seems to be caused by the conda environment requesting the newest version of snakemake, in which the "-r" flag is no longer used:
We did find that by editing the environment.yml file to use the originally intended version of snakemake (changing ">= 5.14.0" to "= 5.14.0"), we were able to get past this error.
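To illustrate the kind of edit described above, here is a small sketch that rewrites a "minimum version" dependency line into a pinned one. The exact layout of the environment.yml line is an assumption, and `pin_minimum_as_exact` is a made-up helper name; editing the file by hand works just as well.

```python
import re


def pin_minimum_as_exact(spec):
    """Turn a 'name>=X' dependency line into 'name=X'; leave others unchanged."""
    # Keeps any leading indentation and list dash, drops spaces around the operator.
    return re.sub(r"^(\s*-?\s*[\w.-]+)\s*>=\s*", r"\1=", spec)


if __name__ == "__main__":
    print(pin_minimum_as_exact("  - snakemake >= 5.14.0"))
```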
However, we are currently stuck at the following error:
The log file shows the following:
We are assuming this is an issue with newer packages breaking the pipeline, but any thoughts on how to resolve these issues would be greatly appreciated. Please let me know if I can detail anything further.