'bins' Error while running miRge3.0 #100

Open
ChenSchiff opened this issue Jul 20, 2024 · 4 comments

Comments

@ChenSchiff

Hello,

I ran the following command:
miRge3.0 --adapter AACTGTAGGCACCATCAAT --mir-DB miRBase --samples july_fastq_list.txt --libraries-path mirge3_lib/ --organism-name human --tRNA-frag --qiagenumi -umi 0,12 --crThreshold 0.01 --threads 85 --AtoI --outDir output_qiagen_umi_2 -udd

I have 150 samples with UMIs (Qiagen kit).
When I ran this command on 10 samples it worked, but with 150 samples I got the following error:

...
Matrix creation finished in 206.1682 second(s)

Traceback (most recent call last):
  File "/usr/local/bin/miRge3.0", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/mirge/main.py", line 140, in main
    pdDataFrame,sampleReadCounts,trimmedReadCounts,trimmedReadCountsUnique = baking(args, fastq_fullPath, base_names, workDir)
  File "/usr/local/lib/python3.10/site-packages/mirge/libs/digest.py", line 282, in baking
    hist, bins = np.histogram(val, bins=(maxVal-minVal))
  File "<array_function internals>", line 200, in histogram
  File "/usr/local/lib/python3.10/site-packages/numpy/lib/histograms.py", line 780, in histogram
    bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
  File "/usr/local/lib/python3.10/site-packages/numpy/lib/histograms.py", line 424, in _get_bin_edges
    raise ValueError('bins must be positive, when an integer')
ValueError: bins must be positive, when an integer

Any idea of how to solve that?
Thank you

@arunhpatil
Collaborator

Hi @ChenSchiff,

It looks like one or more samples don't comply with the adapter/Qiagen UMI pattern. This can happen when the reads are not trimmed correctly: the value passed as the bins parameter of the histogram (maxVal-minVal) is then zero or negative, which NumPy rejects with this error.

Can you try the following things:

  1. Check the adapters of all samples and re-run miRge3.0. This is the more likely cause of the error.
  2. Check whether the bins value is the problem by forcing it to be positive. To do so, edit /usr/local/lib/python3.10/site-packages/mirge/libs/digest.py and wrap the histogram call in an if block as shown below:
if (maxVal-minVal) > 0:
    hist, bins = np.histogram(val, bins=(maxVal-minVal))
else:
    # Fall back to any positive number of bins and check whether the error is resolved at this step.
    # NOTE: the final histogram figure (in the HTML file) may not be generated.
    hist, bins = np.histogram(val, bins=10)
  3. Alternatively, run 10 samples at a time (in batches) and later combine the counts and RPM matrices of all samples for your analysis (i.e., combine all 150 samples from 15 miRge output folders). This is the more desirable method; I have some code to combine and create counts/RPM matrices here.

Let me know if this works. I strongly recommend the last method, since the batch runs will also reveal whether any of the sample files is corrupt.
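The batch approach can be sketched in Python as follows. This is only an illustration, not the code linked above: the helper names chunked and merge_count_tables are hypothetical, and the sketch assumes each batch's counts matrix is a CSV with a header row, feature names in the first column, and one column per sample (the actual miRge3.0 output file names and layout should be checked before use).

```python
import csv


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def merge_count_tables(csv_paths):
    """Outer-join several count matrices on their first column.

    Assumption: every CSV has a header row, feature names in the first
    column, and one column per sample after that. Features that are
    absent from a batch are filled with "0" for that batch's samples.
    """
    key_name = None  # header of the feature column, taken from the first file
    samples = []     # sample column names, in the order encountered
    counts = {}      # feature name -> {sample name: value}
    for path in csv_paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            if key_name is None:
                key_name = header[0]
            batch_samples = header[1:]
            samples.extend(batch_samples)
            for row in reader:
                if not row:
                    continue  # skip blank lines
                counts.setdefault(row[0], {}).update(zip(batch_samples, row[1:]))
    merged = [[key_name] + samples]
    for feature in sorted(counts):
        merged.append([feature] + [counts[feature].get(s, "0") for s in samples])
    return merged
```

For 150 FASTQ paths, chunked(paths, 10) yields 15 lists of 10 that can each be passed to a separate miRge3.0 run; merge_count_tables then takes the 15 per-batch counts files and returns rows ready for csv.writer.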

Thank you,
Arun.

@arunhpatil
Collaborator

Hi @ChenSchiff,

I remembered something we changed in miRge; the new release is not public yet. Before you try the previous suggestions, can you replace digest.py with the version from the source? Download the zip file and, after unpacking it, run cp /miRge3.0-master/mirge/libs/digest.py /usr/local/lib/python3.10/site-packages/mirge/libs/. This will replace the digest.py file with one that includes the updates to the histogram plots. If you want to back up the existing digest.py file, first run cp /usr/local/lib/python3.10/site-packages/mirge/libs/digest.py /usr/local/lib/python3.10/site-packages/mirge/libs/digest_backup.py.

Then you can run all 150 samples at once; let me know if that works. If not, try the previous suggestion (running in batches).

Thank you,
Arun.

@ChenSchiff
Author

Hi @arunhpatil,

Thank you for the options you suggested!

I'm using mirge3 with Singularity, which prevents me from directly accessing the code.
However, I found the problem while running the samples in batches: one of the samples was empty, causing the pipeline to fail.
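An empty sample like this can be caught before launching the pipeline. The following is a minimal sketch, not part of miRge3.0: the helper names has_reads and find_empty_samples are hypothetical, and it assumes the file given to --samples lists one FASTQ path per line, either plain text or gzip-compressed, with paths resolvable from the current directory.

```python
import gzip
from pathlib import Path


def has_reads(fastq_path):
    """Return True if the FASTQ file contains at least one non-blank line."""
    path = Path(fastq_path)
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip():
                return True
    return False


def find_empty_samples(sample_list_path):
    """Return the listed FASTQ paths that are missing or contain no reads."""
    problems = []
    for line in Path(sample_list_path).read_text().splitlines():
        fastq = line.strip()
        if not fastq:
            continue  # ignore blank lines in the sample list
        if not Path(fastq).is_file() or not has_reads(fastq):
            problems.append(fastq)
    return problems
```

Calling find_empty_samples("july_fastq_list.txt") returns the paths to inspect or drop from the list before the full run.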

I have another question regarding the output of the pipeline. After the pipeline completes, one of the output files is the tRF matrix counts. Is there a way to prevent the automatic conversion of tRNA names from their full gene names to the "comb" names? This conversion is based on the human tRF merges file included in the annotation libraries.

Thank you for your help,
Chen

@arunhpatil
Collaborator

Hi @ChenSchiff,

Great.

Regarding the tRNA names, the quick answer is to rename the merges file and create an empty merges file in its place in the library folder.

# This will keep backup of your merges file:
mv miRge_lib/human/annotation.Libs/human_tRF_merges.csv miRge_lib/human/annotation.Libs/human_tRF_merges_backup.csv

# This will create an empty file for miRge3.0 
touch miRge_lib/human/annotation.Libs/human_tRF_merges.csv

This will trick miRge into reporting the full names. Give it a go and let me know if this is helpful.

Thank you,
Arun.
