Question about SNPs and Library #72
Comments
Hi @John-Drake, Thanks @mhalushka. Q1) I agree with Marc's suggestion to test the 3 nt additions at the 5' end. Q2) Yes, you can do that; here is an example, similar to the process you described. Regarding exact versus isomiR calls: if you have reads with and without SNPs and the corresponding miR sequence is also indexed, then those reads will be captured as exact rather than as isomiR. I hope this is helpful. If you are concerned about the naming convention and want to retain the .SNP suffix in the miRNA names, then you have to truncate the file "human_merges_miRBase.csv". Usage: Q3) I will get back to you on the exact version, since the build is archived on another machine that I used back then. Thank you, |
If it is more convenient for you, you could simply upload the assembly to Dropbox (if you renamed the file, I understand it may be difficult to track down the exact version used). Thank you! |
Hi @John-Drake, I believe it is GCA_000001405.15 from GRCh38. That is the assembly annotated in the miRBase gff3. Thank you, |
I am still getting mismatched alignments. But first, let me explain what I am doing.
Naturally, I am doing this in a script that loops through all entries in the miRBase_annotations.fasta file. For most miRNAs (90%+), everything matches as one would expect (positive and negative strand). But often, I am finding inconsistencies at the 5' and 3' ends. For instance:
ERROR for hsa-let-7f-5p on the - strand
ERROR for hsa-miR-16-5p on the + strand
ERROR for hsa-miR-9-3p on the - strand
ERROR for hsa-miR-329-5p on the + strand
Note: hsa-miR-3714-5p is listed on chromosome KV766192.1, which according to NCBI is a later patch of GRCh38. I did try GCA_000001405.26_GRCh38.p11_genomic.fna, with the same result. Thus, several things are coming to mind.
I appreciate any feedback |
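The kind of per-entry check described in this comment can be sketched as follows. This is a minimal illustration, not the script used in the thread: the function names are hypothetical, the genome is assumed to be a plain dict of chromosome name to DNA string, and coordinates are assumed to be 1-based and inclusive, as in GFF3.

```python
# Hypothetical helpers; coordinates assumed 1-based inclusive (GFF3 style).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def genomic_sequence(genome, chrom, start, end, strand):
    """Return the interval read 5'->3' on the given strand."""
    seq = genome[chrom][start - 1:end].upper()
    return reverse_complement(seq) if strand == "-" else seq


def matches_assembly(genome, chrom, start, end, strand, mirna_seq):
    """Compare an annotated miRNA (RNA alphabet) with the assembly."""
    expected = mirna_seq.upper().replace("U", "T")
    return genomic_sequence(genome, chrom, start, end, strand) == expected


# Toy data only; not a real chromosome or miRNA.
genome = {"chrToy": "AAACCCGGGTTTACGT"}
print(matches_assembly(genome, "chrToy", 10, 16, "+", "UUUACGU"))
print(matches_assembly(genome, "chrToy", 10, 16, "-", "ACGUAAA"))
```

On the minus strand the 5' end of the miRNA sits at the higher genomic coordinate, so the slice is reverse-complemented before comparison. Skipping that step, or mixing 0-based and 1-based coordinates, is a common source of apparent mismatches at the ends.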
Hi @John-Drake, Thanks for bringing this to our attention. Please see my responses below:
I hope this helps. Once again, thank you for bringing this to our attention. Regards, |
Thank you for the information. I went ahead and wrote a script that checks each miRNA against the assembly and miRBase. When significant errors were discovered (e.g. wrong chromosome, position, or strand), I manually corrected them. The vast majority of the errors I found were small. For instance, 232 miRNAs had start positions off by 1, and 59 miRNAs had their 5' and 3' ends extended with nucleotides that differ from the assembly. However, I did find around two dozen entries with an incorrect position, chromosome, or strand (these should be obvious in the code where I corrected them). A few caveats.
Hopefully you will find this helpful. And if you do believe my corrections are correct, please tell me, as I am always concerned that I may have made a mistake. Thank you |
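One way to separate the small errors from the significant ones described above is to test whether the annotated interval matches the assembly after a shift of a nucleotide or two. The sketch below is only an illustration under stated assumptions: `find_shift` and `revcomp` are hypothetical names, the genome is a dict of chromosome name to DNA string, coordinates are 1-based inclusive, and only same-length shifts are tested. It does not cover the case of end extensions whose nucleotides differ from the assembly.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_shift(genome, chrom, start, end, strand, mirna_seq, max_shift=2):
    """Return the smallest coordinate shift (in nt) at which the annotated
    interval matches the assembly, or None if no shift within max_shift works.
    A return value of 0 means the annotation is already exact."""
    target = mirna_seq.upper().replace("U", "T")
    chrom_seq = genome[chrom].upper()
    # Try shift 0 first, then -1, +1, -2, +2, ...
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        s, e = start + shift, end + shift
        if s < 1 or e > len(chrom_seq):
            continue  # shifted window would fall off the chromosome
        window = chrom_seq[s - 1:e]
        if strand == "-":
            window = revcomp(window)
        if window == target:
            return shift
    return None


# Toy data only: the sequence really sits at 5-10 but is annotated at 4-9.
genome = {"chrToy": "AAACCCGGGTTTACGT"}
print(find_shift(genome, "chrToy", 4, 9, "+", "CCGGGU"))
```

Entries for which this returns None within a small `max_shift` are the candidates for a wrong chromosome, strand, or position, and still need manual review.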
Hi @John-Drake, Thank you, this is great. I will take a look at the script and let you know. Thank you, |
Hi @John-Drake, I went through your notebook; you are right that the genomic coordinates are not precise. You also mentioned a few miRNAs that are in the miRge3 database but not in miRBase; there is an explanation for that. I am in the middle of re-evaluating and revising the database and will get back to you with my responses.
As mentioned, I will need some more time to get back to you on this. Sorry for the delay in my response, and thanks again for your comments and questions. In the meantime, attached is the miRBase database with 3 nt added at the 5' end and 6 nt added at the 3' end. It is derived purely from miRBase and does not include SNPs or other miRNAs. Thank you, |
Hello, I have several questions.
Q1: In your custom human library, miRNAs with SNPs are represented and merged together (after alignment and counting). How did you decide which SNPs to represent? Looking at the NCBI browser, it is easy for me to identify additional SNPs. And when I look at the SNP frequency, it isn't quite clear whether you are only merging SNPs that are common (e.g. MAF > 0.01 in some population) or using some other criterion. Could you elaborate on how you decided which SNPs to represent in the mirge3 human library?
We are interested in detecting miRNAs with 3 nucleotides (NT) on the 5'-end. Probably the best way to do this would be to extend your custom library by 1 NT on the 5'-end (as you already extended the 5'-end by 2 NT). Thus,
Q2) Can I simply use bowtie-inspect to convert the custom library into a human-readable .fasta file, add the appropriate 5' NT to the custom library, and then convert it back with the default bowtie-build command?
Naturally, I cannot depend on the "exact" and "isomiR" identifications provided in the output. I will need to write my own script to loop through the mapped.csv and capture any 3-NT isomiRs on the 5'-end (if they exist).
Q3) What assembly was used for mirge3? It's hg38, but I need to know the exact version (e.g. GCA_000001405.28 or the GenBank version of GRCh38.p13). When I check how well the custom fasta miRBase entries match different assemblies, I am not getting an exact match for many of the miRNAs.
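The "add the appropriate 5' NT" step in Q2 could look roughly like the sketch below. It assumes the genomic coordinates of each library entry are known (1-based inclusive, for example taken from a GFF3 file), that the genome is a dict of chromosome name to DNA string, and that the library sequence is stored as DNA. The function names are hypothetical and are not part of miRge3 or bowtie.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def five_prime_flank(genome, chrom, start, end, strand, n=1):
    """Return the n nucleotides immediately upstream of the 5' end,
    written 5'->3' in the orientation of the miRNA."""
    chrom_seq = genome[chrom].upper()
    if strand == "+":
        if start - 1 - n < 0:
            raise ValueError("not enough sequence upstream of the 5' end")
        return chrom_seq[start - 1 - n:start - 1]
    # On the minus strand, upstream means higher genomic coordinates.
    if end + n > len(chrom_seq):
        raise ValueError("not enough sequence upstream of the 5' end")
    return revcomp(chrom_seq[end:end + n])


def extend_five_prime(genome, chrom, start, end, strand, library_seq, n=1):
    """Prepend n upstream genomic nucleotides to an existing library entry."""
    return five_prime_flank(genome, chrom, start, end, strand, n) + library_seq.upper()


# Toy data only; not a real chromosome or library entry.
genome = {"chrToy": "AAACCCGGGTTTACGT"}
print(extend_five_prime(genome, "chrToy", 5, 10, "+", "CCGGGT"))
print(extend_five_prime(genome, "chrToy", 10, 13, "-", "TAAA"))
```

The extended sequences would then be written back to the FASTA file produced by bowtie-inspect before rebuilding the index with bowtie-build, as the question proposes. Whether the existing library coordinates already include the 2 NT extension has to be accounted for when choosing `start` and `end`.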
Thank you!
-John