First off, thanks for a great tool!
While playing around with some comparisons between genes from the same file, I noticed that blastn does not behave as expected:
blastn -query genes.fasta -subject genes.fasta -outfmt 6
With two identical genes (cps2B & cps8B) in genes.fasta, this results in the following matches:
cps2B:cps8B (100% id and cov)
cps8B:cps2B (100% id and cov)
This comparison misses the following:
cps2B:cps2B
cps8B:cps8B
Adding a third gene (cps7B) to the mix by appending it to genes.fasta changes the output entirely:
cps2B:cps2B
cps2B:cps7B (99.76% ID and 100% cov)
cps8B:cps8B
cps7B:cps7B
cps7B:cps2B (99.76% ID and 100% cov)
This comparison misses the following:
cps2B:cps8B
cps8B:cps2B
cps8B:cps7B (should be 99.76% as well, since cps2B and cps8B are identical)
cps7B:cps8B (should be 99.76% as well, since cps2B and cps8B are identical)
Is there a way to include all top matches?
Glad you enjoy the tool! It's been a while since I've looked at my undergrad senior project. A quick disclaimer, I want to note that I have very little experience in bioinformatics other than this project, although our professor Mohamed El-Hadedy Aly, Ph.D. at California Polytechnic University, Pomona could give a very educated answer to possible questions.
That out of the way, I looked at the matching method in extend.cpp, where the implementation tries to find the single best match by scoring and extending with the Smith-Waterman algorithm. It could be feasible to change the extend_filter member function from its 'track the best at all times' approach to one that keeps a std::list of ExtendedSequenceMap objects, sorts it by score, and returns it, which would provide the top matches.
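As a rough illustration of that idea, here is a minimal sketch of collecting every scored candidate and returning them sorted by score. It is not the code in extend.cpp: ScoredMatch is a hypothetical stand-in for ExtendedSequenceMap (whose real fields may differ), and top_matches and max_hits are names made up for this example.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Hypothetical stand-in for the project's ExtendedSequenceMap; the real
// type in extend.cpp may carry different fields.
struct ScoredMatch {
    std::string query_name;
    std::string subject_name;
    int score;  // e.g. a Smith-Waterman alignment score
};

// Instead of tracking only the best candidate, take all of them and
// return them sorted by descending score. If max_hits is non-zero, only
// the first max_hits entries are kept; 0 means keep everything.
std::list<ScoredMatch> top_matches(std::list<ScoredMatch> candidates,
                                   std::size_t max_hits = 0) {
    // std::list::sort is stable, so equal-scoring hits keep input order.
    candidates.sort([](const ScoredMatch& a, const ScoredMatch& b) {
        return a.score > b.score;
    });
    if (max_hits != 0 && candidates.size() > max_hits) {
        candidates.resize(max_hits);  // drop the lowest-scoring tail
    }
    return candidates;
}
```

Because the sort is stable, two subjects that tie on score (such as a sequence and its identical copy) would both be reported, in the order they were found, rather than one displacing the other.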
Unfortunately, I haven't the time to maintain this project, but if you were so technically inclined, send me a pull request with an implementation and I might be able to provide that as an alternate approach!