Comparing very similar sequences does not provide all results. #7

KasperThystrup · 2024-10-14T10:25:44Z

First off, thanks for a great tool!

While playing around with some comparisons between genes from the same file, I noticed that blastn does not behave as I expected:
blastn -query genes.fasta -subject genes.fasta -outfmt 6

With two identical genes (cps2B and cps8B) in genes.fasta, this results in the following matches:
cps2B:cps8B (100% id and cov)
cps8B:cps2B (100% id and cov)

This comparison misses the following:
cps2B:cps2B
cps8B:cps8B

Appending a third gene (cps7B) to genes.fasta changes the output entirely:
cps2B:cps2B
cps2B:cps7B (99.76% ID and 100% cov)
cps8B:cps8B
cps7B:cps7B
cps7B:cps2B (99.76% ID and 100% cov)

This comparison misses the following:
cps2B:cps8B
cps8B:cps2B
cps8B:cps7B (should be 99.76% as well, since cps2B and cps8B are identical)
cps7B:cps8B (should be 99.76% as well, since cps2B and cps8B are identical)

Is there a way to include all top matches?

JacobLondon · 2024-10-27T19:00:06Z

Glad you enjoy the tool! It's been a while since I've looked at my undergrad senior project. A quick disclaimer: I have very little experience in bioinformatics other than this project, although our professor Mohamed El-Hadedy Aly, Ph.D. at California Polytechnic University, Pomona could give a much more educated answer to any questions.

With that out of the way, I looked at the matching method in extend.cpp, where the implementation attempts to find the best match by scoring and extending with the Smith-Waterman algorithm. It could be feasible to change the extend_filter member function from its 'track the best at all times' approach to keeping a std::list of ExtendedSequenceMap objects that is then sorted by score and returned, providing the top matches.
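As a rough illustration of that idea only, not the actual code in extend.cpp: the sketch below collects every candidate and returns them sorted by score instead of keeping a single best one. `ScoredHit` and `HitCollector` are hypothetical names made up for this example, and `ScoredHit` is just a stand-in for ExtendedSequenceMap, whose real members may differ. It uses std::vector rather than std::list purely for simplicity.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the project's ExtendedSequenceMap; the real
// type in extend.cpp may carry different members.
struct ScoredHit {
    std::string query_name;
    std::string subject_name;
    int score;  // assumed to be an alignment score, e.g. from Smith-Waterman
};

// Hypothetical collector: stores every candidate instead of overwriting
// a single "best so far" value.
class HitCollector {
public:
    void add(ScoredHit hit) { hits_.push_back(std::move(hit)); }

    // Returns all hits with score >= min_score, highest score first.
    // std::stable_sort keeps insertion order among equal scores, so two
    // subjects with identical scores are both reported.
    std::vector<ScoredHit> sorted(int min_score) const {
        std::vector<ScoredHit> out;
        for (const ScoredHit& h : hits_) {
            if (h.score >= min_score) {
                out.push_back(h);
            }
        }
        std::stable_sort(out.begin(), out.end(),
                         [](const ScoredHit& a, const ScoredHit& b) {
                             return a.score > b.score;
                         });
        return out;
    }

private:
    std::vector<ScoredHit> hits_;
};
```

With something like this, extend_filter would call `add` for each extended candidate and return `sorted(threshold)` at the end, so identical subjects such as cps2B and cps8B would both appear for the same query. The choice of threshold is left open here.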

Unfortunately, I haven't the time to maintain this project, but if you were so technically inclined, send me a pull request with an implementation and I might be able to provide that as an alternate approach!
