Add fastalignment metric #456
Added variant of AlignmentDistanceCalculator which performs faster, but may miss some of the pairwise distances.
Codecov Report
@@ Coverage Diff @@
## main #456 +/- ##
==========================================
+ Coverage 80.22% 80.49% +0.27%
==========================================
Files 49 49
Lines 3939 3994 +55
==========================================
+ Hits 3160 3215 +55
Misses 779 779
Hi @tobgan,
thank you so much for adding this!
The code you added looks good, but there are a few points that still need some work to make this more easily accessible to users. I'm happy to guide you, but I can also work on those tasks myself if you prefer.
- documentation: The new class should be added to the API docs
- tutorial: the new method should be mentioned in the docs and probably even be suggested as the standard method.
- The code needs unit tests (it's mostly applying the existing tests to the new method as well and adjusting the expected values)
Before we tackle those, I have a more general question: Do I remember correctly that if only the length filtering is applied, there are no losses and the results are exactly the same as with the plain AlignmentDistanceCalculator, but faster (obviously not as fast as the full method)? Because in that case I would consider getting rid of the old AlignmentDistanceCalculator altogether and just using the new class with different parameters for alignment and fastalignment. What do you think?
Best,
Gregor
src/scirpy/ir_dist/metrics.py
else penalty_dict[subst_mat]
if subst_mat in penalty_dict.keys()
else 0.0
can be simplified to
- else penalty_dict[subst_mat]
- if subst_mat in penalty_dict.keys()
- else 0.0
+ else penalty_dict.get(subst_mat, 0.0)
I would even consider raising an error if the substitution matrix is unknown and no penalty is specified.
Agreed, I think raising an error would be better
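The two options discussed in this thread can be sketched as follows. This is only an illustration, not the code in metrics.py: the contents of `penalty_dict`, the `explicit_penalty` argument, and both helper names are hypothetical and chosen for the example.

```python
# Hypothetical per-matrix default penalties; the real values in
# metrics.py are not shown in this thread and may differ.
penalty_dict = {"blosum62": 4.0, "pam30": 2.0}


def lookup_penalty_lenient(subst_mat, explicit_penalty=None):
    """Fall back silently to 0.0 for unknown matrices (the dict.get variant)."""
    if explicit_penalty is not None:
        return explicit_penalty
    return penalty_dict.get(subst_mat, 0.0)


def lookup_penalty_strict(subst_mat, explicit_penalty=None):
    """Raise if the matrix is unknown and no penalty was given explicitly."""
    if explicit_penalty is not None:
        return explicit_penalty
    try:
        return penalty_dict[subst_mat]
    except KeyError:
        raise ValueError(
            f"No default penalty known for substitution matrix {subst_mat!r}; "
            "please specify a penalty explicitly."
        ) from None
```

The strict variant makes a misspelled or unsupported matrix name fail loudly instead of silently running with a penalty of 0.0.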
Hi @tobgan -- it would be great if I could get your feedback on the points raised above.
Co-authored-by: Gregor Sturm <[email protected]>
Hi @grst, I'm very sorry for the belated response; I have been dealing with some health issues lately. Regarding the length filter: yes, the results are exactly the same, at least in practice. Theoretically, it could still result in some loss, e.g. with unusually high-scoring mismatches (as, for example, in PAM500, where the mismatch N-D scores higher than the match N-N), but this has not happened in any of my test runs, and I would wager it is unlikely to happen with real data, and even less likely to do so to a relevant degree. So I think replacing the [...] If you agree, I would then just make the [...]
Best,
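The caveat about the length filter can be illustrated with a small sketch. It rests on assumptions that are not stated in this thread: that the distance is the smaller of the two self-alignment scores minus the pairwise alignment score, that gaps carry a linear, non-negative penalty, and that the matrix is symmetric with non-negative match scores. Under those assumptions, a length difference of k forces at least k gap positions, so the distance is at least k times the gap penalty, provided no mismatch scores higher than the corresponding matches. The function names and the toy scores below are made up for the example and are not real BLOSUM62 or PAM500 entries.

```python
def passes_length_filter(seq1, seq2, gap_penalty, cutoff):
    """Keep a pair only if the length difference alone does not exceed the cutoff.

    Assumes every unmatched position costs at least `gap_penalty`.
    """
    min_distance = abs(len(seq1) - len(seq2)) * gap_penalty
    return min_distance <= cutoff


def matches_dominate(subst):
    """Check that no pair (a, b) scores higher than the match a-a or b-b.

    `subst` maps residue pairs (a, b) to scores. If this check fails, the
    length-based bound above is no longer guaranteed to be lossless.
    """
    return all(
        score <= min(subst[(a, a)], subst[(b, b)])
        for (a, b), score in subst.items()
    )


# Toy matrices with made-up scores, for illustration only.
well_behaved = {("N", "N"): 6, ("D", "D"): 6, ("N", "D"): 1, ("D", "N"): 1}
high_scoring_mismatch = {("N", "N"): 2, ("D", "D"): 4, ("N", "D"): 3, ("D", "N"): 3}
```

With a gap penalty of 11 and a cutoff of 20, two sequences whose lengths differ by 4 are skipped without alignment, while equal-length sequences always pass the filter. The second toy matrix fails `matches_dominate`, which mirrors the situation described for PAM500 above.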
Thanks, that sounds good. Honestly, I doubt anyone has ever changed the substitution matrix to anything other than blosum62, so that should be fine. I would then suggest that in
Regarding the API docs, the list of functions is here: https://github.com/scverse/scirpy/blob/main/docs/api.rst?plain=1#L298 For the tutorial, it's probably easiest if I take care of that myself as a final step.
Hi @grst,
Perfect, thanks! I'll go over the documentation one more time myself and update the tutorial and then merge this!
Wrapped up the open things... Looks good to me now!
Some CI job is failing, but it doesn't look related to this PR.
Thanks again @tobgan!
Closes #304