Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

analysis.atomicdistances.AtomicDistances does not use Results #4819

Open
orbeckst opened this issue Dec 3, 2024 · 8 comments
Open

analysis.atomicdistances.AtomicDistances does not use Results #4819

orbeckst opened this issue Dec 3, 2024 · 8 comments
Labels
API Component-Analysis decision needed requires input from developers before moving further defect parallelization
Milestone

Comments

@orbeckst
Copy link
Member

orbeckst commented Dec 3, 2024

Expected behavior

All analysis classes based on AnalysisBase store results in an instance of MDAnalysis.analysis.results.Results

Results are always stored in the attribute AnalysisBase.results, which is an instance of Results, a kind of dictionary that allows allows item access via attributes. Each analysis class decides what and how to store in Results and needs to document it. For time series, the AnalysisBase.times contains the time stamps of the analyzed frames.

The results should be accessible as, e.g., results.distances.

Actual behavior

analysis.atomicdistances.AtomicDistances.results directly contains a numpy array.

The lack of API conformity makes it impossible to directly parallelize the class, see PR #4808.

Code to reproduce the behavior

See docs: https://docs.mdanalysis.org/stable/documentation_pages/analysis/atomicdistances.html#MDAnalysis.analysis.atomicdistances.AtomicDistances.results

Current version of MDAnalysis

  • Which version are you using? 2.8.0
@orbeckst
Copy link
Member Author

orbeckst commented Dec 3, 2024

Note that the only option for fixing this issue is to break the existing API: we need the results attribute to now hold a Results instance instead of a numpy array. Thus, we are not able to make a backwards-compatible update. Either we fix it and break it or we add a deprecation warning and fix it for 3.0.

@orbeckst
Copy link
Member Author

orbeckst commented Dec 3, 2024

My view is that this was a mistake in the first place as it contradicts our specs and we should treat it as a bug and thus fix it.

This would also allow us to parallelize the class (see PR #4808).

P.S.: The AtomicDistances class was added in 2.5.0 and even our 2.5.0 docs state that results needs to be an instance of Results, see https://docs.mdanalysis.org/2.5.0/documentation_pages/analysis/base.html

@IAlibay
Copy link
Member

IAlibay commented Dec 4, 2024

I have two questions here:

  1. What was the intention of the coredev(s) who approved it - was this an oversight or an intentional design decision?
  2. How big a deal would it be if we marked it as non-parallelizable and just held back on the change until v3.0.

The reasons I'm asking that second question:

  1. It's pretty clear to me that we really should just do a v3.0 release in the next 3-4 months. Our current status is making dealing with upgrading Python versions (amongst other things) a bit of a pain, and it would just be easier to finish up v2.x than spend ages implementing workarounds.
  2. If we don't think a lot of folks are using that analysis method, then the impact of either not doing the change or doing it is very low. In which case, why not just wait a bit (only if my first point stands true).

@talagayev
Copy link
Member

I have two questions here:

  1. What was the intention of the coredev(s) who approved it - was this an oversight or an intentional design decision?
  2. How big a deal would it be if we marked it as non-parallelizable and just held back on the change until v3.0.

The reasons I'm asking that second question:

  1. It's pretty clear to me that we really should just do a v3.0 release in the next 3-4 months. Our current status is making dealing with upgrading Python versions (amongst other things) a bit of a pain, and it would just be easier to finish up v2.x than spend ages implementing workarounds.
  2. If we don't think a lot of folks are using that analysis method, then the impact of either not doing the change or doing it is very low. In which case, why not just wait a bit (only if my first point stands true).

I could try to see if it is possible to adjust it to get Results if that's fine, especially if there are a couple of months time to adjust it, but can't promise if it works.

As for the parallelization and v3.0, should it be the case that for the upgrade to v3.0 all of the analysis functions should have either implemented parallelization or marked as non parallelizable or is it fine if it is with the current state, where half of the functions have an implementation parallelization/non parallelization marks.

@orbeckst
Copy link
Member Author

orbeckst commented Dec 4, 2024

I could try to see if it is possible to adjust it to get Results if that's fine, especially if there are a couple of months time to adjust it, but can't promise if it works.

A PR that fixes this issue would be very welcome, no matter what — this will have to be fixed for 3.0. The question is just if there will be a 2.9.0 or if we will go directly to 3.0.

As for the parallelization and v3.0, should it be the case that for the upgrade to v3.0 all of the analysis functions should have either implemented parallelization or marked as non parallelizable or is it fine if it is with the current state, where half of the functions have an implementation parallelization/non parallelization marks.

For 3.0, all analysis classes should have a defined behavior for parallelization.

@talagayev
Copy link
Member

I could try to see if it is possible to adjust it to get Results if that's fine, especially if there are a couple of months time to adjust it, but can't promise if it works.

A PR that fixes this issue would be very welcome, no matter what — this will have to be fixed for 3.0. The question is just if there will be a 2.9.0 or if we will go directly to 3.0.

As for the parallelization and v3.0, should it be the case that for the upgrade to v3.0 all of the analysis functions should have either implemented parallelization or marked as non parallelizable or is it fine if it is with the current state, where half of the functions have an implementation parallelization/non parallelization marks.

For 3.0, all analysis classes should have a defined behavior for parallelization.

I see, then probably #4817 and #4818 would need to be adressed also first, since those two are lacking AnalysisBase. I raised an
Issue for those and would look into them in the free time to see if I can implement the AnalysisBase there

@orbeckst
Copy link
Member Author

orbeckst commented Dec 4, 2024

For 3.0, all analysis classes should have a defined behavior for parallelization.

My opinion is that anything that uses AnalysisBase should have well-defined behavior. Legacy code may still do whatever it used to do.

Ideally, all analysis tools that broadly do per-frame analysis use AnalysisBase but I don't think that we promised that for 3.0... but it would be great ;-).

@talagayev
Copy link
Member

For 3.0, all analysis classes should have a defined behavior for parallelization.

My opinion is that anything that uses AnalysisBase should have well-defined behavior. Legacy code may still do whatever it used to do.

Ideally, all analysis tools that broadly do per-frame analysis use AnalysisBase but I don't think that we promised that for 3.0... but it would be great ;-).

Yup, makes sense :) I think it should be quite doable especially for MDAnalysis.analysis.hydrogenbonds.hbond_autocorrel, who has some structural similarities of the code to other AnalysisBase analysis tools, so that should work, the other one MDAnalysis.analysis.distances can't say, but from the first look also doable :)

@orbeckst orbeckst added the decision needed requires input from developers before moving further label Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Component-Analysis decision needed requires input from developers before moving further defect parallelization
Projects
None yet
Development

No branches or pull requests

3 participants