Separating out objects from EnGens for better reusability #1

Open
FelixMQuintana opened this issue Sep 2, 2022 · 3 comments

@FelixMQuintana

My branch shows examples of how to split your existing EnGens class into more independent objects for better reusability. The work is not complete; it is just something to build on.

@FelixMQuintana FelixMQuintana added the enhancement New feature or request label Sep 2, 2022
@AnjaConev
Contributor

Really nice, this will be a great enhancement.

I can take a look at your commits, and I can also start adding implementations.

Here is the UML diagram I made:
[image: UML diagram]

And a link for you to check it out and edit if you'd like:
https://lucid.app/lucidchart/aae34ca7-6dc9-4fb6-988b-0524ddf26756/edit?viewport_loc=-171%2C-67%2C889%2C1267%2C0_0&invitationId=inv_e19dfa8b-6bb7-4ce9-9268-dc31fab7783e#

I will think about which steps I can take on and post them in the next comment.

@AnjaConev
Contributor

OK, for now we are cleaning up the file handling.
This will be useful because the file handling code is currently spread throughout the EnGens class and is very messy.

Two key things are part of the file handling:

  1. Loading
  2. Aligning

Loading means taking a path and instantiating either a PyEMMA trajectory (with pyemma.coordinates.source) or the PDB files (with mdtraj.load).
The code for PDBs is here:
https://github.com/KavrakiLab/EnGens-private/blob/9d418ffa569294163e005b97f9862816d33780e0/engens_code/engens/core/EnGens.py#L78-L87
The code for trajectories is a bit spread out and looks like this:
https://github.com/KavrakiLab/EnGens-private/blob/9d418ffa569294163e005b97f9862816d33780e0/engens_code/engens/core/EnGens.py#L215

Things to think about:

  1. Should we add an abstract method load() to SimulationFile? This method would perform the loading from paths to the actual structures.
  2. Another tricky thing: there is a "residue selection" option when loading a file. This means the user gives a list of atoms as the substructure they want to load from the input path. In this case we load only the selection and not the full structure. Maybe we can make an abstract FileLoader and then have a different function for loading a selection vs. loading the full file?
  3. When loading trajectories we actually need two paths: one to the trajectory file (".xtc" extension and such) and one to the topology file (".pdb" extension). We might have to modify TrajectoryFile to hold another path (or to contain a PDBFile with the path to the topology).
  4. Trajectories can be very long, so we need to be careful with memory during loading. We have to make sure that we only load them with pyemma.coordinates.source (a memory-safe call, unlike pyemma.coordinates.load).
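
To make points 1-4 concrete, here is a minimal sketch of how the classes could look. It is only one possible shape, not what is on the branch: the constructor arguments, the `selection` attribute, and the idea of storing the topology as a PDBFile are assumptions for illustration. Also note that passing a featurizer to pyemma.coordinates.source yields the coordinates of the selected atoms rather than a structure object, so this may not match what EnGens needs downstream. The libraries are imported inside load() so the classes can be defined and tested without them installed.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional, Sequence


class SimulationFile(ABC):
    """Holds a path (and an optional atom selection) and knows how to load itself."""

    def __init__(self, path: str, selection: Optional[Sequence[int]] = None):
        self.path = Path(path)
        # Hypothetical attribute: atom indices to load; None means the full structure.
        self.selection = list(selection) if selection is not None else None

    @abstractmethod
    def load(self):
        """Turn the stored path(s) into a loaded or streaming object."""


class PDBFile(SimulationFile):
    def load(self):
        import mdtraj  # lazy import

        # mdtraj.load forwards atom_indices to its PDB reader.
        return mdtraj.load(str(self.path), atom_indices=self.selection)


class TrajectoryFile(SimulationFile):
    def __init__(self, path: str, topology: PDBFile,
                 selection: Optional[Sequence[int]] = None):
        super().__init__(path, selection)
        # Point 3: a trajectory also needs a topology file.
        self.topology = topology

    def load(self):
        import pyemma  # lazy import

        feat = pyemma.coordinates.featurizer(str(self.topology.path))
        if self.selection is not None:
            feat.add_selection(self.selection)  # point 2: only the chosen atoms
        else:
            feat.add_all()
        # Point 4: source() streams from disk instead of reading everything at once.
        return pyemma.coordinates.source(str(self.path), features=feat)
```

With this layout, `TrajectoryFile("run.xtc", topology=PDBFile("top.pdb"), selection=[0, 1, 2])` carries everything needed for loading, and the selection case does not need a separate loader class. Whether a separate FileLoader is still cleaner is an open question.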

Aligning means aligning the structures to each other, either within a trajectory or within a list of PDB files.

The relevant functions:
https://github.com/KavrakiLab/EnGens-private/blob/9d418ffa569294163e005b97f9862816d33780e0/engens_code/engens/core/EnGens.py#L149
https://github.com/KavrakiLab/EnGens-private/blob/9d418ffa569294163e005b97f9862816d33780e0/engens_code/engens/core/EnGens.py#L263

Things to think about:

  1. In our use cases, PDB files are usually handled as a list of PDBFiles (e.g. alignment can only be performed on a list of PDB files, not on a single file). Maybe we can have another object, PDBFilesList, that is Alignable, instead of making PDBFile itself Alignable.
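
A rough sketch of that idea, assuming an Alignable interface with a single align() method and a PDBFilesList that takes plain paths (both names come from the discussion above, but the signatures here are made up for illustration). It also assumes all PDB files share the same topology, since mdtraj.join checks for that by default, and it aligns everything to the first structure, which may or may not be the reference we want.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence


class Alignable(ABC):
    """Anything whose structures can be aligned to a common reference."""

    @abstractmethod
    def align(self, atom_indices: Optional[Sequence[int]] = None):
        """Return the aligned structures."""


class PDBFilesList(Alignable):
    def __init__(self, paths: Sequence[str]):
        # Assumption: aligning a single file is meaningless, so require two or more.
        if len(paths) < 2:
            raise ValueError("alignment needs at least two PDB files")
        self.paths: List[str] = list(paths)

    def __len__(self) -> int:
        return len(self.paths)

    def align(self, atom_indices: Optional[Sequence[int]] = None):
        import mdtraj  # lazy import so the class is usable without mdtraj installed

        frames = [mdtraj.load(p) for p in self.paths]
        # Join the single-frame structures into one multi-frame trajectory.
        ensemble = mdtraj.join(frames)
        # Superpose every frame onto frame 0 (modifies the trajectory in place).
        ensemble.superpose(ensemble, frame=0, atom_indices=atom_indices)
        return ensemble
```

A TrajectoryFile could implement the same interface, so the rest of the code would only depend on Alignable and not on where the structures came from.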

@AnjaConev
Contributor

I think action items can be as follows:

  1. Figure out how to implement the file loading, always keeping possible memory problems in mind
  • Pass both the topology and the trajectory file in order to load a TrajectoryFile
  • Load regular files
  • Load files with a given selection

  2. Figure out how to implement alignment
  • Alignment of trajectories (should be straightforward: copy the existing code, etc.)
  • Alignment of PDB files (implement a list of PDBFiles that is Alignable)
