How to correctly use NXprocess #181

Open
lukaspie opened this issue Feb 19, 2024 · 2 comments
Labels: documentation, needs-discussion, question

Comments

@lukaspie
Collaborator

lukaspie commented Feb 19, 2024

Problem

In the NXprocess base class, the docstring says:
"Document an event of data processing, reconstruction, or analysis for this data."

This suggests that one NXprocess instance should describe a single data-processing event. However, NXprocess can currently contain multiple instances of NXregistration, NXdistortion, and NXcalibration, which suggests that one NXprocess instance can hold several "events". This is inconsistent, and it makes the other NXprocess fields that relate to the order of processing (like sequence_index) hard to use consistently.

My suggestion

In #177, we introduced the base class NXhistory for describing the history of a physical entity. NXhistory can hold multiple instances of NXactivity as well as NXphysical_process and NXchemical_process. I propose to extend NXhistory such that it can also describe the history of processing events:

NXhistory base class:

(NXhistory):
  (NXactivity):
  (NXphysical_process):
  (NXchemical_process):
  (NXprocess):
  (NXregistration):
  (NXdistortion):
  (NXcalibration):

Then, on the app-def level, we can write:

(NXentry):
  processing_history(NXhistory):
    (NXprocess): # with base class inheritance, could be any of NXcalibration, NXregistration, NXdistortion, NX
    (NXregistration):
    (NXdistortion):
    (NXcalibration):
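To make the proposed containment rule concrete, here is a minimal Python sketch, not an actual NeXus tool. The `Group` class, the `invalid_history_children` helper, and all group names (`energy_calibration`, `raw_data`, ...) are hypothetical; only the list of allowed classes is taken from the NXhistory listing above.

```python
from dataclasses import dataclass, field

# Child classes the extended NXhistory would accept (from the proposal above).
ALLOWED_IN_HISTORY = {
    "NXactivity",
    "NXphysical_process",
    "NXchemical_process",
    "NXprocess",
    "NXregistration",
    "NXdistortion",
    "NXcalibration",
}


@dataclass
class Group:
    """Hypothetical in-memory stand-in for a NeXus group."""
    name: str
    nx_class: str
    children: list = field(default_factory=list)


def invalid_history_children(history):
    """Return the names of children that the proposed NXhistory would not allow."""
    if history.nx_class != "NXhistory":
        raise ValueError(f"{history.name} is not an NXhistory group")
    return [c.name for c in history.children if c.nx_class not in ALLOWED_IN_HISTORY]


# Mirrors the app-def sketch: (NXentry) / processing_history(NXhistory) / ...
processing_history = Group(
    "processing_history",
    "NXhistory",
    [
        Group("energy_calibration", "NXcalibration"),
        Group("image_registration", "NXregistration"),
        Group("distortion_correction", "NXdistortion"),
        Group("raw_data", "NXdata"),  # deliberately not an allowed child
    ],
)
entry = Group("entry", "NXentry", [processing_history])

print(invalid_history_children(processing_history))
```

In this sketch each processing event is its own child of `processing_history`, rather than several events being nested inside a single NXprocess.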

Additional ideas

  1. Eventually, each of these base classes (incl. NXprocess) would extend NXactivity (via base class inheritance) and get a timestamp as well as a sequence index to fully describe the chain of events that occurred.
  2. There is also the idea that NXhistory is a graph whose nodes are NXactivity instances (and similar). We could make the edges of this graph explicit by using or modifying the existing NXgraph_* base classes.
  3. We discussed how to describe the sequence of measurement events in the MPES framework (see Experimental recipe (order and link different NXentries) #173). Maybe we could describe these measurement events as sets of NXactivity instances in the future.
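As a rough illustration of idea 1, the sketch below assumes every activity carries a sequence index and a timestamp and reconstructs the chain of events by sorting on both. The `Activity` class, its field names, and the example dates are made up for illustration; they are not taken from the NXactivity definition.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    """Hypothetical stand-in for a base class that extends NXactivity."""
    name: str
    nx_class: str
    sequence_index: int
    timestamp: datetime


def chain_of_events(activities):
    """Order activities by sequence_index, breaking ties by timestamp."""
    return sorted(activities, key=lambda a: (a.sequence_index, a.timestamp))


# Stored in arbitrary order, as groups in a file might be.
history = [
    Activity("distortion_correction", "NXdistortion", 2, datetime(2024, 2, 19, 10, 5)),
    Activity("energy_calibration", "NXcalibration", 1, datetime(2024, 2, 19, 10, 0)),
    Activity("image_registration", "NXregistration", 3, datetime(2024, 2, 19, 10, 9)),
]

for step in chain_of_events(history):
    print(step.sequence_index, step.nx_class, step.timestamp.isoformat())
```

This only works for a strictly linear chain of events, which is why idea 2 (an explicit graph) may be needed in addition.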

What do you think @FAIRmat-NFDI/areab?

@tomio13
Collaborator

tomio13 commented Feb 19, 2024

I would start by defining what an 'event of data processing' is.
If we mean a set of steps that together produce a single new data output, then it makes sense to allow multiple objects that together make up this step.

lukaspie added the documentation, question, and needs-discussion labels Feb 20, 2024
@mkuehbach
Collaborator

mkuehbach commented Feb 26, 2024

#140 with NXapm is a typical example of how NIAC has thought of NXprocess: the processing of raw detector hits into calibrated time of flight, for example, is decomposed into a sequence of NXprocess instances. I am happy with this. However,

NXprocess has so far been designed with only a single sequence_index. This implies that the processing is a linear sequence, which is a much simpler graph than those typically used. For example, if you have a Y-junction where the results of two NXprocess instances are necessary inputs to another NXprocess, which sequence_index should the input NXprocess instances have? The idea of using NXhistory essentially states that we also wish to describe such junctions as what they are: a graph with NXprocess instances as nodes and directed edges connecting them. This is the essence I support, as most workflows can be modelled as triplets: some input (at least one) is fed to some functor (an action/process in which some set of algorithms runs) and generates some output (at least one).
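The Y-junction case can be sketched in Python as a set of (inputs, functor, outputs) triplets turned into a directed graph. This is only an illustration under assumptions: the `Step` class and all step and data names are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for whatever the NXgraph_* base classes would eventually provide.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    """Hypothetical triplet: named functor with its inputs and outputs."""
    name: str
    inputs: tuple
    outputs: tuple


steps = [
    Step("calibration", inputs=("raw_a",), outputs=("calibrated",)),
    Step("registration", inputs=("raw_b",), outputs=("registered",)),
    # Y-junction: needs the results of both steps above.
    Step("merge", inputs=("calibrated", "registered"), outputs=("merged",)),
]

# Map each output to the step that produces it.
producer = {out: s.name for s in steps for out in s.outputs}

# Directed edges: each step depends on the producers of its inputs.
# Inputs without a producer (raw data) add no edge.
dependencies = {
    s.name: {producer[i] for i in s.inputs if i in producer} for s in steps
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Here `merge` always comes last, while `calibration` and `registration` have no defined order relative to each other. That is exactly the information a single sequence index per NXprocess cannot express.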
