-
(Inspired by discussions with developers across different Monte Carlo software, and more specific chat with the EGSnrc developers.)
-
Very well fleshed out, it all makes sense to me. Just a couple typos:
I think you meant 03010.
should be: 21150: time of flight in ms, float, 4 bytes
-
I think it would make sense for IAEA to include a few basic phase-space parsing tools intended for users (not developers), such as the |
-
@ftessier this is a very well thought out approach! To me it looks like the way to the future, for psf files at least! I have only one concern: will the IAEA then maintain a database of all possible "official" formats? These files could get very large depending on the additional parameters added to a record.
-
Note that a general psf format could also come to serve RAM particle storage during a simulation. For instance, I can see |
-
I like that this maintains and improves the flexibility of the current IAEA format. E.g. you can include any number of variables in the psf and have any number constant. The naming of the "extra" variables is now clearer because all variables are defined the same way. Using numbers for IDs makes them easily parsed, but you could include optional comments next to each ID in the header to describe what parameters are included in plain text (this could be automatically done in the IAEA reader/writer).
-
Note: this discussion will be continued with the IAEA and other Monte Carlo developers on the side for a while. Please don't hesitate to continue commenting here with suggestions, requirements, concerns about the future of the IAEA phase space file format. I will relay your ideas to the discussion group, until a first proposal of the format is opened publicly for discussion.
-
This sounds like a very good improvement! Do you have any thoughts on preserving backward compatibility with existing IAEA phsp files / file format?
-
Hi, @ftessier - I realize this discussion is a little old now, but I just wanted to |
-
Proposal for an extensible IAEA phase space format
(This is an evolving document)
Issue
We recognize that there is a tension between two objectives in defining a formal
specification for IAEA phase space files (psf):
One way to reconcile these objectives is to allow optional extra fields in
each record. The original IAEA psf format allows an array of extra floats and extra
integers to be added to each record, as specified in the header.
It is now realized that this approach has led users to define a multitude of
extra fields, which are not generally bound to any format, and often only
relevant in the software which produced the psf. For end users, it becomes
difficult, or at least confusing, to determine if a given psf can be handled by
a given software. In the worst case scenario, the extra fields are not
interpreted correctly but tacitly used in a simulation with a different
meaning.
Requirement
An extensible IAEA psf format that is nevertheless constrained by a formal
specification.
Proposed solution
We propose to generalize the existing extra fields and "register" them with
IAEA, meaning that they are integrated in the official format
specification. This incurs additional management to incorporate new fields as
developers request them. But an immediate benefit is that IAEA psfs then remain
bound to a definite format, both over time and across different software.
Ideally, the proposed solution would accommodate legacy IAEA psfs generated
before the implementation of this idea.
General idea
Associate one numeric ID to each registered field. These don't have to follow
any pre-defined structure (they could be assigned sequentially), but it will
prove useful to classify them loosely according to their meaning (as in library
book codes for subjects, remember those?). We want to keep the IDs numerical to
avoid all the idiosyncrasies of string manipulation and parsing.
Details
Here is a sample header section specifying the content of each record in a
current IAEA psf:
We propose to continue using `$RECORD_CONTENTS`, but turn it into a list of
identifiers that specify the record fields, more generally. The numbers
in the record contents section are not flags any more, but IDs that refer to
a formal specification (variable type, byte size, units and meaning):
(edit: we have now moved on to use fixed token strings instead of numerical ID numbers).
By starting valid IDs say at 100, the IAEA library can recognize legacy files (ID <
100) and then issue a warning and branch off to read the file using the legacy
interpreter. Better yet, the IAEA code would translate the legacy file to the
psf IDs and use the new interpreter.
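The branching described above could be sketched as follows. This is only an illustration, not the IAEA library's code: the function name `classify_record_contents` is hypothetical, the threshold of 100 is the one suggested in the text, and the sketch assumes that every legacy entry (flags and small counts) falls below that threshold.

```python
FIRST_REGISTERED_ID = 100  # assumption: threshold suggested in the text ("say at 100")


def classify_record_contents(values):
    """Classify the header's record contents as 'legacy' or 'registered'.

    Assumes legacy entries are flags or small counts, all below the threshold,
    and that registered IDs all sit at or above it.
    """
    if not values:
        raise ValueError("empty record contents section")
    if all(v < FIRST_REGISTERED_ID for v in values):
        return "legacy"
    if all(v >= FIRST_REGISTERED_ID for v in values):
        return "registered"
    # A header mixing both kinds is ambiguous, so refuse to guess.
    raise ValueError("record contents mix legacy flags and registered IDs")
```

A reader would call this once on the parsed record contents, then either warn and hand over to the legacy interpreter, or continue with the ID-based one.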
The reason for spacing out the IDs is to leave room for future definitions for
the same physical quantity. For example, the ID `01101` could refer to the `x`
value in 64-bit precision (double), and eventually ID `01102` might refer to the
`x` value in 128-bit precision, when technology gets there, and so on. Another
example is for different units, for example `03000` might refer to energy in
`MeV`, while `03010` could be reserved for `eV` units, etc.
We could require that the IDs be sorted in ascending order in the header, to
simplify the reader algorithm. It would also tend to enforce a more consistent
format across psfs from various sources. But reordering can also be handled by
the reader or even the simulation software, given the list of IDs from the
record contents.
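To illustrate why ordering can be left to the reader, here is a minimal sketch that unpacks one record into a mapping keyed by ID, so the consumer never depends on the order in the file. The IDs are the examples quoted in this thread; the table `FIELD_FORMATS`, the 4-byte float type for energy, and the little-endian byte order are assumptions made only for this example.

```python
import struct

# Hypothetical subset of the registry: ID -> struct format character.
# 1101: x as a 64-bit double; 3000: energy in MeV (4-byte float assumed);
# 21150: time of flight in ms, 4-byte float.
FIELD_FORMATS = {1101: "d", 3000: "f", 21150: "f"}


def is_sorted_ascending(ids):
    """Check the optional header constraint that IDs appear in ascending order."""
    return all(a < b for a, b in zip(ids, ids[1:]))


def unpack_record(ids, raw):
    """Unpack one record whose fields appear in the order given by `ids`.

    Little-endian layout without padding is assumed here.
    """
    fmt = "<" + "".join(FIELD_FORMATS[field_id] for field_id in ids)
    values = struct.unpack(fmt, raw)
    return dict(zip(ids, values))
```

Because the result is keyed by ID, a file written as `[3000, 1101]` and one written as `[1101, 3000]` yield the same mapping for the same particle; the sorted-order rule then only serves consistency, not correctness.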
Classification
(edit: we have now moved on to use fixed token strings instead of numerical ID numbers).
As stated above, the IDs can be assigned willy-nilly without constraint, but
for mnemonic purposes it will prove useful to create a loose hierarchy, for
example:
To set things concretely, here is what a partial expansion of the hierarchy
might look like in practice:
In case these are not sufficient, eventually, additional or higher number ranges can be
assigned. The idea is not strict conformance, but a rough organization of the
ID numbers. In fact, this hierarchy should never be mentioned in the formal
specification intended for the user, so that the only way to know what an ID means
is to look it up in the specs. There should be nothing implicit in the
interpretation of the IDs.
Here are some possible ID examples that spring to mind:
Pre-defined groups
For convenience, this system allows the definition of predefined groups of
variables, e.g.,
Constants
We can also leverage this system to define multiple record constants in the
header, for example:
Implementation
The implementation is left as an exercise for the reader! :-) Just kidding.
Inside the IAEA library, it implies reworking the parsing of the record
contents (and constants) sections in the header, ensuring that the legacy
format can still be processed. The new implementation would rely on an
`include` file which specifies the type and size associated with each ID. There
are various ways to generate a format description of the IDs from the
`include` file automatically (`doxygen` comes to mind).
On the software side, the developers of each software toolkit can decide which
IDs they implement (beyond the fundamental ones). They would rely on the same
`include` file definitions provided with the IAEA library, and their own patch
code to transfer the data from the IAEA psf records to the software's native
data structures.
The IDs themselves are 4-byte integers.
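As a rough picture of what the shared definitions might contain, the sketch below models them as a small Python table rather than an actual `include` file. Everything in it is illustrative: `FieldSpec`, `REGISTRY`, `record_length` and `encode_ids` are hypothetical names, the IDs are the examples mentioned in this thread, and the units for `x` and the 4-byte float type for energy are assumptions.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str   # short description of the quantity
    fmt: str    # struct format character for the variable type
    size: int   # size in bytes
    units: str


# Hypothetical registry entries; a real list would come from the IAEA specification.
REGISTRY = {
    1101: FieldSpec("x", "d", 8, "cm"),                 # 64-bit double; units assumed
    3000: FieldSpec("energy", "f", 4, "MeV"),           # type assumed
    3010: FieldSpec("energy", "f", 4, "eV"),            # type assumed
    21150: FieldSpec("time of flight", "f", 4, "ms"),
}


def record_length(ids):
    """Total bytes per record for the listed IDs (KeyError if an ID is unregistered)."""
    return sum(REGISTRY[field_id].size for field_id in ids)


def encode_ids(ids):
    """Encode the ID list as 4-byte little-endian integers (byte order assumed)."""
    return struct.pack(f"<{len(ids)}i", *ids)
```

With such a table, the record length follows directly from the ID list in the header, and a plain-text description of each ID can be generated from the same data.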
Strategy
The best starting point is certainly not to form a committee to discuss at
length what should be included, nor the finer points of the classification.
Instead, let's start with what we have now, ensuring legacy support. Then let
everyone propose some IDs they can use right now, and draft a first
classification based on that, leaving a decent amount of ID space for future
definitions.
For example, I am not suggesting that we implement all possible distance units
(km, m, cm, mm, um, nm). Rather, let's start with, say, the common 'cm', but
leave space for other units in the future if there is a good case for it.
Caveats
If a simulation software does not know how to process some ID xxxxx, then it
should alert the user, and then proceed to ignore the corresponding field in
each record. This is possible as long as the IAEA header file containing all
the field IDs is up to date, since the size of a field must be known in order
to skip over it. Otherwise, the simulation ought to quit. As a
last resort, the IAEA library could also provide a `trim` tool for users to
remove some fields, to allow portability in a case where they are using a
software that relies on an outdated ID header.
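The two behaviours described here (skip a field that is registered but unused, quit on an ID that is not registered at all) and the trimming of fields could look roughly like this. The names `REGISTRY_SIZES`, `plan_fields` and `trim_record` are hypothetical, and the sizes are assumptions based on the example IDs in this thread.

```python
import warnings

# Hypothetical subset of the registry: ID -> field size in bytes.
REGISTRY_SIZES = {1101: 8, 3000: 4, 21150: 4}


def plan_fields(ids, supported):
    """Return (id, offset, size) for each field the software will actually read.

    Registered but unsupported IDs are skipped with a warning; an ID missing
    from the registry has an unknown size, so reading cannot continue.
    """
    plan, offset = [], 0
    for field_id in ids:
        if field_id not in REGISTRY_SIZES:
            raise ValueError(f"unregistered field ID {field_id}: cannot skip it")
        size = REGISTRY_SIZES[field_id]
        if field_id in supported:
            plan.append((field_id, offset, size))
        else:
            warnings.warn(f"ignoring unsupported field ID {field_id}")
        offset += size
    return plan


def trim_record(ids, raw, drop):
    """Remove the fields listed in `drop` from one record; return (kept_ids, bytes)."""
    kept_ids, chunks, offset = [], [], 0
    for field_id in ids:
        size = REGISTRY_SIZES[field_id]
        if field_id not in drop:
            kept_ids.append(field_id)
            chunks.append(raw[offset:offset + size])
        offset += size
    return kept_ids, b"".join(chunks)
```

A trimming tool would rewrite the header with `kept_ids` and apply `trim_record` to every record, using an up-to-date registry on the machine where the trimming is done.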
Care should be taken to attempt, as much as possible, to create common
unique IDs for the same physical quantity, recognized by all software. Of
course, it would be possible (and at times tempting!) to define a whole suite
of software specific items, say an EGSnrc version of position data x, y, z (!).
But that makes no sense and it entirely defeats the purpose of an interchange
format. The registered IDs ought to be software agnostic. An example of an
EGSnrc-specific ID would be for `latch` (with no foreseen equivalent in other
software), but perhaps zlast should be declared more generally as "the
z-coordinate of the last interaction", and be defined in the "History
information" segment (30000s in the sample above).