Batch load and edit formats
NCIEVS edited this page Jan 27, 2021 · 2 revisions
The format of the files:
- Files are tab-delimited (TSV) and the strings are not quoted. Any quotes will be retained as part of the value, which you probably don’t want unless the quotes genuinely belong to the strings themselves.
- Files are UTF-8 encoded (a basic ASCII text file qualifies as UTF-8). Any lines with non-UTF-8 characters will be discarded.
- Byte-order marks (BOM) in Windows Unicode/UTF-8 files are not handled and will break the batch jobs. In Windows, use an editor like Notepad++ to save files as UTF-8 without a BOM.
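The three rules above can be checked before a file is submitted. The following is a minimal sketch, not part of the batch tooling: the function name `inspect_batch_bytes` and its return shape are assumptions made for illustration. It reports whether the raw bytes start with a UTF-8 BOM, which lines are not valid UTF-8 (and so would be discarded), and which lines contain a fully quoted field (allowed, but usually unintended).

```python
import codecs


def inspect_batch_bytes(data):
    """Return (has_bom, bad_lines, quoted_lines) for the raw bytes of a batch file.

    Hypothetical helper: line numbers are 1-based.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    if has_bom:
        # Skip the BOM so it does not pollute the first field.
        data = data[len(codecs.BOM_UTF8):]

    bad_lines = []     # lines that are not valid UTF-8
    quoted_lines = []  # lines with a field wrapped in double quotes
    for number, raw in enumerate(data.splitlines(), start=1):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
            continue
        for field in text.split("\t"):
            if len(field) > 1 and field.startswith('"') and field.endswith('"'):
                quoted_lines.append(number)
                break
    return has_bom, bad_lines, quoted_lines
```

A file read with `open(path, "rb").read()` can be passed in directly; an empty `bad_lines` list and `has_bom == False` suggest the file meets the encoding rules listed above.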
The “id” entries in the descriptions below are the fragment identifiers of the IRIs, i.e. whatever follows the last `/` or `#` in the IRI.
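As an illustration of that rule, here is a short sketch that extracts the “id” from an IRI. The function name `fragment_id` and the example IRIs are hypothetical; only the rule itself (take whatever follows the last `/` or `#`) comes from the text above.

```python
import re


def fragment_id(iri):
    """Return whatever follows the last '/' or '#' in the IRI."""
    return re.split(r"[/#]", iri)[-1]


# Hypothetical IRIs, used only to show both separators.
print(fragment_id("http://example.org/ontology#C1234"))
print(fragment_id("http://example.org/ontology/Gene"))
```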
term | parent-id |
---|---|
Lab Result | C1234 |
Pizza Order | C456 |
Holidays | C4321 |
Lápiz óptico | C654 |
- The term cannot contain embedded control characters, or the ‘?’ or ‘!’ characters (see Special Characters in here).
- The term will be used to create the rdfs:label, the preferred_name, and the fully qualified synonym (with NCI source and PT term type) of the newly-created class.
- The new class will be assigned a new code/id property, and this code/id will be used for the fragment identifier of the IRI.
- The parent-id identifies the superclass under which the new class will be treed. This superclass must already exist: we can’t create a new class in one row and then use it as a superclass in a subsequent row, because its identifier must be known prior to the run.
- We don’t support the case where the fragment is generated by taking the term and modifying it, e.g. “hello there” -> “hello_there”, so that the fragment can be predicted and used elsewhere in the input file as a parent-id.
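The constraints on load rows can be sketched as a pre-flight check. This is an illustrative helper, not the tool's own validation: `term_problems`, `check_load_rows`, and the `existing_ids` set of already-known class ids are all assumptions. It only tests the characters named above (control characters, `?`, `!`) and whether each parent-id is already known before the run.

```python
import unicodedata


def term_problems(term):
    """List the problems found in a single term (empty list if none)."""
    problems = []
    if any(unicodedata.category(ch) == "Cc" for ch in term):
        problems.append("control character")
    for ch in "?!":
        if ch in term:
            problems.append(f"forbidden character {ch!r}")
    return problems


def check_load_rows(lines, existing_ids):
    """Return (line_number, message) pairs for rows that break the rules."""
    errors = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2:
            errors.append((number, "expected term and parent-id"))
            continue
        term, parent_id = fields
        for problem in term_problems(term):
            errors.append((number, problem))
        # The parent must exist before the run; ids created in this
        # file are unknown in advance and cannot be referenced.
        if parent_id not in existing_ids:
            errors.append((number, f"unknown parent-id {parent_id}"))
    return errors
```

Because `existing_ids` is never updated while the rows are scanned, a row that names a class created earlier in the same file is reported as an unknown parent, matching the restriction described above.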
Format
entity-id | operation | prop-id | value | value |
---|---|---|---|---|
C123 | new | P34 | Now is the time | |
C456 | modify | P34 | Now is the time | For all good women |
C567 | delete | P52 | foo |
Example “by code” (i.e. meaningless fragment identifiers in IRIs)
C123 | new | P34 | Now is the time | |
C456 | modify | P34 | Now is the time | For all good women |
C567 | delete | P52 | foo |
Example “by name” (i.e. meaningful fragment identifiers in IRIs)
Gene | new | Editor_Note | Now is the time | |
Gene | modify | Editor_Note | Now is the time | For all good women |
Gene | delete | Editor_Note | Now is the time |
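The row layout shown in the tables above (entity-id, operation, prop-id, then one value for `new` and `delete`, or an old and a new value for `modify`) can be parsed along these lines. This is a sketch under stated assumptions: the `SimpleEdit` type and `parse_simple_edit` function are hypothetical names, and treating trailing empty cells as absent is a guess about how the files are written.

```python
from typing import NamedTuple, Optional


class SimpleEdit(NamedTuple):
    entity_id: str
    operation: str
    prop_id: str
    value: str                # the value to add or delete, or the old value
    new_value: Optional[str]  # only set for "modify"


def parse_simple_edit(line):
    """Parse one tab-delimited simple-property edit row."""
    fields = line.rstrip("\r\n").split("\t")
    while fields and fields[-1] == "":
        fields.pop()  # ignore trailing empty cells
    if len(fields) < 4:
        raise ValueError("expected entity-id, operation, prop-id, value")
    entity_id, operation, prop_id, value, *rest = fields
    if operation == "modify":
        if len(rest) != 1:
            raise ValueError("modify needs an old and a new value")
        return SimpleEdit(entity_id, operation, prop_id, value, rest[0])
    if operation in ("new", "delete"):
        if rest:
            raise ValueError(f"{operation} takes a single value")
        return SimpleEdit(entity_id, operation, prop_id, value, None)
    raise ValueError(f"unknown operation {operation!r}")
```

The same function handles the “by code” and “by name” examples, since it does not interpret the identifiers.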
entity-id | operation | prop-id | value | ann-id | value | ann-id | value | ann-id | value | prop-id | value | ann-id | value | ann-id | value | ann-id | value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
C123 | new | P56 | foo bar | P101 | NCI | P102 | PT | P103 | FDA | ||||||||
C456 | modify | P56 | foo bar | P101 | NCI | P102 | PT | P103 | FDA | P56 | foo baz | P101 | NCI | P102 | PT | P103 | FDA |
C789 | delete | P56 | foo baz | P101 | NCI | P102 | PT | P103 | FDA |
- Note that for the modify operation, one can tell where the old value ends and the new one begins because the property id occurs again, e.g. `P56` above. The “qualifier” annotations are included to disambiguate properties that differ only in their qualifiers, i.e. whose property values are the same.
- We call “complex” the properties whose values are annotated with other properties.
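The boundary rule for modify rows can be made concrete with a small sketch. The function name `split_complex_edit` and the dictionary layout are assumptions for illustration; the logic only relies on what is stated above, namely that the cells after the operation come in id/value pairs and that the property id reappears where the new value starts.

```python
def split_complex_edit(line):
    """Split a complex-property row into (entity_id, operation, first, second).

    `second` is None unless the property id occurs a second time,
    which is how a modify row marks the start of the new value.
    """
    fields = line.rstrip("\r\n").split("\t")
    while fields and fields[-1] == "":
        fields.pop()  # ignore trailing empty cells
    entity_id, operation, *rest = fields
    if len(rest) < 2 or len(rest) % 2:
        raise ValueError("expected id/value pairs after the operation")

    pairs = list(zip(rest[0::2], rest[1::2]))
    prop_id = pairs[0][0]
    # The first later pair whose id equals the property id starts the new value.
    boundary = next(
        (i for i in range(1, len(pairs)) if pairs[i][0] == prop_id),
        len(pairs),
    )

    def group(chunk):
        return {"prop_id": chunk[0][0], "value": chunk[0][1], "qualifiers": chunk[1:]}

    first = group(pairs[:boundary])
    second = group(pairs[boundary:]) if boundary < len(pairs) else None
    return entity_id, operation, first, second
```

This sketch assumes a qualifier never uses the same id as the property it annotates; if it did, the boundary would be detected too early.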
entity-id | operation | parent-id | replacement-id |
---|---|---|---|
C123 | new | C789 | |
C456 | modify | C998 | C889 |
C789 | delete | C996 |
entity-id | operation | id | mod | fill-id | mod | fill-id | ||
---|---|---|---|---|---|---|---|---|
C123 | new | P56 | some | C876 | ||||
C456 | modify | P56 | all | C889 | some | C889 | ||
C567 | delete | P67 | some | C776 |
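The last table pairs each modifier with a filler id, with a modify row carrying two such pairs (old, then new). A minimal parsing sketch follows; the name `parse_role_edit` and the returned dictionary are hypothetical, and the pair counts per operation are inferred from the example rows rather than from a stated rule.

```python
def parse_role_edit(line):
    """Parse a row of the form: entity-id, operation, id, (mod, fill-id)+."""
    fields = line.rstrip("\r\n").split("\t")
    while fields and fields[-1] == "":
        fields.pop()  # ignore trailing empty cells
    if len(fields) < 5:
        raise ValueError("expected entity-id, operation, id, mod, fill-id")
    entity_id, operation, role_id, *rest = fields
    if len(rest) % 2:
        raise ValueError("modifiers and filler ids must come in pairs")

    restrictions = list(zip(rest[0::2], rest[1::2]))
    # Inferred from the examples: modify has old + new, others have one pair.
    expected = 2 if operation == "modify" else 1
    if len(restrictions) != expected:
        raise ValueError(f"{operation} expects {expected} mod/fill-id pair(s)")
    return {
        "entity_id": entity_id,
        "operation": operation,
        "id": role_id,
        "restrictions": restrictions,
    }
```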