Batch load and edit formats

NCIEVS edited this page Jan 27, 2021 · 2 revisions

The format of the files:

  1. Tab-delimited (TSV), with unquoted strings. Any quote characters are retained as part of the value, which you probably don’t want unless the quotes belong to the string itself.
  2. UTF-8 encoded (a basic ASCII text file qualifies as UTF-8). Any line containing bytes that are not valid UTF-8 will be discarded.
  3. Free of a byte-order mark (BOM). BOMs in Windows Unicode/UTF-8 files are not handled and break the batch jobs. On Windows, use an editor such as Notepad++ to save files as UTF-8 without a BOM.
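
The three rules above can be checked before a file is submitted. What follows is only a minimal sketch in Python, not part of the batch tooling: the function name `check_batch_bytes` is hypothetical, and the "no tab delimiter" test is an added heuristic rather than a documented rule.

```python
import codecs


def check_batch_bytes(raw):
    """Return a list of human-readable problems found in a batch file's bytes."""
    problems = []
    # A leading UTF-8 byte-order mark breaks the batch jobs.
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 byte-order mark")
        raw = raw[len(codecs.BOM_UTF8):]
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            # Such a line would be discarded by the loader.
            problems.append(f"line {number}: not valid UTF-8")
            continue
        # Heuristic (an assumption): every non-blank row has at least two fields.
        if text.strip() and "\t" not in text:
            problems.append(f"line {number}: no tab delimiter found")
    return problems
```

Reading the file in binary mode, e.g. `check_batch_bytes(open(path, "rb").read())`, keeps the BOM and any invalid bytes visible to the check instead of letting a text decoder hide them.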

The “id” entries in the descriptions below are the fragment identifiers of the IRIs, i.e. whatever follows the last / or # in the IRI.
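
As an illustration of that rule, the sketch below extracts the fragment identifier from an IRI. The function name `fragment_id` and the `example.org` IRIs are hypothetical and used only for demonstration.

```python
def fragment_id(iri):
    """Return whatever follows the last '/' or '#' in the IRI."""
    # rfind returns -1 when the character is absent, so an IRI with
    # neither separator is returned unchanged.
    cut = max(iri.rfind("/"), iri.rfind("#"))
    return iri[cut + 1:]


# Hypothetical IRIs: one "by code", one "by name".
print(fragment_id("http://example.org/ontology.owl#C1234"))
print(fragment_id("http://example.org/ontology/Gene"))
```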

Class load

term          parent-id
Lab Result    C1234
Pizza Order   C456
Holidays      C4321
Lápiz óptico  C654
  1. The term cannot contain embedded control characters, or the ‘?’ or ‘!’ characters (see Special Characters).
  2. The term will be used to create the rdfs:label, the preferred_name, and the fully qualified synonym (with NCI source and PT term type) of the newly-created class.
  3. The new class will be assigned a new code/id property, and this code/id will be used for the fragment identifier of the IRI.
  4. The parent-id identifies the superclass under which the new class will be placed in the tree. This superclass must already exist: a class created in one row cannot be used as a superclass in a subsequent row, because its identifier must be known before the run.
    • We don’t support generating the fragment by modifying the term, e.g. “hello there” -> “hello_there”, which would make the fragment predictable and usable elsewhere in the input file as a parent-id.
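
The restrictions on the term can be checked while a class load file is being generated. The sketch below is one possible way to do this in Python. The helper names `term_problems` and `class_load_row` are hypothetical, and only the characters named above are checked; the Special Characters reference may list others.

```python
import unicodedata


def term_problems(term):
    """Return the reasons a term would be rejected, or an empty list."""
    problems = []
    # Unicode category "Cc" covers control characters, including tab,
    # which would also corrupt the tab-delimited row.
    if any(unicodedata.category(ch) == "Cc" for ch in term):
        problems.append("contains a control character")
    for ch in "?!":
        if ch in term:
            problems.append(f"contains '{ch}'")
    return problems


def class_load_row(term, parent_id):
    """Build one tab-delimited class load row, refusing invalid terms."""
    problems = term_problems(term)
    if problems:
        raise ValueError(f"invalid term {term!r}: {', '.join(problems)}")
    return f"{term}\t{parent_id}"
```

Whether the parent-id already exists still has to be checked against the ontology itself, which this sketch does not attempt.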

Simple Annotation Property

Format

entity-id  operation  prop-id  value            value
C123       new        P34      Now is the time
C456       modify     P34      Now is the time  For all good women
C567       delete     P52      foo

Example “by code” (i.e. meaningless fragment identifiers in IRIs)

C123  new     P34  Now is the time
C456  modify  P34  Now is the time  For all good women
C567  delete  P52  foo

Example “by name” (i.e. meaningful fragment identifiers in IRIs)

Gene  new     Editor_Note  Now is the time
Gene  modify  Editor_Note  Now is the time  For all good women
Gene  delete  Editor_Note  Now is the time
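
To make the row layout concrete, here is a hedged Python sketch that parses one simple annotation row. It assumes that fields are separated by tabs, that new and delete rows carry one value, and that modify rows carry the old value followed by the new one, as the example rows suggest. `SimpleEdit` and `parse_simple_row` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SimpleEdit(NamedTuple):
    entity_id: str
    operation: str
    prop_id: str
    value: str                 # the value to add or delete, or the old value
    new_value: Optional[str]   # only set for "modify"


def parse_simple_row(line):
    """Parse one tab-delimited simple annotation property row."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected entity-id, operation, prop-id and a value")
    entity_id, operation, prop_id, *values = fields
    if operation not in ("new", "modify", "delete"):
        raise ValueError(f"unknown operation {operation!r}")
    expected = 2 if operation == "modify" else 1
    if len(values) != expected:
        raise ValueError(f"{operation} expects {expected} value field(s)")
    new_value = values[1] if operation == "modify" else None
    return SimpleEdit(entity_id, operation, prop_id, values[0], new_value)
```

The same function handles "by code" and "by name" rows, since it treats the identifiers as opaque strings.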

Complex Annotation Property

entity-id  operation  prop-id  value    ann-id  value  ann-id  value  ann-id  value  prop-id  value    ann-id  value
C123       new        P56      foo bar  P101    NCI    P102    PT     P103    FDA
C456       modify     P56      foo bar  P101    NCI    P102    PT     P103    FDA    P56      foo baz  P101    NCI    P102  PT  P103  FDA
C789       delete     P56      foo baz  P101    NCI    P102    PT     P103    FDA
  1. For the modify operation, the old value ends and the new value begins where the property id occurs again, e.g. P56 above. The “qualifier” annotations are included to disambiguate properties that differ only in their qualifiers, i.e. whose property values are the same.
  2. We call “complex” the properties that have values which are annotated with other properties.
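
The splitting rule for modify rows can be sketched as follows. This is an illustrative Python parser, not the loader's actual code. It assumes tab-separated fields arranged as id/value pairs after the operation, and it assumes that no qualifier shares its id with the property, since the repeated property id is the only boundary marker. `parse_complex_row` is a hypothetical name.

```python
def parse_complex_row(line):
    """Split a complex annotation row into one group per property value."""
    entity_id, operation, *rest = line.rstrip("\r\n").split("\t")
    if len(rest) < 2 or len(rest) % 2:
        raise ValueError("expected id/value pairs after the operation")
    pairs = list(zip(rest[0::2], rest[1::2]))
    prop_id = pairs[0][0]
    groups = []
    for key, value in pairs:
        if key == prop_id:
            # The property id occurring again starts the next (new) value.
            groups.append({"prop_id": key, "value": value, "qualifiers": []})
        else:
            groups[-1]["qualifiers"].append((key, value))
    return entity_id, operation, groups
```

For a modify row this yields two groups, the old value with its qualifiers followed by the new one; new and delete rows yield a single group.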

Parents

entity-id  operation  parent-id  replacement-id
C123       new        C789
C456       modify     C998       C889
C789       delete     C996
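
A small generator for these rows might look like the sketch below. It assumes, based on the column header, that only modify rows carry a replacement-id (the parent that takes the place of parent-id); `parent_row` is a hypothetical helper name.

```python
def parent_row(entity_id, operation, parent_id, replacement_id=None):
    """Build one tab-delimited parent edit row."""
    if operation not in ("new", "modify", "delete"):
        raise ValueError(f"unknown operation {operation!r}")
    if operation == "modify":
        if replacement_id is None:
            raise ValueError("modify needs a replacement-id")
        fields = [entity_id, operation, parent_id, replacement_id]
    else:
        if replacement_id is not None:
            raise ValueError(f"{operation} takes no replacement-id")
        fields = [entity_id, operation, parent_id]
    return "\t".join(fields)


# Reproduces the three example rows above, with real tab separators.
rows = [
    parent_row("C123", "new", "C789"),
    parent_row("C456", "modify", "C998", "C889"),
    parent_row("C789", "delete", "C996"),
]
```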

Roles

entity-id  operation  id   mod   fill-id  mod   fill-id
C123       new        P56  some  C876
C456       modify     P56  all   C889     some  C889
C567       delete     P67  some  C776
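
The role rows can be read the same way as the other formats. The following Python sketch is an assumption-laden illustration: it treats fields as tab-separated, accepts only the modifiers seen in the examples ("some" and "all"), and assumes a modify row lists the old modifier and filler followed by the new ones. `parse_role_row` is a hypothetical name.

```python
ALLOWED_MODS = {"some", "all"}  # only these appear in the examples above


def parse_role_row(line):
    """Parse one tab-delimited role edit row into a dictionary."""
    entity_id, operation, role_id, *rest = line.rstrip("\r\n").split("\t")
    if not rest or len(rest) % 2:
        raise ValueError("expected mod/fill-id pairs after the role id")
    restrictions = list(zip(rest[0::2], rest[1::2]))
    for mod, _fill_id in restrictions:
        if mod not in ALLOWED_MODS:
            raise ValueError(f"unknown modifier {mod!r}")
    expected = 2 if operation == "modify" else 1
    if len(restrictions) != expected:
        raise ValueError(f"{operation} expects {expected} mod/fill-id pair(s)")
    return {
        "entity_id": entity_id,
        "operation": operation,
        "role_id": role_id,
        "restrictions": restrictions,  # for modify: [old, new]
    }
```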