Javier Quinteros
November 2017
The use of persistent identifiers within the seismological community has been discussed in various forums in recent years. One driver is the clear and growing need to ensure that seismic networks are reliably identified in citations by the researchers and monitoring networks that make use of their data. However, DOIs are not the only option within the Persistent Identifier (PID) landscape: some data centres, such as GEOFON, have been using ePIC PIDs internally to identify individual data files in their data holdings. When a more detailed identification is needed (further from the network, closer to the hardware or instrument), the use of ePIC PIDs, or of something less strict than DOIs, seems much more feasible. In some cases, however, it is not clear whether the identifier refers to the instrument itself or to the datasets generated by the instrument.
GEOFON has at least two use cases related to persistent identifiers for instruments. The first one is a community effort to adopt DOIs to identify networks, while the second one is the internal ongoing development of a system to manage the instrument pool (seismic station components), provide information about the deployed stations, and offer a catalog of responses for different instruments.
The International Federation of Digital Seismograph Networks [1] (FDSN) is a global organization. Its members are groups responsible for the installation and maintenance of seismographs, either within their geographic borders or globally. Seismological data centres (and the seismological community in general) follow the recommendations of the FDSN, which provides a framework for global coordination and facilitates the interoperation of services and activities.
In 2014, the collaboration between data centres from Europe and the USA on the best way to tackle the problem of citation and proper attribution of seismological datasets resulted in the “FDSN recommendations for seismic network DOIs and related FDSN services” (Clark et al., 2014), also summarised in Evans et al. (2015).
According to this recommendation, each seismic network must be identified by a DOI. However, there is still some freedom in how the attributes of the DataCite schema are mapped to the information describing the network that must be present in the DOI metadata. We provide here some examples of DOIs minted by different data centres around Europe: 10.15778/RESIF.CL, 10.14470/ab466166, 10.12686/sed/networks/ch
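As a small illustration, the DOI names listed above can be turned into resolvable links through the public doi.org resolver. The sketch below is only an assumption about how a client might do this; the function name `doi_to_url` and the loose syntax check are not part of the FDSN recommendation.

```python
import re
from urllib.parse import quote

# Loose syntactic check: "10." + registrant code + "/" + non-empty suffix.
# This is a sanity check only; it does not prove that the DOI is registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI name."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI name: {doi!r}")
    # Keep "/" as is; percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


# The three network DOIs quoted in the text.
network_dois = ["10.15778/RESIF.CL", "10.14470/ab466166", "10.12686/sed/networks/ch"]
urls = [doi_to_url(d) for d in network_dois]
```

Each resulting URL should redirect to the landing page maintained by the data centre that minted the DOI.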
From the recommendations it can be seen that there is a fuzzy line which separates the hardware, the metadata describing it, and the data.
“In this view a seismic network is an entire collection of sensor data, but also the seismic metadata associated with it, such as station details, instrument types, response data.”
FDSN Recommendations for DOIs
In the document, fields/attributes from the DataCite schema associated with the DOI are classified as mandatory, recommended and optional. Below, mandatory fields are shown in bold and recommended in italics.
- Creator: The principal investigator(s), or the organization(s) operating the network. For a permanent network, this is likely to be an organization, e.g. "USGS", "GFZ".
- Title: This should be about 5 to 10 words naming the seismic network, similar to the descriptions found in StationXML, a standard format used by the community to describe the hardware and the response of the sensors.
- Publisher: The institution (or data centre) responsible for making the data, i.e. DOI and landing page, permanently available.
- Publication year: DataCite recommends using the "year when the data was or will be publicly made available". However, due to changes in access policy and embargo, it should be the first year of data collection by the network rather than the first year of unrestricted access.
- Resource type: Primary type should be “Other” and subtype should be "Seismic network". The full type would thus be “Other/Seismic network”.
- Description: A short summary of the network, no more than 200-300 words in length. The DataCite descriptionType should be “Abstract”.
- Format: The text “SEED data” is suggested when that is appropriate.
- Contributor: Data centres should be included as contributors, with a contributorType of "DataCollector", "DataManager", "Distributor", “Sponsor”, “ContactPerson” etc. as appropriate.
- Location: Three different forms can be used; they are listed below in order of preference.
  - A bounding box (pair of latitude/longitude coordinates) containing all stations.
  - A list of latitude/longitude coordinates, one for each station.
  - A freeform geographic location name (e.g. “Chile” or “global”).
- Size: For a temporary network (closed experiment), the size of data in GB. For an ongoing or permanent network, an estimate of new daily data e.g. “500 MB/day”.
- Date:
  - DateCollected:
    - For a permanent network, the first day for which data was available, with a trailing slash to indicate an open date range, e.g. "1993-04-01/".
    - For a temporary network, a range as YYYY-MM-DD/YYYY-MM-DD in the RKMS-ISO8601 standard, e.g. "2011-10-01/2013-05-31".
  - DateAvailable: If there is an embargo on the data, this should be used to indicate when the embargo period ends.
- Related Identifiers: Other resources (published publications, scientific/technical reports, data, etc.) related to the seismic network with globally unique identifiers (e.g. DOI, Handle, PURL).
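The field list above can be sketched as a small helper that assembles the metadata for a network. This is a minimal illustration, not a validated DataCite serialization: the dictionary layout and the function names are assumptions, the key names only loosely follow DataCite naming, and the example network and its coordinates are invented. The bounding box is naive and does not handle networks that cross the antimeridian.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def date_collected(start: date, end: Optional[date] = None) -> str:
    """Format DateCollected: 'start/' for an open range, 'start/end' otherwise."""
    if end is not None and end < start:
        raise ValueError("end date precedes start date")
    return f"{start.isoformat()}/{end.isoformat() if end else ''}"


def bounding_box(stations: List[Tuple[float, float]]) -> Dict[str, float]:
    """Smallest latitude/longitude box containing all (lat, lon) stations."""
    lats = [lat for lat, _ in stations]
    lons = [lon for _, lon in stations]
    return {
        "southBoundLatitude": min(lats),
        "westBoundLongitude": min(lons),
        "northBoundLatitude": max(lats),
        "eastBoundLongitude": max(lons),
    }


def network_metadata(doi: str, creator: str, title: str, publisher: str,
                     stations: List[Tuple[float, float]],
                     start: date, end: Optional[date] = None) -> dict:
    """Assemble the fields discussed in the text (hypothetical layout)."""
    return {
        "identifier": {"identifierType": "DOI", "value": doi},
        "creators": [creator],
        "titles": [title],
        "publisher": publisher,
        # First year of data collection, not the first year of open access.
        "publicationYear": start.year,
        "resourceType": {"resourceTypeGeneral": "Other", "value": "Seismic network"},
        "formats": ["SEED data"],
        "dates": [{"dateType": "Collected", "date": date_collected(start, end)}],
        "geoLocations": [{"geoLocationBox": bounding_box(stations)}],
    }


# Hypothetical temporary network; every value below is invented for illustration.
record = network_metadata(
    doi="10.5555/example-network",
    creator="Example Institute",
    title="Example temporary seismic network",
    publisher="Example Data Centre",
    stations=[(-33.5, -70.6), (-36.8, -73.0), (-30.0, -71.2)],
    start=date(2011, 10, 1),
    end=date(2013, 5, 31),
)
```

For a permanent network, calling `date_collected(date(1993, 4, 1))` without an end date yields the open range with the trailing slash described above.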
The group responsible for the administration and maintenance of the instrument pool (GIPP [2]) at the GeoForschungsZentrum (GFZ) has worked in recent years on the development of an internal system to keep track of the hardware components to be used in the seismic stations. Within the context of a BMBF project called Geo-Data-Node, GIPP and GEOFON plan to improve their internal workflows by making this development interoperable with other internal and external systems.
Some of the functionalities to be exposed are:
- identification of hardware components by means of PIDs, with a reduced set of attributes in the PID record structure (to be discussed and based on experiences from other communities).
- technical specifications of deployed stations, identifying particular instances of the sensors and not only the type/model.
- updated responses for each stream (seismic time series), as well as past ones, which relate to older datasets still in our archive.
- a journal of each component, which can be made available through landing pages:
  - where has it been used?
  - for how long?
  - were there problems with it? How were they solved?
  - has it been recalibrated?
- generation of provenance data from the previous points.
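The journal questions in the list above suggest a simple data structure. The sketch below is only one possible shape for it: the class names, the entry kinds ("deployment", "problem", "calibration") and the placeholder PID are all hypothetical and do not describe the actual GIPP system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class JournalEntry:
    kind: str                      # hypothetical vocabulary, see lead-in
    start: date
    end: Optional[date] = None     # None means "still ongoing"
    location: Optional[str] = None
    note: str = ""


@dataclass
class Component:
    pid: str
    journal: List[JournalEntry] = field(default_factory=list)

    def deployments(self) -> List[JournalEntry]:
        """Where has it been used?"""
        return [e for e in self.journal if e.kind == "deployment"]

    def days_deployed(self, today: date) -> int:
        """For how long? Open deployments are counted up to 'today'."""
        return sum(((e.end or today) - e.start).days for e in self.deployments())

    def problems(self) -> List[JournalEntry]:
        """Were there problems with it, and how were they solved (see note)?"""
        return [e for e in self.journal if e.kind == "problem"]

    def last_calibration(self) -> Optional[date]:
        """Has it been recalibrated? Returns the latest calibration date, if any."""
        dates = [e.start for e in self.journal if e.kind == "calibration"]
        return max(dates) if dates else None

    def provenance(self) -> List[dict]:
        """Flatten the journal into chronological provenance-like statements."""
        return [
            {
                "entity": self.pid,
                "activity": e.kind,
                "startedAt": e.start.isoformat(),
                "endedAt": e.end.isoformat() if e.end else None,
                "location": e.location,
            }
            for e in sorted(self.journal, key=lambda e: e.start)
        ]


# Placeholder PID and invented history, for illustration only.
sensor = Component("hdl:00.0000/EXAMPLE-SENSOR-0001", [
    JournalEntry("deployment", date(2016, 3, 1), date(2016, 9, 1), "Site A"),
    JournalEntry("problem", date(2016, 5, 12), date(2016, 5, 14), "Site A", "cable replaced"),
    JournalEntry("calibration", date(2016, 10, 5)),
    JournalEntry("deployment", date(2017, 1, 10), None, "Site B"),
])
```

A landing page could then be rendered from `deployments()`, `problems()` and `last_calibration()`, while `provenance()` gives a flat list that could later be mapped to a formal provenance model.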
Once this is accomplished, GEOFON could link the available datasets with the components that recorded the data and with the StationXML metadata describing them in detail. The provenance data generated could also be linked to the datasets, giving the user more elements with which to evaluate the quality of the data.
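One way to picture this link is a lookup from a stream and a point in time to the component that was recording then. The following is a minimal sketch under assumed names: the `Epoch` class, the stream codes and the PIDs are invented, and a real system would take the epochs from the station metadata rather than from a hard-coded list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Epoch:
    stream: str                     # made-up code, e.g. "XX.STA01..HHZ"
    component_pid: str              # placeholder PID of the recording component
    start: datetime
    end: Optional[datetime] = None  # None means the component is still deployed

    def covers(self, t: datetime) -> bool:
        """True if t falls in the half-open interval [start, end)."""
        return self.start <= t and (self.end is None or t < self.end)


def component_for(epochs: List[Epoch], stream: str, t: datetime) -> Optional[str]:
    """Return the PID of the component that recorded 'stream' at time 't'."""
    matches = [e for e in epochs if e.stream == stream and e.covers(t)]
    if len(matches) > 1:
        raise ValueError(f"overlapping epochs for {stream} at {t.isoformat()}")
    return matches[0].component_pid if matches else None


# Invented history: the sensor behind one stream was swapped at the start of 2016.
epochs = [
    Epoch("XX.STA01..HHZ", "hdl:00.0000/EXAMPLE-SENSOR-0001",
          datetime(2014, 6, 1), datetime(2016, 1, 1)),
    Epoch("XX.STA01..HHZ", "hdl:00.0000/EXAMPLE-SENSOR-0002",
          datetime(2016, 1, 1)),
]
```

With such a lookup, a dataset covering a given time window could be annotated with the PIDs of the components involved, and from there with their journal and provenance data.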
Information on which stations were built during field trips could be extremely useful for detecting problems early and finding solutions. Ideally, new deployments could be reported online, keeping a “live” view of the campaign.
ID = Numbering, 1, 2, 3 and sub properties with 1.1, 1.2 (akin to DataCite schema)
Property = Name of the property
Occurrence = 1 (mandatory) or 0-1 (optional at most once) or 0-n (optional multiple) or 1-n (at least one)
In metadata of = IP (Infrastructure Provider) or LP (Landing Page)
ID | Property | Occ. | Definition | Datatype | In metadata of |
1 | Identifier | 1 | Unique string that identifies the instrument instance | PIDINST | IP |
1.1 | identifierType | 1 | Type of identifier | Controlled list of values | IP |
2 | LandingPage | 1 | A landing page that the identifier resolves to | URL | IP |
5 | Manufacturer | 1 | The instrument's manufacturer | ??? | IP/LP |
6 | Description | 0-1 | Technical description of the device and its capabilities | Free Text | LP |
7 | InstrumentType | 1 | Classification of the type of the instrument | Controlled list of values | IP |
8 | VariableMeasured | 0-n | The variable(s) that this instrument measures or observes | Free Text | IP/LP |
9 | Date | 0-n | Dates relevant to the instrument | ISO 8601 | LP |
9.1 | dateType | 1 | The type of date | Controlled list of values | LP |
9.2 | Experiment | 0-1 | ID of the experiment in which the instrument was used | Free text | LP |
9.3 | Location | 0-1 | Latitude and Longitude | 2 floats | LP |
9.4 | Calibration | 0-1 | Calibration needed for proper manipulation of the data | Binary | LP |
11 | AlternateIdentifier | 0-n | Identifiers other than the PIDINST pertaining to the same instrument instance | Free text but unique | IP/LP |
11.1 | alternateIdentifierType | 1 | The type of identifier | Free text (e.g. Serial Number) | IP/LP |
12 | Status | 1 | Status of the instrument. Only the current/last one from dateType (9.1) should be in the PID Record, but all in the Landing Page. | Controlled list of values | IP/LP |
13 | Documentation | 0-n | Different documents to help deployment, use, etc. | URL | LP |
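The occurrence rules and the "In metadata of" column of the table can be checked mechanically. The sketch below covers only the top-level properties of the table (sub-properties such as 1.1 or 9.1 are left out), and the record layout, function names and example values are assumptions made for illustration.

```python
from typing import Dict, List

# Top-level properties from the table: name -> (occurrence, "In metadata of").
SCHEMA = {
    "Identifier": ("1", "IP"),
    "LandingPage": ("1", "IP"),
    "Manufacturer": ("1", "IP/LP"),
    "Description": ("0-1", "LP"),
    "InstrumentType": ("1", "IP"),
    "VariableMeasured": ("0-n", "IP/LP"),
    "Date": ("0-n", "LP"),
    "AlternateIdentifier": ("0-n", "IP/LP"),
    "Status": ("1", "IP/LP"),
    "Documentation": ("0-n", "LP"),
}

# Occurrence code -> (minimum, maximum); None means unbounded.
BOUNDS = {"1": (1, 1), "0-1": (0, 1), "0-n": (0, None), "1-n": (1, None)}


def validate(record: Dict[str, List[str]]) -> List[str]:
    """Return a list of occurrence violations; an empty list means valid."""
    errors = []
    for name, (occurrence, _) in SCHEMA.items():
        low, high = BOUNDS[occurrence]
        count = len(record.get(name, []))
        if count < low:
            errors.append(f"{name}: expected at least {low}, found {count}")
        elif high is not None and count > high:
            errors.append(f"{name}: expected at most {high}, found {count}")
    errors.extend(f"{name}: unknown property" for name in record if name not in SCHEMA)
    return errors


def select(record: Dict[str, List[str]], target: str) -> Dict[str, List[str]]:
    """Keep the properties that belong in the 'IP' or the 'LP' metadata."""
    return {
        name: values
        for name, values in record.items()
        if name in SCHEMA and target in SCHEMA[name][1].split("/")
    }


# Invented instrument record; all identifiers and URLs are placeholders.
record = {
    "Identifier": ["hdl:00.0000/EXAMPLE-0001"],
    "LandingPage": ["https://example.org/instruments/0001"],
    "Manufacturer": ["Example Instruments"],
    "InstrumentType": ["seismometer"],
    "VariableMeasured": ["ground velocity"],
    "Status": ["deployed"],
    "Documentation": ["https://example.org/docs/0001.pdf"],
}
pid_record = select(record, "IP")      # what would go into the PID record
landing_page = select(record, "LP")    # what would be shown on the landing page
```

Keeping the schema as data rather than code makes it easy to adjust while the property list is still under discussion.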
Clark, A., P. L. Evans, and A. Strollo (2014), FDSN recommendations for seismic network DOIs and related FDSN services, doi:10.7914/D11596.
Evans, P. L., A. Strollo, A. Clark, T. Ahern, R. Newman, J. F. Clinton, H. Pedersen, and C. Pequegnat (2015), Why Seismic Networks need Digital Object Identifiers, EOS, 96, doi:10.1029/2015EO036971.