Include physical object identifier such as inventory number #15

Open
huberrob opened this issue Feb 22, 2019 · 30 comments

@huberrob

To allow linking the physical object (instrument) with its digital representation, it would be good to have a property such as 'physicalObjectIdentifier' for IDs such as an 'inventory number'.
The schema now has relatedIdentifier as well as alternateIdentifier. It is not clear which one should be used when the 'real' physical object is to be identified.

@RKrahl
Member

RKrahl commented Feb 26, 2019

What do you mean by “link between the physical object (instrument) and its digital representation”? The instrument that the PIDINST pertains to is the physical object. I'm not sure if any “digital representation” of the instrument is within the scope of this group.

For inventory numbers, we already have alternateIdentifier. That is, by the way, the reason why alternateIdentifierType is free text rather than a controlled vocabulary: this property is intended to accommodate not only formal PIDs, but also less formal identifiers, such as inventory or serial numbers. The issuer of the PIDINST may thus specify in the alternateIdentifierType what kind of number the alternateIdentifier is supposed to be, e.g. X-institute-inventory-number or Y-manufacturer-serial-number.

For the difference between alternateIdentifier and relatedIdentifier: alternateIdentifier is for alternate identifiers, other than the PIDINST, pertaining to the same instrument instance. relatedIdentifier is for identifiers pertaining to other objects or entities that are related to this instrument instance, e.g. articles or extended metadata describing the instrument, other instruments related to this instrument, …
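
As a rough illustration of the distinction drawn above, the following Python sketch builds a metadata fragment with one alternate and one related identifier. The element spellings follow the snippets quoted in this thread; the root element name `instrument`, the `relatedIdentifierType` and `relationType` attributes, the relation value and all identifier values are assumptions made for illustration, not taken from the schema.

```python
import xml.etree.ElementTree as ET


def build_record(pid, alternates, related):
    """Build a minimal, hypothetical instrument record as an XML element."""
    root = ET.Element("instrument")  # root element name is an assumption
    ET.SubElement(root, "identifier").text = pid
    # Alternate identifiers: other names for the SAME instrument instance.
    for id_type, value in alternates:
        alt = ET.SubElement(root, "alternateIdentifier",
                            alternateIdentifierType=id_type)
        alt.text = value
    # Related identifiers: OTHER objects that relate to this instrument.
    for id_type, relation, value in related:
        rel = ET.SubElement(root, "relatedIdentifier",
                            relatedIdentifierType=id_type,
                            relationType=relation)
        rel.text = value
    return root


record = build_record(
    pid="PIDINST-1",
    alternates=[("X-institute-inventory-number", "INV-0042")],
    related=[("URL", "IsDescribedBy", "http://www.example.org/manual")],
)
print(ET.tostring(record, encoding="unicode"))
```

The inventory number names the same instrument as the PIDINST, so it goes into alternateIdentifier; the manual is a different object, so it goes into relatedIdentifier.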

@huberrob
Author

There is indeed a difference between identifiers used for digital objects and those connected to the physical object itself. If we keep the purist approach of subsuming all these IDs under alternateIdentifier, then we should definitely also provide a controlled list of identifier types. Otherwise there will soon be a chaos of self-defined types.

@RKrahl
Member

RKrahl commented Feb 27, 2019

But our identifiers do not pertain to digital objects, they pertain to instruments, i.e. physical objects.

@huberrob
Author

Well, but our current definitions state for
Identifier: 'Unique string that identifies the instrument instance'
and for
LandingPage: 'A landing page that the identifier resolves to',
so in practice Identifier identifies the LandingPage, which is a possible digital representation of the instrument.

@markusstocker
Member

markusstocker commented Feb 27, 2019

I know @huberrob has made this point before but I never quite understood this. So, does an IGSN identify a sample (i.e., the physical object) or the landing page (i.e., a possible digital representation of the sample)? IMO, the case of instruments can be treated the same.

I am not entirely convinced that a DOI cannot identify a physical object and resolve onto a possible digital representation of the physical object. After all, the DOI name is quite a distinct thing compared to the landing page URL. Why can we not treat the URL as the identifier of the possible digital representation of X and the PID as the identifier of X (digital or physical)?

I guess one counter argument is that a PID for physical object X could resolve onto one of many digital or physical representations (say, a digital document or a printed document representing metadata about the physical object X). Can we say that the PID identifying physical object X resolves to and only to a preferred landing page and thus digital representation? In other words, there is a one-to-one mapping but the PID name and the landing page URL (can) identify two different things?

Any more thoughts?

@huberrob
Author

IGSNs are now attached to samples by early IGSN adopters, e.g. using a barcode label. This is definitely a strong, physical link to the identifier and good practice. But it took years until this was achieved.
In reality, from the issuing institution's perspective, a PIDINST will represent another alternateIdentifier rather than the Identifier, and most institutions will use it in addition to their internal numbers. Therefore, we must make sure that we enable a solid way to link the physical object and its main identifier with its digital representation and its PIDINST.

As I mentioned before, one way to ensure this is to express it as a dedicated property. Alternatively, we can set up a controlled list of types for alternateIdentifierType.
We have to decide what we want.

@RKrahl
Member

RKrahl commented Mar 4, 2019

Sorry, @huberrob, I really tried hard, but I don't get your point. Again: the instrument PID that we are discussing here does not identify any kind of digital object or a landing page; it identifies the instrument, i.e. the physical object. The landing page that the identifier resolves to is just a means to provide more information about the instrument. It is part of the metadata in the same way as any property in this schema belongs to the metadata pertaining to the instrument.

This is by the way the same as for most kinds of identifiers:

  • The DOI pertaining to an article in a journal resolves to a landing page at the publisher. This landing page provides supplementary information on the article, such as title, abstract, authors, crossrefs to other articles citing this one, information on how to get the article, … Still the DOI identifies the article, not the landing page. The landing page may change and move to a different server, even with another publisher, but the DOI stays the same and the article stays the same.
  • Same for DOIs for data publications: the DOI (usually) resolves to the landing page, not to a download link for the dataset. But the DOI identifies the data, not the landing page.
  • ORCID-iDs resolve to a page at ORCID, often providing information about the researcher. But the ORCID-iD is about people, not about webpages at ORCID. This is also not affected by the fact that there may be more identifiers pertaining to the same person, such as a Scopus Author ID or a social security number.

Whether there is a need to attach barcode labels with the PIDINST to the instrument is out of the scope of this WG, I'd say. At least for our instruments, I can say that they are big enough and difficult to move, so that it is really hard to overlook or to lose them. I don't think we will need such barcodes.

The same instrument that is identified by a PIDINST will certainly have more than just this identifier: an inventory number in the owning institute, an entry in some instrument database with its own database id, a serial number from the manufacturer, and so on. That is what AlternateIdentifier is for, because it makes sense to create links between the identifiers in order to be able to check which item in the institutional inventory database has been attributed which PIDINST. Which of these is the identifier and which is yet another identifier will depend on the use case and perspective and may change from minute to minute. If I search our internal instrument database I might need the database identifier, so at this moment this will be the identifier for me. Ten minutes later, I might prepare a data publication and link that to the instrument that produced the data. At that instant, the PIDINST will be the identifier that is relevant.

Most of these other identifiers are not formalized PIDs. Therefore it is not possible to enumerate all identifier types that will be used for AlternateIdentifier. And that is why it is and should be free text rather than a controlled list of values. If I put <alternateIdentifier alternateIdentifierType="URL">http://www.example.org/someurl/</alternateIdentifier> in the metadata of my instrument, you will most probably know how to follow this identifier. If I put <alternateIdentifier alternateIdentifierType="IGAMA number">1848</alternateIdentifier> there, you will most likely not know what it is, but you will at least be able to guess that this piece of information is not relevant for you. Other people do know what it is, and for those people it is useful information. Both entries may be valuable depending on the use case.
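
The free-text argument above can be sketched from the consumer's side: a reader of the metadata acts on the identifier types it recognizes and simply sets the others aside. This is a minimal Python sketch; the wrapping `instrument` element and the set of "known" types are assumptions for illustration, while the two alternateIdentifier entries are the ones quoted above.

```python
import xml.etree.ElementTree as ET

METADATA = """
<instrument>
  <alternateIdentifier alternateIdentifierType="URL">http://www.example.org/someurl/</alternateIdentifier>
  <alternateIdentifier alternateIdentifierType="IGAMA number">1848</alternateIdentifier>
</instrument>
"""

# Identifier types this particular consumer knows how to follow (an assumption).
KNOWN_TYPES = {"URL"}


def split_alternate_identifiers(xml_text, known_types):
    """Return (understood, ignored) lists of (type, value) pairs."""
    understood, ignored = [], []
    for elem in ET.fromstring(xml_text).iter("alternateIdentifier"):
        pair = (elem.get("alternateIdentifierType"), (elem.text or "").strip())
        if pair[0] in known_types:
            understood.append(pair)
        else:
            ignored.append(pair)  # kept, since other readers may know this type
    return understood, ignored


understood, ignored = split_alternate_identifiers(METADATA, KNOWN_TYPES)
print("understood:", understood)
print("ignored:", ignored)
```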

@RKrahl
Member

RKrahl commented Mar 6, 2019

We discussed this in today's meeting:

  • we agreed that the PIDINST identifies the instrument, i.e. the physical object, not a digital representation.
  • assuming that the suggestion to amend the schema was based on a misconception about what the PIDINST is supposed to identify, we decided to close the issue.

@huberrob: if you believe that there is still an issue with the schema after this clarification, feel free to reopen.

@RKrahl RKrahl closed this as completed Mar 6, 2019
@huberrob
Author

huberrob commented Mar 7, 2019

Dear all,

I would like to reopen this issue, and maybe we should merge it with #5 on 'serial numbers' proposed by @louatbodc, as both issues deal with the (IMHO essential) capability to link a PIDINST and its associated metadata record with the physical object, which is not sufficiently addressed by alternateIdentifier at present.

@RKrahl states a PIDINST would already identify the instrument, but this is still illusory.
Instead it is most likely that PIDINST will be used in addition to existing instrument identifiers only in order to have an effective way to link digital records with instrument representations.

Even our schema reflects this, otherwise we would not have defined LandingPage as a mandatory property.

We should not ignore widely used existing identifiers for instruments, which in most cases are actually physically attached to the instrument: serial numbers as well as inventory numbers (or accession numbers).
Btw., these identifiers are also regarded as essential information by standards like TEDS / IEEE 1451.4 (not sure if we have discussed this yet, I will open an issue).

Imagine a larger institution which has hundreds of instruments in use, such as the AWI, which actually owns e.g. dozens of CTD sensors of the same type, e.g. type SBE-37. In our current architecture all these would receive their own PIDINST and most probably would all have the same name like 'SBE-37'. Ideally these instruments would have their own landing page. But what if this is not done like we imagine this now? And instead, the landing page is just an HTML page which presents a list of instruments like:

SBE-37 PIDINST-1
SBE-37 PIDINST-2
SBE-37 PIDINST-3
etc.

How should one be able to identify a distinct instrument and e.g. find it on the shelves of this institution based on this information? But this would be easy if we provided an unambiguous way to include a serial number or inventory number in the PIDINST metadata.

Using alternateIdentifier is not the appropriate solution for inventory numbers or serial numbers. We all know how difficult it is to maintain consistency in using metadata schemas.
People will use alternateIdentifier as well as relatedIdentifier to fill in serial numbers and they will use 'serial number', 'Seriennummer', 'ser. number', 'S/N', 'SRID', 'product key', 'serial key' etc. for this purpose. Do we really want this?

This is why I think we either need dedicated elements for serial numbers and inventory numbers or a dedicated generic element for physical identifiers and a controlled list of types at least for these.

best regards,
Robert
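
The consistency problem described in this comment can be illustrated with a small Python sketch that tries to recognize free-text type labels as "serial number". The alias table is hypothetical and hand-maintained; 'SRID' and 'product key' are deliberately left out, because whether they mean the same thing as a serial number is a judgment call that cannot be made mechanically.

```python
import re

# Hypothetical alias table: normalized free-text labels that this consumer
# decides to treat as "serial number". It has to be maintained by hand.
SERIAL_NUMBER_ALIASES = {"serialnumber", "seriennummer", "sernumber", "sn", "serialkey"}


def normalize_type(label):
    """Lowercase the label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def is_serial_number_type(label):
    """True if the free-text label matches one of the known aliases."""
    return normalize_type(label) in SERIAL_NUMBER_ALIASES


labels = ["serial number", "Seriennummer", "ser. number", "S/N",
          "SRID", "product key", "serial key"]
for label in labels:
    print(f"{label!r}: {is_serial_number_type(label)}")
```

Every consumer of the metadata would need a table like this, and each new spelling silently falls through until someone adds it.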

@RKrahl
Member

RKrahl commented Mar 7, 2019

I'll reopen as requested by @huberrob. I will comment, maybe next week.

@markusstocker
Member

My thoughts here:

> @RKrahl states a PIDINST would already identify the instrument, but this is still illusory.
> Instead it is most likely that PIDINST will be used in addition to existing instrument identifiers only in order to have an effective way to link digital records with instrument representations.

I think we do not pretend that PIDINST will be the identifier for instruments. I completely agree that it will be one among others. As you correctly note, it will be the one that, in contrast to others, enables resolution of the identifier to further data about the instrument on the web. However, and this is important, we do claim that PIDINST identifies the instrument, the physical object, not the metadata attached to the identifier, not the landing page or data returned on that landing page. Whether the PIDINST will be attached to the instrument with a barcode, engraved in the instrument case, or not at all is IMO irrelevant or left to the discretion of the instrument owner. The PIDINST understanding is that the identifier identifies the instrument, the physical object.

Now, I understand this can be controversial. Indeed, it is related to the controversy whether a DOI is a digital identifier for (digital or physical) objects or an identifier for digital objects. This is all a bit philosophical IMO and not even the PID community has a clear answer. I am also unclear about the practical implications. Hence, unless we clarify why a string "10.123/abc" cannot identify a physical object (as IGSNs do), I suggest we continue with the position that the PIDINST indeed identifies the instrument, the physical object.

> We should not ignore widely used existing identifiers for instruments, which in most cases are actually physically attached to the instrument: serial numbers as well as inventory numbers (or accession numbers).

The schema doesn't ignore them and provides a mechanism to explicitly include them (alternateIdentifier).

> Imagine a larger institution which has hundreds of instruments in use, such as the AWI, which actually owns e.g. dozens of CTD sensors of the same type, e.g. type SBE-37. In our current architecture all these would receive their own PIDINST and most probably would all have the same name like 'SBE-37'. Ideally these instruments would have their own landing page.

Would all have the same model name but, yes, correct.

> But what if this is not done like we imagine this now? And instead, the landing page is just an HTML page which presents a list of instruments.

I don't think we can enforce this or I don't see how. A more or less weak parallel may be that DOIs are used to identify collections of documents, and resolve on collection landing pages. Each item in the collection may or may not have a DOI. I agree that this is less of a problem for DOIs, since DOI goes for any digital object (atomic or collection). If we argue that PIDINST identifies an instrument and someone registers a PIDINST to identify a digital collection of references to physical objects then this would constitute a misuse of the PIDINST identifier. For such cases, my suggestion would be to identify the collection with a DOI and the collection elements with PIDINST (note that I am not excluding the possibility that PIDINST = DOI).

> How should one be able to identify a distinct instrument and e.g. find it on the shelves of this institution based on this information? But this would be easy if we provided an unambiguous way to include a serial number or inventory number in the PIDINST metadata.

Using alternateIdentifier.

> Using alternateIdentifier is not the appropriate solution for inventory numbers or serial numbers. We all know how difficult it is to maintain consistency in using metadata schemas.
> People will use alternateIdentifier as well as relatedIdentifier to fill in serial numbers and they will use 'serial number', 'Seriennummer', 'ser. number', 'S/N', 'SRID', 'product key', 'serial key' etc. for this purpose. Do we really want this?

I think here we get to the meat. To your question I say, no we don't want this. But do we need a dedicated attribute serialNumber or can we address this by having a defined set of alternateIdentifierTypes? Would your concern be addressed with the following:

<AlternateIdentifier alternateIdentifierType="SerialNumber">123123123</AlternateIdentifier>

whereby SerialNumber is a defined string out of a closed collection?

This of course opens the issue of how we come up with that closed collection, but I wonder if this can be addressed by schema versioning and by adapting the collection following community requests.
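
One way to picture a closed collection that grows with schema versions is the following Python sketch. Only the value "SerialNumber" comes from the comment above; the version labels and the "InventoryNumber" value are assumptions for illustration, not part of any published schema.

```python
# Hypothetical closed collections of alternateIdentifierType values, keyed by
# schema version. A later version may add values after community requests.
ALLOWED_TYPES = {
    "1.0": frozenset({"SerialNumber"}),
    "1.1": frozenset({"SerialNumber", "InventoryNumber"}),
}


def validate_type(identifier_type, schema_version):
    """Return the type if the given schema version allows it, else raise ValueError."""
    allowed = ALLOWED_TYPES[schema_version]
    if identifier_type not in allowed:
        raise ValueError(
            f"{identifier_type!r} is not allowed in schema {schema_version}; "
            f"expected one of {sorted(allowed)}"
        )
    return identifier_type


validate_type("SerialNumber", "1.0")      # accepted
validate_type("InventoryNumber", "1.1")   # accepted in the later version
try:
    validate_type("Seriennummer", "1.1")  # free-text variant is rejected
except ValueError as err:
    print(err)
```

The trade-off is the one debated in this thread: the closed list gives consistency, but a type such as "IGAMA number" could not be recorded until a schema version admits it.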

@huberrob
Author

huberrob commented Mar 8, 2019 via email

@markusstocker
Member

Agreed, let's see what additional input rolls in.

I suggest we take some of these key open issues onto the agenda for P13 (@RKrahl).

> have you seen the issue I created yesterday regarding TEDS / IEEE 1451.4 (#20) ?

I did, thanks, but I am not yet clear what to do with it. ;> For instance, I am not sure how widely used the properties Model Number, Version Letter, Version Number are. These didn't show up in our use cases (except Version Number in the use case by FZJ).

> Further, it is not clear to me why we are so picky with identifiers and why we are so desperately trying to avoid dedicated properties for IDs.

The rationale is that with dedicated properties as well as the alternateIdentifier approach, there will be two ways to encode the same information for some properties, actually three. Say we have serialNumber as an additional property; then one can use it for the serial number, use alternateIdentifier for the serial number, or use both. Intuitively I would argue this is not desirable.

> Manufacturer and Owner are explicitly modelled in the schema as dedicated properties

Correct, and I would argue this is because these are complex properties with name, id, contact.

> I include Christoph in this email

I think it may not have been delivered to him, since your reply went to GitHub. Not sure.

@huberrob
Author

huberrob commented Mar 9, 2019 via email

@RKrahl
Member

RKrahl commented Feb 26, 2021

I believe we have a consensus by now that the inventory number should be included as AlternateIdentifier. I suggest closing this one. Note that #24 is still open.

@huberrob
Author

huberrob commented Mar 1, 2021

I do not agree with closing this issue. I continue to see a specificity in the relationship between a physical object (via serial number, inventory number) and its digital representation. In my opinion this is still not appropriately expressed in the current model. As possible solutions, I could imagine changing the scope of PIDINST to also include 'virtual instruments', or providing a mandatory alternateIdentifierType vocabulary and changing the cardinality of alternateIdentifier to 1:n (forcing the inclusion of at least one identifier for the physical object).
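
The second option (a 1:n cardinality combined with a type vocabulary) amounts to a check like the one in the following Python sketch. The vocabulary values and the wrapping `instrument` element are assumptions for illustration; the thread does not define them.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary of types that count as identifiers of the physical object.
PHYSICAL_ID_TYPES = {"SerialNumber", "InventoryNumber"}


def has_physical_identifier(xml_text):
    """True if the record carries at least one non-empty physical identifier."""
    root = ET.fromstring(xml_text)
    return any(
        elem.get("alternateIdentifierType") in PHYSICAL_ID_TYPES
        and (elem.text or "").strip() != ""
        for elem in root.iter("alternateIdentifier")
    )


with_serial = """<instrument>
  <alternateIdentifier alternateIdentifierType="SerialNumber">123123123</alternateIdentifier>
</instrument>"""
without_serial = """<instrument>
  <alternateIdentifier alternateIdentifierType="URL">http://www.example.org/someurl/</alternateIdentifier>
</instrument>"""

print(has_physical_identifier(with_serial))
print(has_physical_identifier(without_serial))
```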

@huberrob
Author

huberrob commented Jul 7, 2021

See also how the DiSSCo community deals with 'digital specimens': https://github.com/DiSSCo/openDS/ and the recently published paper https://doi.org/10.3897/rio.7.e67379

@hardistyar

It is true that you can use a PID (such as a DOI) to identify both/either physical and/or digital objects. Such a PID resolves to the location of the object.

In the case of a digital object that location can be the location (repository on the Internet) where the digital object is stored - typically in the Web world a URL that can be used with HTTP requests to obtain the object, or a landing page about it. Think about journal articles where most DOIs resolve to a landing page that 'tests' whether you have a subscription to the journal before giving you the text (of the article object that has been identified).

In the case of identifying a physical object, such as an instrument or sample, the resolution process must provide sufficient information for you to accurately locate the object. There are at least two ways you can do this. In the case of multiple type SBE-37 instruments, as an earlier comment mentioned: i) the resolution of the PIDINST must include the physical location of the specific instance of the instrument identified, e.g. the instrument serial number, the number of the shelf/cupboard it is stored on/in and even the position on the shelf if necessary to disambiguate one instrument or sample from another. It will also be helpful if the instrument/sample itself has a label with its PIDINST identifier permanently attached; or ii) the resolution provides a URI consisting of (for example) the institution domain and the instrument serial number. Mapping of that to the present cupboard/shelf location is then a local responsibility.

In the object concept, an object can either be a single object instance or it can be a collection of several object instances, some physical, some digital. So, you could give a single PIDINST to all your instances of (a pool of) type SBE-37's if it doesn't matter which one gets used, or to a collection object that contains both the physical object and a digital object corresponding to that.

It all depends on how you choose to use the PID system you've chosen. In DiSSCo we will ensure at least the one-way association between a 'Digital Specimen' and its physical specimen counterpart in a museum cupboard. We will do this by including an 'institution code' and a 'physicalspecimenId' in the PID Record maintained by the Handle System for the Digital Specimen (DS) object in question. The data in the DS will also include either a URI as I described above (combination of institution domain and physicalspecimenId) or an IGSN, whichever is used by the institution.

I hope this is helpful.
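
Option ii) and the DiSSCo-style pointer described in this comment can be sketched in Python as follows. The class name, field names, URI path layout and example values are all hypothetical; the comment only says that the URI combines the institution domain with the serial number or physicalspecimenId.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class PhysicalObjectRef:
    """Hypothetical pointer from a digital record to its physical counterpart."""
    institution_domain: str  # e.g. the institution's web domain
    institution_code: str    # short institution code
    physical_id: str         # serial, inventory or specimen number

    def uri(self):
        # The path layout is an assumption; the identifier is percent-encoded
        # so that characters such as "/" cannot change the path structure.
        encoded = quote(self.physical_id, safe="")
        return f"https://{self.institution_domain}/instrument/{encoded}"


ref = PhysicalObjectRef("example.org", "EX", "SBE-37/0042")
print(ref.uri())
```

Mapping such a URI to the current cupboard or shelf would remain a local responsibility, as noted above.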

@huberrob
Author

huberrob commented Jul 7, 2021

Thank you very much @hardistyar!

So I assume physicalspecimenId and institution code are mandatory?

Robert

@hardistyar

They are mandatory minimum pieces of information (along with a few others) that will be needed to publish DS information.

@smrgeoinfo

Seems like some rehashing of the old httprange-14 discussion.
The binding between an identifier string and a physical thing has to use some identifying property carried by the physical thing; a simple example is a unique serial number stamped on the thing or attached as a permanent label.
The physical thing can't be sent over the wire, so in the HTTP protocol there is a 303 redirect to get a digital representation of the physical thing (e.g. a 'digital specimen'). This representation must include the necessary information (e.g. a serial number) to bind the digital thing to the physical thing. This digital representation is itself a different resource, and should have its own unique identifier; this allows making statements about the representation distinct from statements about the physical thing. There can be many digital representations of a physical thing; HTTP includes content-negotiation functionality to get a particular representation.
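
The 303 pattern described above can be sketched as a pure Python function, without any web framework. The paths, the registry and the simplified Accept handling are assumptions for illustration; a real server would honour q-values and the other negotiation rules of HTTP.

```python
# Hypothetical registry: identifier of the physical thing -> URLs of its
# digital representations, keyed by media type. All paths are made up.
REPRESENTATIONS = {
    "/id/instrument/42": {
        "text/html": "/doc/instrument/42.html",
        "application/json": "/doc/instrument/42.json",
    },
}


def resolve(path, accept="text/html"):
    """Return (status, headers) for a request that names a physical thing."""
    variants = REPRESENTATIONS.get(path)
    if variants is None:
        return 404, {}
    # Very simplified content negotiation: the first listed media type that
    # is available wins; q-values are ignored and */* falls back to HTML.
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type == "*/*":
            media_type = "text/html"
        if media_type in variants:
            # 303 See Other: the thing itself cannot be returned, so point to
            # a separate resource (with its own URL) that describes it.
            return 303, {"Location": variants[media_type], "Vary": "Accept"}
    return 406, {}


print(resolve("/id/instrument/42", "application/json"))
print(resolve("/id/instrument/42", "text/html,application/json;q=0.9"))
```

The identifier path and the document paths stay distinct, which is what allows statements about the representation to be kept apart from statements about the instrument.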

@markusstocker
Member

Nice to see you here @hardistyar - quick question since I have not understood this from your comment: Are you suggesting that sufficient information to accurately locate the physical object (here the instrument) should be PID metadata or can this information be on the landing page the PID resolves to?

@RKrahl
Member

RKrahl commented Jul 9, 2021

@huberrob, I still believe that your distinction between the "physical object" and some "digital representation" of it (whatever that is supposed to mean) is artificial; it makes things needlessly complicated and does not seem to provide any practical benefit. We discussed this several times in the working group meetings and agreed that the instrument PID identifies the instrument, the physical object, not the metadata attached to the identifier, not the landing page or data returned on that landing page. We don't need any additional "physicalspecimenId" because the instrument PID is exactly that.

@RKrahl
Member

RKrahl commented Jul 9, 2021

Regarding the information to locate the instrument, we discussed that in #17 and finally agreed not to add any additional information such as geo coordinates. For most instruments, the street address of the Owner will be sufficient to locate it. For other instruments that are deployed in the field or used during an expedition, it would be rather challenging to describe the location by simple attributes and the location might be too volatile to be included in the PID record. For the latter case, we agreed to consider some sort of a WasUsedIn relationType so that one could point to the deployment as a related identifier.

@smrgeoinfo

@RKrahl see #15 (comment) for rationale for having an identifier for the physical object distinct from identifiers for a 'digital representation' of that object.

@RKrahl
Member

RKrahl commented Jul 12, 2021

We already do have a place in the schema to include serial numbers and inventory numbers: AlternateIdentifier.

@RKrahl
Member

RKrahl commented Jul 30, 2021

In preparation for submitting the schema as an RDA recommendation, we plan to get a decision on all pending open questions during the next monthly meeting on 4th August.

Since I can't even recognize a clear proposal for a concrete change to the schema in this issue, I suggest closing it.

@huberrob
Author

huberrob commented Aug 2, 2021

Apparently no clear consensus exists on how to deal with the 'physical thing' in PIDINST. The proposal was to include something like a 'physicalspecimenId' (or physicalinstrumentId) which clearly (and replicably) indicates that the instrument actually exists or existed in the real world. You can ignore this, but it will lead to the introduction of a very large number of PIDINST and associated digital objects for which it will not be possible to prove whether they ever really existed.

@RKrahl
Member

RKrahl commented Aug 3, 2021

Why would anybody want to create a PID for an instrument that does not exist? I would assume that any PIDINST is associated with a really existing physical instrument or with one that did exist in the past. And even if some people had the weird idea of attributing PIDINST to non-existent instruments, how would a "physicalspecimenId" property in the schema prevent those people from doing that? So in which way would the addition of such a property make any difference here?

Furthermore, I have to disagree with the statement that no clear consensus exists on how to deal with the 'physical thing' in PIDINST. At the risk of repeating myself: we discussed this several times in the meetings and we always agreed in the group that the instrument PID identifies the instrument, the physical object, not the metadata attached to the identifier, not the landing page or data returned on that landing page. There is no such thing as a digital object associated with a PIDINST.

Also, I still don't know what you mean by a 'physicalspecimenId' or how such a property would be defined.

@huberrob
Author

huberrob commented Aug 3, 2021

@RKrahl

> Furthermore, I have to disagree with the statement that no clear consensus exists how to deal with the 'physical thing' in PIDINST. At the risk of repeating myself: we discussed this several times in the meetings

Ok, I always assumed all group members are allowed to give input. But if you think you can decide this on your own, go ahead.

5 participants