TG2 - Test Specifications and definitions #174

Open
Tasilee opened this issue Oct 1, 2018 · 9 comments

@Tasilee
Collaborator

Tasilee commented Oct 1, 2018

| Field | Value |
| --- | --- |
| GUID | A globally unique identifier that would resolve to related information about the test and assertion, e.g., e39098df-ef46-464c-9aef-bcdeee2a88cb |
| Label | A standardized name of the test-assertion based on the template OUTPUTTYPE_TERMS_RESPONSE, e.g., "VALIDATION_BASISOFRECORD_NOTSTANDARD". These names were considered helpful for human-human communication and to assist with code implementation, maintenance and searches. |
| Description | A concise description of the AMENDMENT or MEASURE, e.g., The value of dcterms:license was standardized using the bdq:sourceAuthority |
| Output Type | All tests have been classified into four classes: VALIDATION (tests values in one or more Darwin Core terms and returns either “COMPLIANT” or “NOT_COMPLIANT”, e.g., “VALIDATION_BASISOFRECORD_NOTSTANDARD” would return “COMPLIANT” if dwc:basisOfRecord=“Preserved specimen”); AMENDMENT (flags that a change has been made to at least one Darwin Core term in the record, e.g., “AMENDMENT_COORDINATES_TRANSPOSED” where dwc:decimalLatitude and dwc:decimalLongitude values have been reversed); NOTIFICATION (flags where a term is “NOT EMPTY”, e.g., “NOTIFICATION_DATAGENERALIZATIONS_NOTEMPTY” when dwc:dataGeneralizations contains some value or text); MEASURE (returns the number of tests conforming to a criterion, e.g., “MEASURE_VALIDATIONTESTS_NOTCOMPLIANT” returns the number of tests of type VALIDATION that returned “NOT_COMPLIANT”) |
| Expected response | A concise description of the expected response: INTERNAL_PREREQUISITES_NOTMET if there is no value or if a specified source such as a vocabulary is not available; NOT_COMPLIANT if the test fails; otherwise COMPLIANT. E.g., VALIDATION_DAY_NOTSTANDARD: Expected response is INTERNAL_PREREQUISITES_NOTMET if there is no value for dwc:day; NOT_COMPLIANT if the value of dwc:day is not an integer between 1 and 31; otherwise COMPLIANT |
| Darwin Core Class | The Darwin Core Class that the test relates to, e.g., Taxon |
| Information Elements | The Darwin Core terms that the test relates to, e.g., dwc:taxonRank |
| Dimension | The focus of the Darwin Core terms used in the test, either NAME, SPACE, TIME or OTHER |
| Data Dimension | A test will focus on one of the following scenarios based on the Data Quality Framework: "Completeness" (the extent to which data elements are present and sufficient, e.g., "VALIDATION_TAXONID_EMPTY"); "Conformance" (conforms to a format, syntax, type, range, standard or to the own nature of the information element, e.g., "VALIDATION_YEAR_NOTSTANDARD"); "Consistency" (agreement among related information elements in the data, e.g., "VALIDATION_EVENTDATE_INCONSISTENT"); "Likeliness" (low probability that values are real, e.g., "VALIDATION_COORDINATES_ZERO"); "Resolution" (whether sufficient detail is present in the value/s, a measure of the granularity of the data, e.g., "VALIDATION_DATAGENERALISATIONS_NOTEMPTY") |
| Term-Actions | The part of the Label that specifies the focus Darwin Core term and the action applied to it |
| Warning Type | Warning assertion resulting from running a test, one of: Ambiguous, Amended, Incomplete, Inconsistent, Invalid, Notification, Report, Unlikely. |
| Parameter(s) | If there are options for bdq:sourceAuthority or values used in the test, they are specified here |
| Example | A concise example of the application of the test, e.g., dwc:taxonRank="sp." becomes dwc:taxonRank="Species" |
| Source | The origin of the concept of the test, e.g., TDWG2018 |
| References | One or more publications that relate directly to the test, e.g., http://rs.gbif.org/vocabulary/gbif/rank.xml |
| Example Implementations (Mechanisms) | A link to one or more agencies that have an implementation of the test, e.g., #86: "Kurator:event_date_qc" |
| Link to Specification Source Code | A link to a reference code set that demonstrates the test, e.g., #86: https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/main/java/org/filteredpush/qc/date/DwCEventDQ.java#L169 A minimum set of unit tests is at: https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/test/java/org/filteredpush/qc/date/DwcEventDQTest.java#L310 See also unit tests for the underlying implementation at https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/test/java/org/filteredpush/qc/date/DateUtilsTest.java#L460 and https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/test/java/org/filteredpush/qc/date/DateUtilsTest.java#L616 |
| Notes | Additional comments that TG2 believed necessary for an accurate understanding of the test or issues that implementers needed to be aware of, e.g., The Taxonomic Rank GBIF Vocabulary has an extensive list of Ranks including synonyms in a number of languages. |
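
To make the Expected response and MEASURE definitions above concrete, here is a minimal Python sketch of the VALIDATION_DAY_NOTSTANDARD example and of a count in the spirit of MEASURE_VALIDATIONTESTS_NOTCOMPLIANT. It is an illustration only, not the TG2 reference implementation: the function names, the use of plain strings for responses, and the rule that only ASCII digit strings count as integers are all assumptions.

```python
from typing import Iterable, Optional

# Response values as named in the Expected response field.
INTERNAL_PREREQUISITES_NOTMET = "INTERNAL_PREREQUISITES_NOTMET"
NOT_COMPLIANT = "NOT_COMPLIANT"
COMPLIANT = "COMPLIANT"


def validation_day_notstandard(day: Optional[str]) -> str:
    """Hypothetical sketch of the expected response for VALIDATION_DAY_NOTSTANDARD."""
    # No value for dwc:day: the prerequisites for running the test are not met.
    if day is None or day.strip() == "":
        return INTERNAL_PREREQUISITES_NOTMET
    text = day.strip()
    # Assumption: only plain ASCII digit strings are treated as integers.
    if not (text.isascii() and text.isdigit()):
        return NOT_COMPLIANT
    return COMPLIANT if 1 <= int(text) <= 31 else NOT_COMPLIANT


def measure_validationtests_notcompliant(responses: Iterable[str]) -> int:
    """Hypothetical sketch: count the VALIDATION responses that are NOT_COMPLIANT."""
    return sum(1 for response in responses if response == NOT_COMPLIANT)
```

Running the validation over a record's values and passing the collected responses to the measure gives the count described for the MEASURE output type. Responses of INTERNAL_PREREQUISITES_NOTMET are deliberately not counted as failures in this sketch.
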
Tasilee changed the title from "TG2 - Test parameter definitions" to "TG2 - Test parameters and definitions" on Oct 1, 2018
@ArthurChapman
Collaborator

Lee - I have altered ACTION to RESPONSE as earlier discussed

@ArthurChapman
Collaborator

Note @Tasilee - we have just "Description" in Amendments and Measures - not a "Pass Description" and a "Fail Description"

@Tasilee
Collaborator Author

Tasilee commented Oct 1, 2018

I realised that we didn't have specifics of the subset of terms used within the TG2 test issues. This is a placeholder. In doing this draft, inconsistencies in the labels and with the TG1 Framework became apparent.

For example, some but not all Darwin Core terms are hyphenated in the Term component of the labels; #55, #56, #68 and #107 have been edited for consistency. There are, however, also inconsistencies in those tests that have the (now called) "Response" value "FROM_XXXYYY", where "XXXYYY" can be one or more Darwin Core terms. I suggest that, for consistency, we use "FROM-XXX-YYY"?
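
One practical effect of the hyphen suggestion can be sketched in Python: if underscores separate only the three components of the OUTPUTTYPE_TERMS_RESPONSE template and hyphens separate the parts within a component, a label splits unambiguously. This is a sketch under that assumption; `parse_label` and the `Label` type are hypothetical names, and the hyphenated label used in the usage note is made up to illustrate the "FROM-XXX-YYY" form rather than taken from an actual test.

```python
from typing import List, NamedTuple

# The four output types listed in the table above.
OUTPUT_TYPES = {"VALIDATION", "AMENDMENT", "NOTIFICATION", "MEASURE"}


class Label(NamedTuple):
    output_type: str
    terms: List[str]
    response: List[str]


def parse_label(label: str) -> Label:
    """Split a label of the form OUTPUTTYPE_TERMS_RESPONSE (hypothetical helper)."""
    parts = label.split("_")
    # Assumption: underscores appear only between the three components.
    if len(parts) != 3:
        raise ValueError(f"expected 3 underscore-separated parts, got {len(parts)}: {label}")
    output_type, terms, response = parts
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unknown output type: {output_type}")
    # Hyphens separate multiple terms or response parts within a component.
    return Label(output_type, terms.split("-"), response.split("-"))
```

Under this convention, `parse_label("VALIDATION_BASISOFRECORD_NOTSTANDARD")` yields one term and one response part, and a made-up label such as "AMENDMENT_EVENTDATE_FROM-YEAR-MONTH-DAY" keeps its response in one component. A label written in the "FROM_XXXYYY" style contains an extra underscore and is rejected, which is the inconsistency the hyphenated form would avoid.
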

Regarding TG1-TG2, there is no "Likeliness" in the Framework DQ Dimension, and some of the definitions are more global than the TG2 context. Please edit the table above if you think there are better definitions and/or examples.

@Tasilee
Collaborator Author

Tasilee commented Oct 1, 2018

@ArthurChapman regarding "Pass description"/"Fail description", yep, I know. This was just a draft that I needed to save before it got too complex. I also need to add an example for "Example implementation" and for "Link to specification code". They were hard to find, so we may need a few more TAGs that designate "Done X".

@chicoreus
Collaborator

@Tasilee great to have definitions of these.

Also see the (trivial, special purpose) code at https://github.com/kurator-org/bdq_issue_to_csv/blob/master/src/main/java/org/kurator/issueconverter/BDQConvert.java for converting these tables in the GitHub issues into CSV suitable for alignment with the RDF representation of the framework:
https://github.com/kurator-org/kurator-ffdq/blob/master/competencyquestions/rdf/ffdq.owl
Note that any change to the fields listed here in the issues will require corresponding edits to the code, and that the alignment to the framework should also inform these definitions.

Note that InformationElement is singular in the framework, even though an InformationElement may comprise a list of Darwin Core terms. The Darwin Core class is in effect a category for the InformationElement.

It is probably worth separating the examples into a separate column, and adding a column to map these values to framework concepts (e.g. Information Elements = InformationElement, Data Quality Dimension = Dimension).
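
A rough Python sketch of that restructuring, to show what the extra columns might look like. This is not the BDQConvert code: the function name `rows_to_csv`, the column headers, and the rule of splitting each value at its first ", e.g., " are assumptions, and the mapping holds only the two correspondences given above.

```python
import csv
import io
from typing import Iterable, Tuple

# Field name in the issue table -> framework concept.
# Only the two mappings mentioned in this comment; others are left blank.
FIELD_TO_FRAMEWORK = {
    "Information Elements": "InformationElement",
    "Data Quality Dimension": "Dimension",
}


def rows_to_csv(rows: Iterable[Tuple[str, str]]) -> str:
    """Write (field, value) rows as CSV with separate concept and example columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Field", "Framework concept", "Definition", "Example"])
    for field, value in rows:
        # Assumption: the example, if any, follows the first ", e.g., ".
        definition, _, example = value.partition(", e.g., ")
        writer.writerow([field, FIELD_TO_FRAMEWORK.get(field, ""), definition, example])
    return buffer.getvalue()
```

Fields with no known framework concept get an empty cell rather than a guess, so the gaps in the alignment stay visible in the spreadsheet.
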

@chicoreus
Collaborator

chicoreus commented Oct 1, 2018

What we probably need, rather than (or in addition to) this, is the definitions of the column headers in the spreadsheet produced by the BDQConvert code (as that is bringing the tests into closer alignment with the framework).

@ArthurChapman
Collaborator

@chicoreus - again we seem to have a problem with Framework definitions and with multiple documents containing conflicting information. The Glossary at https://tdwg.github.io/bdq/tg1/site/glossary.html , which I and others thought was the current up-to-date version, has "DQ Dimension", not just "Dimension", and "Information Element", not "InformationElement".

@tucotuco
Member

tucotuco commented Oct 1, 2018

I am really glad to see this coming together. Reviewing it keeps us on our toes. It made me ponder a number of issues when I think about taking the information provided and turning it into code. I made an issue #175 to discuss the merits of defining expected responses for the tests.

Tasilee changed the title from "TG2 - Test parameters and definitions" to "TG2 - Test Specifications and definitions" on Mar 23, 2022
@Tasilee
Collaborator Author

Tasilee commented Mar 23, 2022

In discussion with @ArthurChapman this morning, I decided to update the specifications above to match the current implementation. The terms and definitions should also conform to the Vocabulary #152.
