Taxonomic Inference: 2. Creating a Trait Data Set
Trait data suitable for taxonomic inference are available in many scholarly works. Taxonomic treatments and literature reviews for a given set of traits are particularly useful, but other studies and even textbooks can also be good sources of information about traits that characterize an entire clade or a defined set of subclades. When extracting trait data for branchpainting, be careful to correctly interpret the trait distribution described in the literature source. Don’t use taxonomic inference to assign the most common or most prominent trait value to an entire clade if this value is not expressed in every single member of the clade. Only use taxonomic inference if it is clearly stated in the source that ALL members of a clade have a certain trait value, or if the full set of relevant stop nodes can be assembled using information from the original source of the start node or from additional literature sources.
Always make sure that all your stop nodes for a given parent record are descendants of the start node in the EOL dynamic hierarchy (the first hierarchy listed in the names tab of EOL pages) since this hierarchy will be used for the painting of branches. Also, check your data set for duplicate or conflicting trait assertions. For example, if you have a particular trait value at the family level, you don’t need to have the same trait value for a genus in that family. If a descendant taxon has its own trait value that is different from that of its start node, check if you need to add a stop node for that descendant or a relevant ancestor. This won’t always be necessary since some traits can have multiple values for a given taxon. But it is always worth double-checking taxa that have multiple data records for the same trait.
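The checks described above can be partly automated before a data set is submitted. The following is a minimal sketch, not an EOL tool: it assumes you have already obtained a child-to-parent map for the relevant part of the EOL dynamic hierarchy, and that every record is meant to be inherited. The record layout, the function names, and the page IDs 900001 and 900002 are hypothetical; only 8703 (Sciuridae) and 4467455 (Pteromyini) come from this guide.

```python
# Hypothetical child -> parent map for a small part of the hierarchy.
# In practice this would be assembled from the EOL dynamic hierarchy.
PARENT = {
    4467455: 8703,     # Pteromyini -> Sciuridae (IDs used in this guide)
    900001: 8703,      # made-up genus directly under Sciuridae
    900002: 4467455,   # made-up genus under Pteromyini
}

PRED = "behavioral circadian rhythm"
RECORDS = [
    {"eol_id": 8703, "predicate": PRED, "value": "diurnal", "stops": {4467455}},
    {"eol_id": 4467455, "predicate": PRED, "value": "nocturnal", "stops": set()},
    {"eol_id": 900001, "predicate": PRED, "value": "diurnal", "stops": set()},
    {"eol_id": 900002, "predicate": PRED, "value": "diurnal", "stops": set()},
]


def ancestors(node, parent):
    """Return the ancestors of node, nearest first."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def painted_by(record, node, parent):
    """True if the branch painted from record's start node reaches node."""
    lineage = [node] + ancestors(node, parent)
    if record["eol_id"] not in lineage[1:]:
        return False
    # Painting halts at a stop node, so a stop node anywhere between the
    # start node and node (or node itself) excludes node.
    below_start = lineage[: lineage.index(record["eol_id"])]
    return not any(stop in below_start for stop in record["stops"])


def check(records, parent):
    """Collect messages about misplaced stop nodes and overlapping records."""
    problems = []
    for rec in records:
        for stop in sorted(rec["stops"]):
            if rec["eol_id"] not in ancestors(stop, parent):
                problems.append(
                    f"stop node {stop} is not a descendant of {rec['eol_id']}"
                )
    for upper in records:
        for lower in records:
            if upper is lower or upper["predicate"] != lower["predicate"]:
                continue
            if painted_by(upper, lower["eol_id"], parent):
                kind = "redundant" if upper["value"] == lower["value"] else "possible conflict"
                problems.append(
                    f"{kind}: {lower['eol_id']} vs inherited {upper['eol_id']}"
                )
    return problems


for message in check(RECORDS, PARENT):
    print(message)
```

A "possible conflict" message is only a prompt to double-check the taxon: as noted above, some traits can legitimately have multiple values, so not every flagged pair needs a stop node.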
Taxonomic inference trait data sets follow a modified DarwinCore Archive format, with trait data provided in a Measurement or Fact extension. Start and stop nodes for branchpainting are entered as child measurements of a conventional trait data record for the basal node of the branch to be painted. For an example of a taxonomic inference data set, see Hunt et al., 2009.
A spreadsheet template is available to facilitate the manual creation of taxonomic inference data sets using a flat table format. Once completed, the template is uploaded to the Trait spreadsheet to DwC-A Tool, which converts the data to a DarwinCore Archive with the relevant extensions. The resulting archives are then available in the EOL Open Data repository for download or for harvest into EOL.
- taxon name – Enter the scientific name of the taxon as used in the literature source.
- eolID – Enter the EOL page ID that represents the basal node (start node) of the branch to be painted with the trait value of this record. For example, to add a data record for the family Sciuridae, use 8703 as the eolID value. To avoid mismappings due to homonyms, make sure the taxon you are mapping is compatible with the higher classification path of the EOL page. You can do this check manually by examining the names tab of the EOL page or you can get the higher classification by submitting an ancestry query through the Cypher Query Form or the web services API.
- predicate – Enter a predicate label, e.g., behavioral circadian rhythm, flower color, habitat. This represents the measurement type for the trait data record. See the vocabulary sheet of the spreadsheet template for a list of available predicates.*
- value – Enter a value label, e.g., nocturnal, yellow, freshwater. This represents the measurement value for the trait data record. See the vocabulary sheet of the spreadsheet template for a list of available values.*
- inherit – Enter yes for all data records representing basal nodes of branches that should be painted, i.e., where you want the trait value to be inherited by descendant taxa.
- stops at – Enter the eolID (see above) of the EOL page that represents a taxon which you want to exclude from the branchpainting (stop node). Stop nodes must always be descendants of the start node (eolID of the parent data record). For example, if you enter a behavioral circadian rhythm:diurnal trait record for squirrels (Sciuridae, eolID 8703) but don’t want to have that value propagated to the nocturnal flying squirrels (Pteromyini, eolID 4467455), you would enter 4467455 as the stops at value for the Sciuridae trait data record and create a separate behavioral circadian rhythm:nocturnal data record for the Pteromyini. A data record can have multiple stop nodes, e.g., you can exclude multiple tribes, genera, or species from the trait data record of a family. If you list multiple stops at values, please separate them with a pipe, e.g., 23452|79899|245445.
- bibliographicCitation – Enter the bibliographic citation for the scientific work that serves as the source for the trait data record.
- source – Enter the source of the trait data record. This will usually be a DOI or URI that links directly to the scientific work that serves as the source for the trait data record. If a DOI is entered, a bibliographicCitation is not required, but having both values available makes the trait data record more transparent for data users.
- kingdom, phylum, family – Enter optional information about the higher classification of the taxon. Ideally, all taxa have eolID values, and the higher classification is not necessary to ensure proper taxon mapping during the data harvest. However, listing the higher classification in the data set may help the creator of the resource to keep track of intended taxonomic mappings in case of homonyms.
- units – Enter a units label, e.g., mm, kg, days. This represents the measurement unit for a quantitative trait data record. Quantitative traits generally show too much variation within clades to be suitable for branchpainting. However, taxonomic inference may be appropriate for a few quantitative traits that are more likely to be phylogenetically conserved, e.g., generation time. See the vocabulary sheet of the spreadsheet template for a list of available units.*
- statistical method – Enter a statMeth label, e.g., mean, median, min, max. This represents the statistical method used to arrive at the trait value of a quantitative trait. See the vocabulary sheet of the spreadsheet template for a list of available statistical methods.*
- sex – Enter a sex label, e.g., male, female, hermaphrodite. Use this column if the trait record applies only to a specific sex of the taxon. See the vocabulary sheet of the spreadsheet template for a list of available sex values.*
- lifestage – Enter a lifestage label, e.g., adult, egg stage, larval. Use this column if the trait record applies only to a specific life stage of the taxon. See the vocabulary sheet of the spreadsheet template for a list of available life stage values.*
- measurementRemarks – Enter comments or notes about the trait data record.
- measurementMethod – Enter a description of the methods or protocols used to determine the trait data value, or provide a reference (bibliographic citation or URI).
- referenceID – Enter the referenceID values of references listed in the references sheet of the spreadsheet template. Use this column for additional references that are cited for a given data record in the literature source. The references sheet has a simple structure with only two columns. The referenceID can be in any format, e.g., integers (1,2,3) or brief citations (Miller 2020, Cortez et al. 1988, Ozerov 2004b), as long as the identifier for each reference is unique. The fullReference value can be in any style, but it should provide all the information necessary to track down the cited work.
- personal communication – If the data record relies on personal communication by a researcher rather than a literature source, enter a link to that person’s ORCID profile. For researchers without an ORCID profile, you can also enter the person’s name and institution or you can create an item for them on WikiData and provide a link to this item.
* Please note that the EOL terms vocabulary is in constant flux, and the vocabulary sheet of the spreadsheet template may not list all available terms for predicate, value, unit, statistical method, sex, and life stage. Refer to the EOL Terms Glossary for additional term options. If a term is listed in the glossary but is not represented in the vocabulary sheet, it can be added manually to the sheet. Be sure to add both the label and the URI in the columns representing the relevant term category. If a new term needs to be added to the EOL vocabulary, please contact EOL staff.
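To illustrate how the inherit and stops at columns fit together, here is a small sketch that reads template rows and splits the pipe-separated stop nodes. It assumes the data sheet has been exported to CSV with the column names listed above. The export step, the function names, and the output record layout are assumptions made for illustration, not part of the EOL tooling. The sample rows reproduce the squirrel example from the stops at description.

```python
import csv
import io

# Two rows mirroring the Sciuridae / Pteromyini example in this guide.
SAMPLE = (
    "taxon name,eolID,predicate,value,inherit,stops at\n"
    "Sciuridae,8703,behavioral circadian rhythm,diurnal,yes,4467455\n"
    "Pteromyini,4467455,behavioral circadian rhythm,nocturnal,yes,\n"
)


def parse_stops(cell):
    """Split a pipe-separated 'stops at' cell into a list of integer page IDs."""
    return [int(part) for part in cell.split("|") if part.strip()]


def read_records(text):
    """Turn CSV text with the template's column names into a list of dicts."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "taxon": row["taxon name"].strip(),
            "eol_id": int(row["eolID"]),
            "predicate": row["predicate"].strip(),
            "value": row["value"].strip(),
            "inherit": row["inherit"].strip().lower() == "yes",
            "stops": parse_stops(row["stops at"] or ""),
        })
    return records


for record in read_records(SAMPLE):
    print(record["taxon"], record["value"], record["stops"])
```

Converting the IDs to integers makes stray characters in the eolID or stops at cells fail with a `ValueError` instead of being passed on unnoticed.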