Home
Arnold Kuzniar edited this page May 15, 2019 · 9 revisions
Objective: Generate a prioritized list of candidate genes from a QTL region based on phenotype information.
Output: Semantic integration of plant genomic and phenotypic data to enable ranking of candidate genes associated with fruit ripening in tomatoes (Solanum lycopersicum).
Fig 1. Biological entities and data flow. TODO: Add potato graphs.
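As a rough illustration of the first step of the objective, the sketch below selects the genes whose coordinates overlap a QTL interval on the same chromosome. It is a minimal Python sketch that assumes 1-based, inclusive coordinates as in GFF3. The `Feature` class, the chromosome names and all coordinates are hypothetical and are not taken from the platform's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic interval with 1-based, inclusive coordinates (as in GFF3)."""
    id: str
    chrom: str
    start: int
    end: int


def genes_in_qtl(genes, qtl):
    """Return the genes that lie on the QTL's chromosome and overlap its interval."""
    return [
        gene for gene in genes
        if gene.chrom == qtl.chrom and gene.start <= qtl.end and gene.end >= qtl.start
    ]


# Hypothetical example data (made-up IDs and coordinates)
genes = [
    Feature("gene1", "ch11", 1000, 2000),
    Feature("gene2", "ch11", 5400, 5600),
    Feature("gene3", "ch11", 7000, 8000),
    Feature("gene4", "ch01", 1500, 1800),
]
qtl = Feature("qtl1", "ch11", 1500, 5500)
print([g.id for g in genes_in_qtl(genes, qtl)])  # ['gene1', 'gene2']
```

Partial overlaps count as hits here; whether a gene that only touches the QTL boundary should be kept is a design choice the real pipeline may make differently.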
Platform: Virtuoso Universal Server, Open Source Edition (OSE). The installation and deployment instructions can be found here.
Data sources:
- Non-RDF sources
  - SGN (Sol Genomics Network): (wild) tomato genome annotations and genetic markers (detailed description here)
  - QTLs extracted from Europe PMC articles/tables
- RDF sources
  - Ensembl Plants: tomato genome annotations (release 33)
  - UniProt: (reference) tomato proteome (release 2016_11)
- Database cross-references
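Linking records across these sources often comes down to matching identifiers that are written slightly differently. The minimal sketch below assumes, hypothetically, that the sources refer to the same tomato gene by a `Solyc` ID that differs only in trailing version or transcript suffixes (e.g. `.1` or `.1.1`). The functions `normalize_gene_id` and `link_xrefs` and the placeholder accessions are illustrative only and do not describe the platform's actual reconciliation procedure.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a tomato gene ID such as Solyc11g008770, optionally
# followed by one or more ".<number>" version/transcript suffixes.
SOLYC_RE = re.compile(r"^(Solyc\d{2}g\d{6})(?:\.\d+)*$")


def normalize_gene_id(raw_id):
    """Return the unversioned Solyc gene ID, or None if raw_id does not match."""
    match = SOLYC_RE.match(raw_id.strip())
    return match.group(1) if match else None


def link_xrefs(gene_ids, xrefs):
    """Map each gene ID to the accessions whose cross-referenced ID normalizes to the same gene.

    gene_ids: iterable of gene IDs from one source (e.g. SGN).
    xrefs: dict of accession -> gene ID as written in another source.
    """
    by_gene = defaultdict(list)
    for accession, xref_id in xrefs.items():
        key = normalize_gene_id(xref_id)
        if key is not None:
            by_gene[key].append(accession)
    # .get() on a defaultdict does not insert missing keys
    return {gid: sorted(by_gene.get(normalize_gene_id(gid), [])) for gid in gene_ids}
```

Stripping suffixes is lossy by design: different gene-model versions collapse onto one key, which may or may not be what a given query needs.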
Features:
- web-based Faceted Browser on Linked Data sets, offering:
  - (Google-like) Text Search (e.g. fruit quality, Myb 12, SGN-M6466)
  - Entity Label Lookup, including genome, chromosome/location (chromosome 11), QTL (QTL:PMC4321030_4_1_54), trait (fruit ripening), genetic marker (variation gene231_0-i11), gene symbol/ID (gene Solyc11g008770.1), protein accession/ID (K4D5D7), GO term/ID (GO:0009835) and pathway (carotenoid biosynthesis)
  - Entity URI Lookup (e.g. http://purl.obolibrary.org/obo/TO_0002728)
- programmatic data access via a SPARQL endpoint, including some example queries & their output
- Dockerized Virtuoso server for easy on-premises deployment
- automated data ingest & reconciliation procedures, which can ease future updates of the platform when new releases of the data sources become available
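A minimal sketch of programmatic access, using only the Python standard library and the standard SPARQL 1.1 protocol (HTTP GET with a `query` parameter, JSON results). The endpoint URL assumes a local Virtuoso deployment on its default port 8890. The example query does a case-insensitive substring match on `rdfs:label` and assumes entities are labelled with that property, which may not hold for every graph on the platform.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Virtuoso instance on its default HTTP port
ENDPOINT = "http://localhost:8890/sparql"

# Illustrative query: entities whose label contains "fruit ripening"
LABEL_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label WHERE {
  ?entity rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "fruit ripening"))
}
LIMIT 10
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]


def run_query(query, endpoint=ENDPOINT):
    """Send the query to the endpoint and return the flattened bindings."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

With a running server, `run_query(LABEL_QUERY)` returns a list of dicts keyed by `entity` and `label`. A `FILTER` over all labels can be slow on large graphs; Virtuoso's full-text index (`bif:contains`) is an alternative where it is enabled.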
Current issues & limitations:
- see this list of open (or closed) issues
- making tomato SGN data (in GFF) and QTLs from the literature (in CSV) available in RDF requires manual effort, aided by OpenRefine and a custom Python script
- (non-)RDF data quality & curation (e.g. some Ensembl links to other resources)
- data licensing & re-use by private partners
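The platform's actual GFF-to-RDF conversion relies on OpenRefine plus a custom Python script, neither of which is reproduced here. As a minimal, hypothetical sketch of the general idea, the code below parses a single GFF3 feature line and emits N-Triples. The `http://example.org/` namespace, the predicate names and the example line are placeholders, not the URIs or vocabulary the platform actually uses.

```python
from urllib.parse import quote, unquote

EX = "http://example.org/"  # hypothetical namespace for illustration only
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def parse_gff3_line(line):
    """Parse one GFF3 feature line (9 tab-separated columns) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attrs = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)  # GFF3 attribute values are percent-encoded
    return {"seqid": cols[0], "type": cols[2], "start": int(cols[3]),
            "end": int(cols[4]), "attrs": attrs}


def literal(value, datatype=None):
    """Serialize a value as an N-Triples literal, escaping special characters."""
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"' + (f"^^<{datatype}>" if datatype else "")


def feature_to_ntriples(feature):
    """Emit N-Triples lines for one parsed feature; requires an ID attribute."""
    subject = f"<{EX}feature/{quote(feature['attrs']['ID'], safe='')}>"
    pairs = [
        (f"{EX}type", literal(feature["type"])),
        (f"{EX}chromosome", literal(feature["seqid"])),
        (f"{EX}start", literal(feature["start"], XSD_INTEGER)),
        (f"{EX}end", literal(feature["end"], XSD_INTEGER)),
    ]
    if "Name" in feature["attrs"]:
        pairs.append((RDFS_LABEL, literal(feature["attrs"]["Name"])))
    return [f"{subject} <{predicate}> {obj} ." for predicate, obj in pairs]


# Hypothetical GFF3 line (made-up coordinates)
line = "ch11\tITAG\tgene\t1000\t2000\t.\t+\t.\tID=gene:geneX;Name=geneX"
for triple in feature_to_ntriples(parse_gff3_line(line)):
    print(triple)
```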
Possible extensions:
- couple the Linked Data platform with one or more algorithms to score/rank (candidate) genes associated with the trait of interest
- web interface including data visualization tailored to domain scientists
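As one possible starting point for such a scoring algorithm, the sketch below scores each candidate gene by summing the weights of its trait-relevant annotations (for example, GO terms judged relevant to fruit ripening) and sorts genes by score. The scoring scheme, the weights and the placeholder gene and term IDs are hypothetical; this is not a method described or implemented by the platform.

```python
def rank_candidates(gene_terms, trait_weights):
    """Rank genes by the summed weight of their trait-relevant annotations.

    gene_terms: dict of gene ID -> set of annotation term IDs.
    trait_weights: dict of term ID -> relevance weight for the trait of interest.
    """
    scored = []
    for gene, terms in gene_terms.items():
        # Terms without a weight contribute nothing to the score
        score = sum((trait_weights.get(term, 0.0) for term in terms), 0.0)
        scored.append((gene, score))
    # Highest score first; ties broken by gene ID so the order is deterministic
    return sorted(scored, key=lambda gene_score: (-gene_score[1], gene_score[0]))


# Hypothetical example: placeholder gene and term IDs
annotations = {"geneA": {"T1", "T9"}, "geneB": {"T1", "T2"}, "geneC": set()}
weights = {"T1": 1.0, "T2": 0.5}
print(rank_candidates(annotations, weights))
```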