External data
Data used for the modeling (generally, experimental data) should not be stored directly in the mmCIF files. Instead, the data should be deposited in a suitable repository and linked from the mmCIF file, for several reasons:
- size: some experimental datasets are extremely large, so it's not efficient to store them in a text-based format like mmCIF.
- existing standards: little point in developing a new format to store something when an existing format has wide adoption (e.g. MRC format for EM maps).
- duplication: little point in duplicating data that's already available elsewhere.
- domain expertise: experts in each experimental field are better qualified to determine the file formats, database structure, etc.
Where an existing repository isn't available, files can be deposited in a general-purpose archive that issues a DOI (for example, Zenodo), but this should be considered a temporary measure until a dedicated database is established.
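As a minimal sketch of how a DOI-based link might be handled, the helper below normalizes a DOI string and builds the corresponding `https://doi.org/` resolver URL. The function name `doi_to_url` and the DOI shown are illustrative placeholders, not part of any existing tool, and the DOI does not refer to a real deposition.

```python
from urllib.parse import quote

# Prefixes that people commonly paste in front of a bare DOI.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI (hypothetical helper)."""
    doi = doi.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the "10." directory indicator and has a
    # registrant prefix and a suffix separated by "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError("not a DOI: %r" % doi)
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


# Placeholder DOI for illustration only.
url = doi_to_url("doi:10.5281/zenodo.0000000")
print(url)
```

Storing the bare DOI rather than a repository-specific URL keeps the link valid if the hosting archive reorganizes its pages.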
Modeling generally uses processed data (for example, an EM map). Where possible, both the processed data and the original raw data from which they were derived (for example, a set of EM micrographs) should be deposited.
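One way to picture the linking is as a small table in the mmCIF file, with one row per external dataset and a flag saying whether it is processed or raw. The sketch below writes such a table as an mmCIF `loop_`. The category name `_example_external_link`, its data items, the repository name and the accession codes are all hypothetical placeholders chosen for illustration; they are not taken from any dictionary. The quoting is also simplified and only handles values containing whitespace.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExternalLink:
    dataset: str      # free-text label for the dataset
    kind: str         # "processed" or "raw"
    repository: str   # name of the repository holding the data
    accession: str    # identifier of the entry within that repository


def cif_value(value: str) -> str:
    """Quote a value for mmCIF (simplified: whitespace handling only)."""
    if value == "":
        return "."  # mmCIF marker for "not applicable"
    if any(ch.isspace() for ch in value):
        return "'" + value + "'"
    return value


def to_loop(links: List[ExternalLink]) -> str:
    """Render the links as an mmCIF loop in a hypothetical category."""
    lines = ["loop_"]
    for name in ("id", "dataset", "kind", "repository", "accession"):
        lines.append("_example_external_link." + name)
    for i, link in enumerate(links, start=1):
        fields = [str(i), link.dataset, link.kind,
                  link.repository, link.accession]
        lines.append(" ".join(cif_value(f) for f in fields))
    return "\n".join(lines) + "\n"


# Placeholder repository and accession codes, for illustration only.
links = [
    ExternalLink("EM map", "processed", "ExampleRepo", "PLACEHOLDER-1"),
    ExternalLink("EM micrographs", "raw", "ExampleRepo", "PLACEHOLDER-2"),
]
loop_text = to_loop(links)
print(loop_text)
```

Only identifiers are stored; the map and the micrographs themselves stay in the repository.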
The state of each experimental field is summarized below.
Electron microscopy (EM)
File formats: MRC
Data linked from mmCIF: raw micrographs, 2D class averages, 3D maps
Repositories:
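Because MRC is an established binary format, a linked EM map can be sanity-checked without any new tooling. The sketch below reads the grid size and data mode from the 1024-byte MRC header using only the standard library, following the MRC2014 layout (NX, NY, NZ and MODE in the first four 32-bit words, the `MAP ` identifier at byte offset 208, and the machine stamp at offset 212). The function name `read_mrc_header` is an illustrative choice, and the demo header is synthetic rather than taken from a real map. For real work a dedicated MRC library would likely be preferable.

```python
import struct


def read_mrc_header(header: bytes) -> dict:
    """Extract basic fields from an MRC2014 header (illustrative sketch)."""
    if len(header) < 1024:
        raise ValueError("MRC header must be at least 1024 bytes")
    if header[208:212] != b"MAP ":
        raise ValueError("missing 'MAP ' identifier; not an MRC2014 file?")
    # The first byte of the machine stamp is 0x44 for little-endian
    # files and 0x11 for big-endian files.
    stamp = header[212]
    if stamp == 0x44:
        order = "<"
    elif stamp == 0x11:
        order = ">"
    else:
        raise ValueError("unrecognized machine stamp: 0x%02x" % stamp)
    nx, ny, nz, mode = struct.unpack_from(order + "4i", header, 0)
    return {"nx": nx, "ny": ny, "nz": nz, "mode": mode}


# Build a synthetic little-endian header for demonstration.
demo = bytearray(1024)
struct.pack_into("<4i", demo, 0, 64, 64, 32, 2)  # mode 2 = 32-bit float
demo[208:212] = b"MAP "
demo[212:216] = bytes([0x44, 0x44, 0x00, 0x00])

info = read_mrc_header(bytes(demo))
print(info)
```

With a real file, the same function could be called on the first 1024 bytes read in binary mode.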
NMR
Data linked from mmCIF:
Repositories: BMRB
Small-angle scattering (SAS)
File formats: SAS profiles, sasCIF
Data linked from mmCIF: SAXS profile, ab initio shape
Repositories: SASBDB
Crosslinking / mass spectrometry
File formats: simple tabulated data, mzIdentML version 2.0 (in preparation, draft format)
Data linked from mmCIF: tabulated sets of proximate residues (e.g. for the yeast Nup84 complex), spectra/peaklists (e.g. for the yeast Mediator complex)
Repositories:
- Sets of proximate residues: none (?)
- Peaklists: ProteomeXchange, MassIVE, PRIDE
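Since sets of proximate residues are currently exchanged as simple tabulated data, a reader for such a file can be very small. The sketch below parses a comma-separated table of residue pairs. The column names (`chain1`, `resnum1`, `chain2`, `resnum2`) and the demo rows are assumptions made for illustration; no standard layout is defined here, and real files may differ.

```python
import csv
import io
from typing import List, NamedTuple


class ResiduePair(NamedTuple):
    chain1: str
    resnum1: int
    chain2: str
    resnum2: int


def read_residue_pairs(handle) -> List[ResiduePair]:
    """Read residue pairs from a CSV file with a header row.

    The column names are hypothetical; adjust them to the actual file.
    """
    pairs = []
    for row in csv.DictReader(handle):
        pairs.append(ResiduePair(
            row["chain1"], int(row["resnum1"]),
            row["chain2"], int(row["resnum2"]),
        ))
    return pairs


# Made-up rows, for illustration only.
demo = io.StringIO(
    "chain1,resnum1,chain2,resnum2\n"
    "A,10,B,42\n"
    "A,57,C,3\n"
)
pairs = read_residue_pairs(demo)
for pair in pairs:
    print(pair)
```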
FRET
File formats: Photon-HDF5 (see also the FRETBursts software)
Data linked from mmCIF: pairs of interaction sites?
Repositories: none (?)