After iPRES 2019, we looked at ways we as a community could contribute to digital preservation efforts. We decided to focus on addressing gaps in PRONOM by organising regular PRONOM Research Weeks. During the fortnight of 5-19 October 2020, volunteers are encouraged to help with PRONOM’s research backlog. You can enhance documentation, supply sample files, or create a signature, among other things.
Additional resources to support research activities will follow.
- PRONOM can be found here: www.nationalarchives.gov.uk/PRONOM
- A list of blogs, presentations, and other resources to assist with PRONOM research and file format signature development can be found here
- If you would like help or advice on conducting your research, crafting your submission, creating a signature, or if you’re having difficulty finding samples, please create a conversation thread on our Google Group
- Here is the full list of everything in PRONOM as of the v97 release on 1st October 2020 (https://github.com/digital-preservation/pronom-research-week/blob/master/v97_master_list.csv). From here you can see which formats don't currently have MIME/Media types, lists of associated extensions, deprecated formats and formats that have signatures (including container signatures).
- Here is a list of PUIDs that currently lack signatures (https://github.com/digital-preservation/pronom-research-week/blob/master/formats_without_signatures_Oct_2020.csv). You can help by sourcing example files and suggesting potential identification signatures.
- Here is a list of PUIDs that currently only have an 'outline' description (https://github.com/digital-preservation/pronom-research-week/blob/master/formats_with_outline_descriptions_only.csv). You can help by suggesting descriptive text for the PRONOM entry. Descriptions should be objective descriptions of what the format does and can include information about its originating software, but must not contain qualitative information, such as '...is the best format for...'
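As a starting point for exploring the CSV lists above, the following minimal sketch filters a CSV document for rows where a given column is empty, for example to find formats without a MIME/Media type. The column names (`PUID`, `Format name`, `MIME type`) and the sample rows are assumptions made purely for illustration; check the header row of the actual file and adjust the names to match.

```python
import csv
import io


def rows_missing(csv_text, column):
    """Return the rows of a CSV document whose `column` value is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not (row.get(column) or "").strip()]


# Hypothetical sample data: the real column headers and values will differ.
SAMPLE = """PUID,Format name,MIME type
example/1,Example Format A,application/x-example
example/2,Example Format B,
"""

for row in rows_missing(SAMPLE, "MIME type"):
    print(row["PUID"], row["Format name"])
```

To use it on a downloaded copy of one of the lists, read the file contents into a string and pass the column header you are interested in.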
For each submission, please include as much of the following information as possible. It’s okay if you don’t have everything, but please include what you can:
- Format name - Use the official name where known. Please capitalise each word unless the format name is stylised in some alternative way, e.g. Apple iBook.
- Version number (where relevant)
- PUID - if it exists already and you’re providing an enhanced description
- Extensions - any extensions known to be associated with the format
- MIME/Media Type - the MIME or Media Type associated with the format. This should be an official Media Type, either registered and listed via the IANA (https://www.iana.org/assignments/media-types/media-types.xhtml) or listed in official format documentation produced by the vendor
- Description - a concise, objective description of the file format.
- Format type - What type of format is it? (see below)
- Vendor (if known) - which vendor created the format? Which vendor currently supports it?
- File format identification signatures (for the brave!)
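If you are collecting several submissions, it may help to keep the details above in a consistent structure. The sketch below is one possible way to do that; the class and field names are hypothetical and are not an official PRONOM schema. Only the format name is required, reflecting that partial submissions are welcome.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Submission:
    """Illustrative record of the details listed above (hypothetical field names)."""
    format_name: str
    version: Optional[str] = None
    puid: Optional[str] = None
    extensions: List[str] = field(default_factory=list)
    media_type: Optional[str] = None
    description: Optional[str] = None
    format_type: Optional[str] = None
    vendor: Optional[str] = None
    signatures: List[str] = field(default_factory=list)

    def missing(self):
        """Names of the fields that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# A partial draft: only the name and one extension are known so far.
draft = Submission(format_name="Example Format", extensions=["exm"])
print(draft.missing())
```

Calling `missing()` gives a quick reminder of what could still be researched before submitting.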
We like to credit all submissions on our Release Notes page (https://www.nationalarchives.gov.uk/aboutapps/pronom/release-notes.xml). You can be credited as an individual or we can credit your institution. We also keep track of international contributions via the contributors map (https://www.google.com/maps/d/u/0/viewer?mid=1zWzV6G-CZDzq_kvIFGFYTgYxATI) so please let us know how you’d prefer to be credited.
PRONOM data is published under the Open Government Licence 2.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/) so please ensure you are happy with the terms of this licence before submitting any descriptive information.
All samples shared here are available under Creative Commons CC0 unless otherwise stated. Please ensure you have the right to share any samples you wish to submit, and that you are happy to share these under the CC0 license (https://creativecommons.org/share-your-work/public-domain/cc0/).
You may prefer to submit your samples to the larger OPF Format Corpus (https://github.com/openpreserve/format-corpus).
Alternatively, if you need your samples to remain private or are unhappy with the licensing terms of this repository, you can submit them directly to the PRONOM mailbox: [email protected] - samples submitted via the mailbox will not be shared online and we can provide a formal NDA if required. We will use these solely for the purpose of file format research and signature validation.
Format descriptions must be objective - avoid using phrases like “This is the best format for…” and avoid comparisons with other formats.
The current list of format classifications within PRONOM is:
- Audio
- Database - the formats of database software, such as MS Access, MySQL
- GIS - Geographic Information System (geospatial data formats)
- Image (Raster) - images based on pixel grids, such as JPG, GIF, PNG
- Image (Vector) - images based on mathematical primitives, such as SVG, Adobe Illustrator, CorelDraw, WMF
- Page Description - the language of printers (https://en.wikipedia.org/wiki/Page_description_language). Examples include HP-GL, PDF, PostScript
- Presentation - such as PowerPoint, Impress, Apple Keynote
- Spreadsheet
- Text (Unstructured) - plain text formats with no formal structure
- Text (Structured) - plain text formats with defined, regular structure
- Text (Mark-up) - such as XML, SGML, MD
- Word Processor
- Video
- Aggregate - such as zip, WARC, 7z, rar, iso
- Dataset - structured forms of data
- Model - 3D formats such as CAD and 3D models
- Font
Your format may not easily fit into any of the above categories, so feel free to reach out for advice!
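When preparing a submission, a small check like the one below can confirm that a proposed format type matches one of the classifications listed above. This is a sketch only: the function name is hypothetical, and the list is copied from this page rather than read from PRONOM itself, so it may fall out of date.

```python
# Classifications copied from the list above.
FORMAT_TYPES = [
    "Audio", "Database", "GIS", "Image (Raster)", "Image (Vector)",
    "Page Description", "Presentation", "Spreadsheet",
    "Text (Unstructured)", "Text (Structured)", "Text (Mark-up)",
    "Word Processor", "Video", "Aggregate", "Dataset", "Model", "Font",
]


def normalise_format_type(value):
    """Return the listed spelling of a format type, or None if it is not listed."""
    wanted = value.strip().casefold()
    for name in FORMAT_TYPES:
        if name.casefold() == wanted:
            return name
    return None


print(normalise_format_type(" image (raster) "))
print(normalise_format_type("Archive"))
```

A `None` result does not mean the submission is invalid; it is simply a prompt to ask for advice on where the format fits.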
pronom-research-week is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, age, race, or religion. We do not tolerate harassment of participants in any form.
This code of conduct applies to all pronom-research-week spaces, including Google Docs, Google Groups, our GitHub repository, and e-mails, both online and off. Anyone who violates this code of conduct may be sanctioned or expelled from these spaces at the discretion of the RESPONSE TEAM (can be reached at [email protected]).
Some pronom-research-week spaces may have additional rules in place, which will be made clearly available to participants. Participants are responsible for knowing and abiding by these rules.
This anti-harassment policy text has been taken and modified from https://geekfeminism.wikia.org/wiki/Community_anti-harassment/Policy.