All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning.
- A `relax` option to `fidibus` to proceed with prep task despite failed shasum verification.
- The `*.protein2ilocus.repr.tsv` file containing mapping for iLocus representatives only.
- Better exception handling/reporting for failed downloads.
- Reference genome configurations for Zootermopsis nevadensis (in support of the BWASP project) and Orchesella cincta (as additional proof-of-concept).
- Support for all Genbank genomes, not just those within RefSeq.
- Restored support for HymenopteraBase versions of several ant genomes.
- Ancillary files `.ilocus.mrnas.txt` and `.protein2ilocus.txt` are now `.tsv` files with headers.
- Extensive documentation updates.
- Switched from nose to py.test as the testing framework.
- Updated checksums for many NCBI annotations to compensate for, among other things:
  - changes in `##species` pragmas
  - transcript evidence descriptions and other metadata
  - feature types for annotated mobile elements, antisense transcripts, origins of replication, and various other features
  - an update to the C. elegans annotation (< 1% gene models affected)
  - an update to the D. rerio annotation (≈ 3% CDS exons, ≈ 10% all exons affected)
  - an update to the M. musculus annotation (1% CDS exons, 5% of all exons affected)
- Deprecated `genhub-fix-trna.py` script.
- The `genhub-build.py` script is now `fidibus`, and the CLI was updated.
- Pdom, Pcan, and Dqua now default to RefSeq genomes, with Toth Lab and CRG genomes available under labels Pdtl, Pccr, and Dqcr.
- The `cluster` build task now uses `-M 0` by default.
- The `Registry` class was simplified.
- The build script (now `fidibus`) now has support for custom genomes.
- Sequence IDs are now reported for iLocus and miLocus tables.
- New script for creating a piLocus summary table.
- Recipes for chickpea, cabbage, and soybean.
- Recipes for another mosquito (Aedes aegypti) and a spider mite (non-insect arthropod).
- A `--shuffled` option to several scripts for reading from `*iloci.shuffled.tsv`, and improved access from the GenomeDB class.
- Correctly included missing script in `setup.py`.
- New recipe for cotton.
- New recipe for Daphnia pulex.
- New script for creating an iLocus summary table.
- Updated test data files to compensate for corrections to LocusPocus' reporting of iLocus type.
- Added `seqfilter` to RefSeq module and .yaml configs.
- Used new `seqfilter` mechanism to eliminate redundant patch and variant data from human and mouse genomes.
- Updated rice recipe following an update to the corresponding RefSeq entry.
- Filled out partial implementation of `--delta` option for the build script.
- Removed gene model with overlapping exons causing processing issues in C. reinhardtii.
- Removed unnecessary `Fragment` column from `.iloci.tsv` table. Redundant with `LocusClass=fiLocus`.
- Removed outdated code for computing `LocusClass`.
- Fixed feature for specifying iLocus label format.
- Integration with codecov.io.
- Lots of genome recipes
  - Anopheles gambiae
  - Homo sapiens
  - Theobroma cacao
  - some version-specific recipes
    - TAIR6
    - Apis mellifera assembly 2.0 / OGS 1.0
    - Apis mellifera assembly 4.5 / OGS 3.2
  - 9 species of green algae
- Implemented the `cleanup` and `cluster` tasks for the main build script.
- Unit test fixtures to account for AEGeAn's improved reporting of iLocus types.
- Protein checksum for Xenopus tropicalis, which was recently updated to drop the Silurana designation.
- versioneer issue with MANIFEST
- Recipe for the rice genome (Oryza sativa L. ssp. japonica).
- Recipe for a model legume genome (Medicago truncatula).
- Batch for all Hymenoptera.
- Multiprocessing support for build script.
- Complete overhaul of the genome configuration handling (now in the `registry` module).
- Minor changes to the Travis CI configuration.
- Excluded Danio rerio config from CI tests, as its resource requirements are right at the limit of what the Travis VMs can handle.
- Updated Xenopus tropicalis config to drop the parentheses in the species name.
- Updated Drosophila melanogaster config to the latest RefSeq assembly/annotation.
- Moved sha1 and file resolution code from `__init__.py` to `GenomeDB` class.
- Package metadata.
- Minor improvements to documentation.
- Added prerequisites to `setup.py`.
- first stable release!
- `GenomeDB` class and various extensions for downloading and formatting data.
- Modules for parsing and describing iLoci, proteins, mRNAs, exons, introns, and coding sequences.
- The script implementing the `stats` task, brought over with minimal changes from HymHub.
- Unit tests, with 100% success rate and 100% coverage of core package code (not scripts yet).
- Minimal installation and usage documentation.