0.2.26 - 2024-08-15
Bumped version to 0.2.26 to catch up with the data release. The only new client functionality is the #81 'data_release' helper functions.
All other changes in this release were to data (and are contained in data_v0.2.26)
- #81 New 'data_release' code eg 'get_latest_combo_file_urls' that looks on GitHub to find latest data
- New GFFs: RefSeq RS_2023_10, Ensembl 111, 112
- #79 - RefSeq MT transcripts
- #66 - We now store 'Note' field (thanks holtgrewe for suggestion)
- Added requirements.txt for 'generate_transcript_data' sections
- client / JSON data schema version compatibility check
- #56 - Fix occasional UTA duplicated exons
- #57 - Correctly handle retrieving genomic position and dealing w/indels in GFF (thanks ltnetcase for reporting)
- #60 - Fix for missing protein IDs due to Genbank / GenBank (thanks holtgrewe)
- #64 - Split code/data versions. json.gz are now labelled according to data schema version (thanks holtgrewe)
- Renamed 'CHM13v2.0' to 'T2T-CHM13v2.0' so it could work with biocommons bioutils
- #72 - Correctly handle ncRNA_gene genes (thanks holtgrewe for reporting)
- #73 - HGNC ID was missing for some chrMT genes in Ensembl
0.2.21 - 2023-08-14
- #45 - FastaSeqFetcher - fix alignment gaps properly
- #52 - Added transcripts from Ensembl 110 GRCh38 release
- #53 - UTA to cdot transcript start/end conversion issue
0.2.20 - 2023-07-10
- #50 - Biotype was missing in Ensembl transcripts
0.2.19 - 2023-07-06
- #49 - MT not converted to contigs correctly (GRCh37/Ensembl only)
- Removed accidental logging
0.2.18 - 2023-07-05
- #44 - Support for mouse transcripts (Mus musculus GRCm38 and GRCm39)
- #47 - Implement HGVS DataProvider get_alignments_for_region
- #45 - FastaSeqFetcher - handle deletions correctly (had swapped HGVS cigar projections around)
- #46 - HGVS DataProvider get_tx_info should properly handle alt_ac and alt_aln_method
0.2.17 - 2023-05-08
- #42 - Ensembl T2T CHM13v2.0
- #43 - Contigs not converted to accession numbers properly (this was breaking local Biocommons HGVS conversion using 0.2.16 data)
0.2.16 - 2023-04-12
- Added historical release 110 (2022-04-12) for T2T CHM13v2.0
- Added latest GRCh38.p14 release (2023-03-21)
0.2.15 - 2023-04-03
- Support for T2T CHM13v2.0
0.2.14 - 2023-03-21
- #39 - Fasta file SeqFetcher implementation
- Add Ensembl 109 GTF
- #38 - get_tx_for_region implementation differed from the hgvs one (reported by Manuel Holtgrewe)
- #35 - Tags (ie MANE Select / RefSeq select etc) should be genome build specific
- #34 - Stick to PyHGVS conventions, throw ValueError: transcript is required on missing transcript
0.2.13 - 2023-02-23
- Fix for #25 - Pyhgvs data conversion - non-coding transcripts have bad cds start/end conversion
- Fix for #32 - Signature of get_pyhgvs_data consistent for all return statements
0.2.12 - 2022-12-08
- #30 - We now store "tag" attributes (eg "MANE Select", "RefSeq Select")
- Switch to using Ensembl GFF3 (so we can get tags out)
- Add Ensembl 108 GFF3
- Fix for #25 - GeneInfo currently fails for some records
- Fix for #27 - Change URL for missing RefSeq GFFs
0.2.11 - 2022-09-27
- Now support all methods (get_gene_info, get_tx_for_gene, get_tx_for_region) for REST
- Add Ensembl 107 GTF
- Ensembl gene info was missing "description"
0.2.10 - 2022-09-19
- Implement get_gene_info - For local JSON data only
- Fixed issue #23 UTA transcripts for PyHGVS
0.2.9 - 2022-09-01
0.2.8 - 2022-08-29
- Implemented get_pro_ac_for_tx_ac (c_to_p can now generate p.HGVS)
- Implemented get_tx_for_region for local JSON data only
0.2.7 - 2022-05-19
- Add transcripts from latest RefSeq GRCh37 (105) and RefSeq GRCh38 (110)
- Fixed default arguments bug where PyHGVS only worked on SACGF fork
- gtf_to_json now goes straight to cdot format (without intermediary PyReference format)
- UTA is not included in generation scripts by default, to enable, set environment variable UTA_TRANSCRIPTS=True
- Handle mismatches in UTA CIGAR alignments (convert to match (no indels) as GFF format has no support for mismatch)
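The mismatch handling noted in 0.2.7 can be illustrated with a minimal sketch. It assumes a SAM-style CIGAR string (length followed by operation code), which may differ from the exact format the generation scripts read from UTA. The function name `mismatches_to_match` is hypothetical and is not part of the cdot API.

```python
import re
from itertools import groupby

# Assumption: SAM-style CIGAR, e.g. "10=1X5=2D3=" (length, then operation code)
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def mismatches_to_match(cigar: str) -> str:
    """Rewrite '=' (sequence match) and 'X' (mismatch) as plain 'M',
    then merge neighbouring runs that now share the same operation."""
    ops = [
        (int(length), "M" if op in "=X" else op)
        for length, op in CIGAR_OP.findall(cigar)
    ]
    merged = []
    # groupby only groups consecutive items, so indels still split the runs
    for op, run in groupby(ops, key=lambda pair: pair[1]):
        total = sum(length for length, _ in run)
        merged.append(f"{total}{op}")
    return "".join(merged)


print(mismatches_to_match("10=1X5=2D3="))  # 16M2D3M
```

Insertions and deletions pass through unchanged, so only the match/mismatch distinction is lost.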
0.2.6 - 2022-05-19
- Fixed Ensembl contig issue affecting g_to_c - Ensembl JSON was using chrom names, ie "17" instead of "NC_000017.11", for contig
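As a rough illustration of the kind of conversion involved in the 0.2.6 fix, the sketch below maps chromosome names to RefSeq contig accessions. The lookup table is a hand-written GRCh38 subset, and `to_contig_accession` is a hypothetical helper, not the code cdot uses.

```python
# Assumption: a small hand-written subset of GRCh38 accessions, for illustration only
GRCH38_CONTIGS = {
    "1": "NC_000001.11",
    "17": "NC_000017.11",
    "X": "NC_000023.11",
    "MT": "NC_012920.1",
}


def to_contig_accession(chrom: str, lookup=GRCH38_CONTIGS) -> str:
    """Return the contig accession for a chromosome name.

    Names that are not in the lookup (including values that are already
    accessions) are returned unchanged.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return lookup.get(name, chrom)


print(to_contig_accession("17"))            # NC_000017.11
print(to_contig_accession("NC_000017.11"))  # NC_000017.11
```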
0.2.5 - 2022-04-14
- PyHGVS conversion fix - non-coding cds_start/cds_end is set to start/end (not None)
0.2.4 - 2022-04-13
- Latest RefSeq (110) and Ensembl (106) transcripts
- Fixed bug where all UTA transcripts were '-' strand
- Add "other_chroms" to combined historical file
0.2.3 - 2022-03-29
- Fixed bug where HGNC not extracted properly from Ensembl GTFs
- Gene information is now included by default (only adds 5%)
- Clean artifacts from UTA data
- Support for SACGF PyHGVS fork (which adds alignment gap support)
0.2.2 - 2022-03-03
- Support for HTTPS (bought SSL certificate for REST server)
0.2.1 - 2022-03-03
- JSON format changed, separating common/build specific coordinates. This is so a transcript can contain data for multiple builds.
- Use ijson to reduce RAM usage - uses iterator vs loading all JSON into RAM
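The split between common and build-specific data introduced in 0.2.1 can be pictured with the sketch below. The record is invented, and the key names ("genome_builds", "contig", "exons" and so on) are assumptions for illustration, so check the actual schema before relying on them. `coordinates_for_build` is a hypothetical helper.

```python
# Illustrative record only: placeholder IDs and coordinates, assumed key names
transcript = {
    "id": "NM_EXAMPLE.1",
    "gene_name": "GENE1",
    "genome_builds": {
        "GRCh37": {
            "contig": "NC_000017.10",
            "strand": "-",
            "exons": [[100, 200], [300, 400]],
        },
        "GRCh38": {
            "contig": "NC_000017.11",
            "strand": "-",
            "exons": [[150, 250], [350, 450]],
        },
    },
}


def coordinates_for_build(record: dict, genome_build: str) -> dict:
    """Merge the fields shared by all builds with one build's coordinates."""
    common = {k: v for k, v in record.items() if k != "genome_builds"}
    return {**common, **record["genome_builds"][genome_build]}


grch38 = coordinates_for_build(transcript, "GRCh38")
print(grch38["gene_name"], grch38["contig"])  # GENE1 NC_000017.11
```

Keeping shared fields at the top level means a single transcript record can carry coordinates for several builds without repeating gene-level data.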
0.1.1 - 2022-01-19
- Initial commit