
Adding loader for Hainsworth dataset #617

Closed
wants to merge 248 commits

Conversation

@tanmayy24 (Collaborator) commented on Jan 22, 2024

Please include the following information in the top-level docstring of the dataset's module mydataset.py (a sketch of such a docstring follows this list):

  • Describe the annotations included in the dataset
  • Indicate the size of the dataset (e.g., number of files and total duration in hours)
  • Mention the origin of the dataset (e.g., creator, institution)
  • Describe the type of music included in the dataset
  • Indicate any relevant papers related to the dataset
  • Describe how the data can be accessed and the license it uses (if applicable)
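
For reference, here is a minimal sketch of such a top-level docstring for a hypothetical my_dataset module. All dataset details, numbers, and the citation below are placeholders, and the Sphinx admonition layout is assumed to mirror the pattern used by existing loaders, so double-check against a recent module:

```python
"""my_dataset Loader

.. admonition:: Dataset Info
    :class: dropdown

    My Dataset is a hypothetical collection of 100 recordings (roughly 6 hours
    of audio) of Western popular music, created at Example University.
    Each track is annotated with beat and downbeat positions.

    The dataset is described in the (placeholder) paper:

        A. Author and B. Author, "A Hypothetical Dataset for Beat Tracking",
        in Proc. of Some Conference, 2024.

    The audio cannot be redistributed and must be requested from the creators;
    the annotations are released under CC BY 4.0 and are downloaded
    automatically by this loader.
"""
```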

Dataset loaders checklist:

  • Create a script in scripts/, e.g. make_my_dataset_index.py, which generates an index file (a minimal sketch of such a script follows this checklist).
  • Run the script on the canonical version of the dataset and save the index in mirdata/indexes/ e.g. my_dataset_index.json.
  • Create a module in mirdata, e.g. mirdata/my_dataset.py
  • Create tests for your loader in tests/datasets/, e.g. test_my_dataset.py
  • Add your module to docs/source/mirdata.rst and docs/source/table.rst
  • Run black, flake8 and mypy (see Running your tests locally).
  • Run tests/test_full_dataset.py on your dataset.
  • Check that codecov coverage does not decrease.
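
As a rough illustration of the indexing step referenced above, the sketch below generates a JSON index for a hypothetical beat-annotated dataset. The exact index schema (version string, per-track fields) is an assumption here; use an existing script in scripts/ as the authoritative template:

```python
import argparse
import glob
import hashlib
import json
import os


def md5(path):
    # Checksum used by mirdata to validate local copies of each file.
    hash_md5 = hashlib.md5()
    with open(path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


def make_index(data_path):
    # Map each track id to (relative path, checksum) pairs for audio and beats.
    index = {"version": "1.0", "tracks": {}}
    for audio_path in sorted(glob.glob(os.path.join(data_path, "audio", "*.wav"))):
        track_id = os.path.splitext(os.path.basename(audio_path))[0]
        beat_path = os.path.join(data_path, "annotations", track_id + ".beats")
        index["tracks"][track_id] = {
            "audio": [os.path.relpath(audio_path, data_path), md5(audio_path)],
            "beats": [os.path.relpath(beat_path, data_path), md5(beat_path)],
        }
    with open("my_dataset_index_1.0.json", "w") as fhandle:
        json.dump(index, fhandle, indent=2)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate an index for my_dataset.")
    parser.add_argument("data_path", help="Path to a local copy of the dataset")
    make_index(parser.parse_args().data_path)
```

The resulting index file would then be saved under mirdata/indexes/ as described in the checklist.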

If your dataset is not fully downloadable, there are a few extra steps you should follow:

  • Contact the mirdata organizers by opening an issue or PR so we can discuss how to proceed with the closed dataset.
  • Show that the version used to create the checksum is the "canonical" one, either by getting the version from the dataset creator, or by verifying equivalence with several other copies of the dataset.
  • Make sure someone has run pytest -s tests/test_full_dataset.py --local --dataset my_dataset once on your dataset locally and confirmed it passes.

Please-do-not-edit flag

To reduce friction, we will make commits on top of contributors' pull requests by default unless they use the please-do-not-edit flag. If you don't want this to happen, don't forget to add the flag when you open your pull request.

rabitt and others added 30 commits July 12, 2019 13:12
* fix attribute bug, move audio loaders to Track

* remove check_validated

* update tests

* test ikala loading functionality

* rollback librosa

* code review

* bump version


Former-commit-id: 2bd45a7
* don't allow data_home to be none in any functions, data_home is now one level lower

* update beatles

* update some tests, most still failing

* fix tests for beatles - todo, fix download mocking

* update ikala data_home

* rollback create validated/invalid

* update medleydb_melody data_home

* update medleydb_pitch data_home

* update orchset data_home

* update salami data_home

* remove commented code

* uncomment failing tests

* update tests

* remove old version module, update version


Former-commit-id: 0c78960
Former-commit-id: 799e506
* draft for mirdata rtd. including example, faq(placeholder)

* add more comments per review


Former-commit-id: e9c777e
* orchset tests

* medleydb-melody tests

* add tests for medleydb-pitch

* add beatles tests

* add salami tests

* fixes #77

* increase version


Former-commit-id: 8881ee5
* update __repr__

* format very long string

* test all __repr__ methods

* bump version


Former-commit-id: 64a3fa6
* adding rwc collection

* add docs step

* add rwc datasets to docs

* add tests and bugfix for rwc classical

* handle incomplete first measure

* update contributing

* rwc genre tests

* rwc jazz tests

* rwc popular tests

* add __repr__ to contributing example

* bump version


Former-commit-id: 2bf915f
* fix when annotations are missing and test

* increase test coverage

* update version


Former-commit-id: 5a59577
* initial downloader

* proposed downloader and how it would work in two loaders

* update download in beatles

* data_home -> save_dir

* three more examples

* move all downloading functions to new module

* move tests

* start updating loaders

* update some imports

* web_downloader -> download

* update tests

* download.py -> download_utils.py to avoid name collision

* bump version

* test for downloader (#105)


Former-commit-id: 6e07377
* beat_positions type from str to int

* rwc collection: track.track_duration_sec to track.duration_sec

* beatles beat_positions int

* fix missing annotations beat and key

* try to fix coverage

* consistency with salami loader


Former-commit-id: 7c32f59
* replace print with logging.info

* fix black error

* bump version


Former-commit-id: 63dc7cb
* Finished generating index.json; started data loader

* finished writing tests

* changed dependency in setup.py

* one more try at fixing dependency

* Gave up on soundfile.
Didn't realize librosa already had support for multichannel wave files

* post CR fix. CR by @rabitt.

* bump version


Former-commit-id: 10bb98a
* fix #110

* thor CR

* add docs link

* fixes #87


Former-commit-id: c49b3d1
* fix for guitarset.download()

* increase coverage

* refactored and increased coverage

* update RemoteFileMetadata

* update example loader

* run black on guitarset index

* bump version

* rm old docstring


Former-commit-id: 13c0bec
* Librosa now >= 0.7.0

* Add libsndfile as a dependency elsewhere

* bump version


Former-commit-id: 4022668
* make all subdirectories

* remove unused import

* bump version


Former-commit-id: dccf93f
* fix checksum in beatles and salami, simplify tar.gz downloader

* beat positions to int rwc

* fix download folder

* remove comments

* update version


Former-commit-id: caadd71
* multi channel support w/ GuitarSet

* bumped version to 0.0.17 and added `requests` to setup


Former-commit-id: b467bfb
* begin writing medley_solos_db.py

* write track_ids in medley_solos_db

* write load in medley_solos_db

* write cite in medley_solos_db

* add medley_solos_db module to __init__

* add song_id to track_metadata in medley-solos-db

* write make_medley_solos_db script

* skip header in medley-solos-DB csv

* bugfix make_medley_solos_db_index

* import hashlib in make_medley_solos_db_index

* update index

* import json in make_medley_solos_db_index

* define msdb index keys by uuid4

* upload msdb JSON index

* bugfix _track_metadata

* write validate in msdb

* write track_ids in msdb

* write _reload_metadata in msdb

* write _load_metadata in msdb

* finish writing _load_metadata in msdb

* bugfix msdb annotation checksum

* update urls and checksums in msdb

* pep8

* .wav.wav -> .wav

* update msdb index (audio/ subdir)

* update metadata_path in msdb

* bugfix _track_paths

* set sr to 22050 in msdb audio

* reformat using black

* upload metadata CSV in resources

* upload one track of MSDB for tests

* start MSDB test file

* test_cite in MSDB

* test_load in MSDB

* start test_track in MSDB

* add MSDB to docs

* import DEFAULT_DATA_HOME in msdb test

* typo in MSDB test_track

* bugfix test_load MSDB

* test_track_ids in MSDB

* bugfix test_cite in MSDB

* annotations -> annotation folder in MSDB test

* finish test_track in msdb


Former-commit-id: 5d17ed4
Former-commit-id: d30c342
* remove validate from inside load

* bump version


Former-commit-id: 5667f47
Former-commit-id: 8a31ce0
* initial idea for fixing #100

* moved LargeData to utils, applied to all loaders

* black

* fix documentation, fixes #139

* bump version


Former-commit-id: 778c405
* fix download error

* reformat

* add tests for all downloaders

* run black

* bump version


Former-commit-id: 39002ce
* adding DALI dataset

* updating dali to current master version

* formatting

* remove maps

* fix metadata, format and loaders

* add tests, waiting for small file for final testing

* reduced metadata file dali

* dali tests

* dali test resources

* fix

* fix rep str

* dict metadata to attributes

* fix rounding issue p27

* update setup

* add one more test


Former-commit-id: e340d67
* first draft of to_jams

* add jams_utils and move to_jams to Track()

* add to_jams() and change to hierarchy-corrected labels

* change sections and chords to mir_eval format, add more to_jams(), track duration to float

* update tests

* update files tests

* metadata to dict, add data type checks, f0s_to_jams

* change metadata in to_jams

* add to_jams

* add to jams guitarset

* add lyrics to_jams

* tests for chords and beats

* tests sections and multi_sections

* tests keys and lyrics

* metadata and data type checking

* tests metadata and data type checking

* typo

* add to_jams

* simplify metadata in to_jams function

* formatting

* chords format, metadata keys, a bit of file formatting

* update jazz test

* update guitarset

* start test dali draft

* Contributing, API and docstrings

* black

* remove test_dali

* import in contributing

* add to_jams to medley_solos_db

* black

* remove maps from docs

* comment audio checking until solving backend

* some tests to_jams

* remove unused code

* update metadata test resources

* utf-8

* fix contributing

* tests for to_jams all datasets


Former-commit-id: ed5756f
magdalenafuentes and others added 19 commits November 25, 2022 10:54
Former-commit-id: c9fd249
* Fix tox for formatting test

* Pin black version to 23.1.0

* Upgrade librosa version and ensure python3.6 compatibility

* Black formatting with new 23.1.0 version

* Fixing egfxset expected return value

* Mock pandas import at sphinx autodoc

* Fix black version for python3.6

Former-commit-id: 0620b8c
* Add BAF loader

* Extend docstring and improve test coverage

* Better check if file does not exist and improve test coverage

* formatting...

* Remove unused imports

Former-commit-id: 8c84e50
* Create script and index

* Create loader and fix index

* Create tests and undo index fix

* Add test resources and fix index again

* Fix test resources

* Add loaders and tests for taala and tonic

* Loader finished

* Update loader with new dataset

* fix loader with new updates

* add testing files and structure function

* core fixes to get the tests passing

* fix carnatic varnam with new dataset updates

* black formatting

* remove unused function

* new version 1.1 [wip]

* remove prints, loader good

* add load notation as exception

* index updated with new version 1.1

* update setup

* fix problem in _metadata

* merging...

* adding testing file and smart open

* shorten test file name

* add Exception in load_notation

* fix problem with exception

* add test coverage

* update remotes, improve docs

* fix remotes

* fix data folder naming in dataset

Former-commit-id: 496eb4a
* ADD formatting workflow

* FIX variables in formatting workflow

* ADD python linting workflow and environment

* ADD CI workflow and environment

* Remove CircleCI

* UPDATE new_loader.md PR template

* ADD readthedocs

* UPDATE readme badges

* FIX numpy asarray bug

* CHANGE arg name due to librosa update

* REMOVE tox.ini

* UPDATE dependencies for test

* ADD dependencies

* dependencies..

* dependencies..

* dependencies..

* dependencies..

* fix dependencies versions

* ADD all smart_open protocols install for CI

* MOVE smart_open[all] to pip install

* INSTALL types to pass mypy

* install types-pyaml for python linting test

* Change to work with music21 v9.*

* TEST Environment CI with no restrictions. python3.10 test passing locally

* FIX ikala test to pass linux tests

* FIX normpath for windows tests

* BLACK

* Change assert tolerance for floats in test_ikala

The motivation for this change is that Linux and macOS return different
floats: macOS returns 260.946404518887 while Linux returns 260.94640451888694,
so we adjust the tolerance of the test.

* Remove windows CI test

* Assert modification forgotten in the last commit

* Specifying package versions in environment.yml

* Fix h5py version for python3.7

* CI test dependencies fixed at the versions of last passing test

* CI test dependencies without lowerbound

* CI test dependencies that should work

* sort dependencies by alphabetical order

* Update setup dependencies

* Update test-lint dependencies

* Update contributing docs to match new testing pipeline

* Remove comment from test_ikala

This comment showed the assertion test used before PR #596.

* Set dependencies packages versions for docs

* Remove comment

* jams get_duration handling

* Trigger tests after CircleCI removing

---------

Co-authored-by: Magdalena Fuentes <[email protected]>
Former-commit-id: dfaadca
* Update badges url

* Trigger doc build again

Former-commit-id: e3c3253
Co-authored-by: Guillem Cortès <[email protected]>
Former-commit-id: cbd3dc4
A previous link for "OMRAS2 Metadata Project 2009" gave a "404 Not Found" error, so I replaced it with the one from CiteSeerX.

Co-authored-by: Harsh Palan <[email protected]>
Co-authored-by: Genís Plaja-Roglans <[email protected]>
Co-authored-by: Guillem Cortès <[email protected]>
Former-commit-id: e3c34a5
* minor fixes (fix pip install syntax to install optional dependencies and remove --black from pytest)

* fix: typos

---------

Co-authored-by: Guillem Cortès <[email protected]>
Former-commit-id: 4bc5b18
* add new version of dataset

* black

* fix in tox

* fixing librosa (@dagett)

* black formatting

* remove old index, add note in docs

* fix formatting, add tests for makam

* formatting four way tabla dataset

* add normpath to tests and script

* formatting

---------

Co-authored-by: Harsh Palan <[email protected]>
Co-authored-by: Guillem Cortès <[email protected]>
Former-commit-id: ef72b2c
* idmt_smt_audio_effects dataset script and index

* idmt_smt_audio_effects loader

* idmt_smt_audio_effects tests and resources

* idmt_smt_audio_effects dataset added to docs

* black formatter

* fixing type error in mypy test

* loader docstring

* pytest fix: folder delete in dataloader

* remove download func, adding unpacking dirs

* fixed resources path

* formatting

* formatting

* added tests

* added tests

* fixing test

* fix dependencies in setup.py

* modified dataset tests and added custom track to test_loaders.py

* changed docstrings and exception handling

* docstrings

* GitHub Actions migration (#596)

* ADD formatting workflow

* FIX variables in formatting workflow

* ADD python linting workflow and environment

* ADD CI workflow and environment

* Remove CircleCI

* UPDATE new_loader.md PR template

* ADD readthedocs

* UPDATE readme badges

* FIX numpy asarray bug

* CHANGE arg name due to librosa update

* REMOVE tox.ini

* UPDATE dependencies for test

* ADD dependencies

* dependencies..

* dependencies..

* dependencies..

* dependencies..

* fix dependencies versions

* ADD all smart_open protocols install for CI

* MOVE smart_open[all] to pip install

* INSTALL types to pass mypy

* install types-pyaml for python linting test

* Change to work with music21 v9.*

* TEST Environment CI with no restrictions. python3.10 test passing locally

* FIX ikala test to pass linux tests

* FIX normpath for windows tests

* BLACK

* Change assert tolerance for floats in test_ikala

The motivation for this change is that Linux and macOS return different
floats: macOS returns 260.946404518887 while Linux returns 260.94640451888694,
so we adjust the tolerance of the test.

* Remove windows CI test

* Assert modification forgotten in the last commit

* Specifying package versions in environment.yml

* Fix h5py version for python3.7

* CI test dependencies fixed at the versions of last passing test

* CI test dependencies without lowerbound

* CI test dependencies that should work

* sort dependencies by alphabetical order

* Update setup dependencies

* Update test-lint dependencies

* Update contributing docs to match new testing pipeline

* Remove comment from test_ikala

This comment showed the assertion test used before PR #596.

* Set dependencies packages versions for docs

* Remove comment

* jams get_duration handling

* Trigger tests after CircleCI removing

---------

Co-authored-by: Magdalena Fuentes <[email protected]>

* Update badges url (#598)

* Update badges url

* Trigger doc build again

* metadata exception, whitespaces in table.rst

* fixing table.rst

* fixing mirdata.rst and adding references to quick_reference.rst

* increasing test coverage

* adding corrupted xml file for testing

* modified metadata logic for xml files

* removed general exception

* removing FileNotFoundError, changing dirs for _ and moving Cached Properties to Attributes

* revert to FileNotFoundError and test

---------

Co-authored-by: Magdalena Fuentes <[email protected]>
Co-authored-by: Genís Plaja-Roglans <[email protected]>
Co-authored-by: Guillem Cortès <[email protected]>
Former-commit-id: afbc0c3
* scripts/make, index and track and dataset class. TODO tests

* fix docstring

* modify the docs

* download disclaimer

* black

* first test

* fix metadata

* remove embeddings

* add more tests

* black

* modify tests

* modify fix.py for adding music21 (optional)

* fix bug with load_scores

* fix bugs

* from smart_open import open

* from smart_open import open

* same error as Francesco

* test fulldataset

* test fulldataset

* test fulldataset

* genis suggestion

* replace os.path.exists with try/except

* fix problems with try/except

* add cipi to CUSTOM_TEST_TRACKS

* modify all the tests

* black

* smart open test

* black

* check embeddings

* check embeddings

* check embeddings

* improving codecov

* rollback haydn_op20.py

* rollback haydn_op20.py

* comment on the embeddings

* cante100 -> cipi

* black

* expressiveness

* fix make

* Done!

* Update cipi.py

* difficulty annotation

* fix docs table

* add dataset details and fix error message

* now doing the fixes right :)

* address problem in table.rst

---------

Co-authored-by: PRamoneda <[email protected]>
Co-authored-by: Genís Plaja-Roglans <[email protected]>
Co-authored-by: Guillem Cortès <[email protected]>
Co-authored-by: genisplaja <[email protected]>
Former-commit-id: c1dccb0
* first commit

* update on dependencies

* missing track in test_loaders

* fix testing hindustani track

* check mypy problem

* add more tests

* create entry for annotation type activation in quick ref

* update table, quick ref, and docstring, add test

* remove deprecated functions of dataset

* black

---------

Co-authored-by: Magdalena Fuentes <[email protected]>
Co-authored-by: Guillem Cortès <[email protected]>
Former-commit-id: a944318
* added required files for candombe_beat_downbeat dataset

* fixed empty new line at the end of file

* changed dataloader name from candombe_beat_downbeat to candombe

* added documentation for load_beats function in candombe dataloader

* reformatted with black

* included candombe dataset information in table.rst and mirdata.rst

* formatted candombe.py with black

* changed candombe.py according to black formatting

* updated candombe Track class docstring

* made slight docstring changes

* slight docstring changes

* Fixed black issues

---------

Co-authored-by: Jimena Arruti <[email protected]>
Co-authored-by: Jimena Arruti <[email protected]>
Co-authored-by: Harsh Palan <[email protected]>
Co-authored-by: Guillem Cortès <[email protected]>
Co-authored-by: Genís Plaja-Roglans <[email protected]>
Co-authored-by: Tanmay Khandelwal <[email protected]>
Former-commit-id: 7fceb2f
Former-commit-id: ff99c5d
* Badge fixes

* Badge zenodo addition

* Badge zenodo addition

* Addition of zenodo in docs

* Fixed doi issue

* Fixed doi issue

* Added main branch run and pointed badges to right main

* Edited branch name

---------

Co-authored-by: Tanmay Khandelwal <[email protected]>
Former-commit-id: a72fc2c
…611)

* custom download function, tests and test dataset

* remove redundant data_home check

* added to download exceptions

* formatting

* download info

* Update tree in idmt_smt_audio_effects.py

Co-authored-by: Guillem Cortès <[email protected]>

* corrected moving folders, directory tree

* added comments to download info

* typo

---------

Co-authored-by: Guillem Cortès <[email protected]>
Former-commit-id: 459833a
@tanmayy24 changed the title from [WIP] Adding loader for Hainsworth Dataset to [WIP] Adding loader for Hainsworth dataset on Jan 22, 2024
codecov bot commented Jan 22, 2024

Codecov Report

Attention: Patch coverage is 98.03922% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.07%. Comparing base (c1e3cf9) to head (d6e709a).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #617   +/-   ##
=======================================
  Coverage   97.07%   97.07%           
=======================================
  Files          63       64    +1     
  Lines        7341     7392   +51     
=======================================
+ Hits         7126     7176   +50     
- Misses        215      216    +1     

@tanmayy24 changed the title from [WIP] Adding loader for Hainsworth dataset to Adding loader for Hainsworth dataset on Jan 25, 2024
tanmayy24 and others added 4 commits February 1, 2024 15:21
* Addition of ballroom

* description text update

* lint error fix

* fixed path

* Updated download info

* Update in sphinx

* Update in sphinx

* Update in sphinx

* added references

* Sphinx version update

* update in doc for ballroom

* Fixed removal

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* addition of dataset link

* Reverted back the version in requirements.txt

* change in ballroom status

* Update in ballroom remote

* black formatting fixes

* black formatting fixes

---------

Co-authored-by: Tanmay Khandelwal <[email protected]>
Former-commit-id: 2c4ee52
* Addition of ballroom

* description text update

* lint error fix

* fixed path

* Updated download info

* Update in sphinx

* Update in sphinx

* Update in sphinx

* added references

* Sphinx version update

* update in doc for ballroom

* Fixed removal

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* Fixes in sphinx format

* addition of dataset link

* Reverted back the version in requirements.txt

* change in ballroom status

* Update in ballroom remote

* black formatting fixes

* black formatting fixes

* fixes in black version

* fixes in black version

---------

Co-authored-by: Tanmay Khandelwal <[email protected]>
Former-commit-id: c1e3cf9
Former-commit-id: 87dffcb
@tanmayy24 closed this by deleting the head repository on Oct 10, 2024