This PR fixes an issue with loading the diet logging events (a grouped bulk file), which are missing the array_index index level.
The previous code tried to add index levels that exist in the main table but are missing from the bulk data. This was needed because array_index was included in some bulk files and omitted from others (we first encountered this in the sleep monitoring dataset).
Solutions:
Added an arg to load_bulk_data, extend_bulk_index, that remains True by default for backward compatibility (a few example notebooks depend on it), but can be set to False to disable the addition of missing index levels such as array_index, in case we run into additional issues.
Improved the code that adds missing index levels so that it also joins on collection date. This fixes the diet logging scenario and, as far as I can tell, doesn't cause regressions.
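For reviewers, here is a minimal sketch of the idea behind the second change. It is not the code in this PR: the function name add_missing_index_levels, the index level names participant_id and collection_date, and the sample data are all assumptions made for illustration. It shows why joining on every shared index level (including the collection date) gives each bulk row a single array_index, whereas joining on the participant alone would match several main-table rows.

```python
import pandas as pd


def add_missing_index_levels(bulk, main, extend_bulk_index=True):
    """Hypothetical helper: add index levels that `main` has and `bulk` lacks.

    The join uses every index level the two frames share, so a shared
    collection_date level takes part in the match.
    """
    if not extend_bulk_index:
        return bulk  # caller opted out, leave the bulk index untouched

    shared = [n for n in bulk.index.names if n in main.index.names]
    missing = [n for n in main.index.names if n not in bulk.index.names]
    if not shared or not missing:
        return bulk

    # One row per distinct index combination in the main table.
    lookup = main.index.to_frame(index=False).drop_duplicates()

    original_levels = list(bulk.index.names)
    merged = bulk.reset_index().merge(lookup, on=shared, how="left")
    return merged.set_index(original_levels + missing)


# Assumed sample data: participant 1 has two visits with different array_index.
main = pd.DataFrame(
    {"weight": [70.0, 71.0, 80.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, "2023-01-01", 0), (1, "2023-01-02", 1), (2, "2023-01-01", 0)],
        names=["participant_id", "collection_date", "array_index"],
    ),
)

# Grouped bulk data: several events per date, no array_index level.
bulk = pd.DataFrame(
    {"food": ["apple", "bread", "rice"]},
    index=pd.MultiIndex.from_tuples(
        [(1, "2023-01-01"), (1, "2023-01-01"), (1, "2023-01-02")],
        names=["participant_id", "collection_date"],
    ),
)

result = add_missing_index_levels(bulk, main)
print(result)
```

In this sketch, passing extend_bulk_index=False returns the bulk data unchanged, which mirrors the opt-out described above.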
Alternative solution: We could disable extend_bulk_index by default for grouped bulk data (this would require using the dictionary to check what type of bulk file is being loaded).
In the future we will probably change this altogether, but for now this seems like the minimal fix that will make loaders great again.