Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk index fix #35

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Bulk index fix #35

wants to merge 2 commits into from

Conversation

alondmnt
Copy link
Contributor

@alondmnt alondmnt commented Nov 27, 2024

This PR addresses an issue with the diet logging events (grouped bulk file), that is missing an array_index.

The previous code tried to add index levels that exist in the main table but are missing from the bulk data, mostly because sometimes we included array_index, and sometimes we didn't (first encountered this in the sleep monitoring dataset).

Solutions:

  1. Added an arg to load_bulk_data extend_bulk_index that remains True by default for backward compatibility (a few example notebooks depend on it), but can be set to False to disable the addition of missing index levels such as array_index, in case we run into additional issues.

  2. Improved the code that adds missing index levels to join also on collection date. This fixes the diet logging scenario and as far as I can tell doesn't cause regressions.

Alternative solution: We could disable extend_bulk_index by default to grouped bulk data (need to use the dictionary to check what type of bulk file is being loaded).

In the future we will probably change this altogether, but for now this seems like the minimal fix that will make loaders great again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant