Add new functionality to calculate timestamps for dataframe splits based on user-defined percentages #63

Open: wants to merge 2 commits into base: main

WladRamos

This PR introduces a new function, calculate_timestamps, which computes the timestamps at which a dataframe should be split into train, validation, and test sets, based on user-defined percentages. The function can optionally preserve continuous periods, which are identified using a time tolerance parameter.

Key Features:

  • Percentage-based splits: The function takes percentages for train, validation, and test sets and returns the corresponding timestamps for splitting the data.
  • Period preservation: When the preserve_periods option is enabled, the function ensures that the splits occur at the end of continuous periods, preventing data from being split in the middle of a period.
  • Time tolerance: The time_tolerance parameter defines the minimum gap between consecutive observations that marks the start of a new period.
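
The behaviour described above could be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the name calculate_timestamps_sketch, its signature, the choice to return the last train and last validation timestamps, and the rule of snapping each cut to the nearest period end are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def calculate_timestamps_sketch(df, train_pct, val_pct, test_pct,
                                preserve_periods=False,
                                time_tolerance=pd.Timedelta(hours=1)):
    """Hypothetical sketch: return (last train timestamp, last validation timestamp).

    test_pct is accepted only to mirror the described interface; the test set
    is whatever follows the second timestamp.
    """
    index = df.index.sort_values()
    n = len(index)

    # Row positions of the last train row and the last validation row.
    train_pos = max(int(round(n * train_pct)) - 1, 0)
    val_pos = max(int(round(n * (train_pct + val_pct))) - 1, train_pos)

    if preserve_periods:
        # Assumption: a period ends wherever the gap to the next observation
        # is at least time_tolerance; the last row always ends a period.
        gaps = index[1:] - index[:-1]
        period_ends = np.flatnonzero(gaps >= time_tolerance).tolist() + [n - 1]
        # Assumption: move each cut to the nearest period end.
        train_pos = min(period_ends, key=lambda p: abs(p - train_pos))
        val_pos = min(period_ends, key=lambda p: abs(p - val_pos))

    return index[train_pos], index[val_pos]


# Three continuous hourly periods (7, 2, and 3 rows) separated by long gaps.
hour = pd.Timedelta(hours=1)
example_index = pd.DatetimeIndex(
    list(pd.date_range("2024-01-01", periods=7, freq=hour))
    + list(pd.date_range("2024-01-02", periods=2, freq=hour))
    + list(pd.date_range("2024-01-03", periods=3, freq=hour))
)
example_df = pd.DataFrame({"value": range(12)}, index=example_index)

plain = calculate_timestamps_sketch(example_df, 0.5, 0.25, 0.25)
preserved = calculate_timestamps_sketch(
    example_df, 0.5, 0.25, 0.25,
    preserve_periods=True, time_tolerance=pd.Timedelta(hours=2),
)
```

In this example the plain split cuts the first period one row before its end, while the period-preserving split moves the train cut to the end of that period.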

Unit Tests:

  • Added comprehensive unit tests to cover:
    • Period preservation behavior.
    • Validation of input percentages (ensuring they sum to 1).
    • Handling of dataframes without a DatetimeIndex.
    • Small time_tolerance cases to ensure gaps are correctly respected.
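
For reference, the input checks these tests exercise might look something like the sketch below. The helper name validate_split_inputs and the exception types are assumptions for illustration; the actual function in this PR may validate its inputs differently.

```python
import math

import pandas as pd


def validate_split_inputs(df, train_pct, val_pct, test_pct):
    """Hypothetical helper: reject inputs the split cannot handle."""
    # Timestamps can only be computed from a datetime-indexed dataframe.
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("df must have a DatetimeIndex")
    # Compare with a tolerance so that e.g. 0.7 + 0.15 + 0.15 is accepted
    # despite floating-point rounding.
    if not math.isclose(train_pct + val_pct + test_pct, 1.0, abs_tol=1e-9):
        raise ValueError("train, validation, and test percentages must sum to 1")


def raises(exc_type, *args):
    """Return True if validate_split_inputs(*args) raises exc_type."""
    try:
        validate_split_inputs(*args)
    except exc_type:
        return True
    return False


dated_df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
plain_df = pd.DataFrame({"value": [1, 2, 3]})  # default integer index
```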

…splits based on percentages with optional period preservation