Add new functionality to calculate timestamps for dataframe splits based on user-defined percentages #63
This PR introduces a new function, `calculate_timestamps`, which splits a dataframe into train, validation, and test sets based on percentage inputs. The function also supports optional preservation of continuous periods, determined by a time tolerance parameter.

Key Features:
- When the `preserve_periods` option is enabled, the function ensures that splits occur at the end of continuous periods, so data is never split in the middle of a period.
- `time_tolerance` parameter, which defines the minimum gap between observations to be considered the start of a new period.

Unit Tests:
- `DatetimeIndex`.
- `time_tolerance` cases to ensure gaps are correctly respected.
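
To make the described behavior concrete, here is a minimal sketch of how percentage-based split timestamps with optional period preservation could be computed. It is not the code from this PR: the name `calculate_timestamps_sketch`, the parameters `train_pct` and `val_pct`, the return shape, and the choice to move a cut forward to the end of the period that contains it are all assumptions made for illustration. Only `preserve_periods`, `time_tolerance`, and the `DatetimeIndex` requirement come from the description above.

```python
from typing import Optional, Tuple

import numpy as np
import pandas as pd


def calculate_timestamps_sketch(
    df: pd.DataFrame,
    train_pct: float,
    val_pct: float,
    preserve_periods: bool = False,
    time_tolerance: Optional[pd.Timedelta] = None,
) -> Tuple[pd.Timestamp, pd.Timestamp]:
    """Return (train_end, val_end), the last timestamps of the train and
    validation sets. Rows after val_end form the test set.

    Hypothetical signature for illustration, not the PR's actual API.
    """
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("df must be indexed by a DatetimeIndex")
    if len(df) == 0:
        raise ValueError("df must not be empty")
    if train_pct <= 0 or val_pct < 0 or train_pct + val_pct > 1:
        raise ValueError("expected fractions with train_pct + val_pct <= 1")

    index = df.index.sort_values()
    n = len(index)

    # Position of the last row of each split when cutting purely by row count.
    train_pos = max(int(n * train_pct) - 1, 0)
    val_pos = max(int(n * (train_pct + val_pct)) - 1, train_pos)

    if preserve_periods:
        if time_tolerance is None:
            raise ValueError("time_tolerance is required with preserve_periods")
        tolerance = pd.Timedelta(time_tolerance).to_timedelta64()
        # Assumption: a gap of at least `time_tolerance` starts a new period,
        # so the row just before such a gap ends one. The last row always does.
        gaps = np.diff(index.values)
        is_period_end = np.append(gaps >= tolerance, True)
        period_ends = np.flatnonzero(is_period_end)
        # Assumption: move each cut forward to the end of its own period.
        train_pos = period_ends[np.searchsorted(period_ends, train_pos)]
        val_pos = period_ends[np.searchsorted(period_ends, val_pos)]

    return index[train_pos], index[val_pos]


# Five daily periods of eight hourly observations, separated by overnight gaps.
step = pd.Timedelta(hours=1)
days = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
index = pd.DatetimeIndex(
    [ts for day in days for ts in pd.date_range(day, periods=8, freq=step)]
)
df = pd.DataFrame({"value": range(len(index))}, index=index)

plain = calculate_timestamps_sketch(df, train_pct=0.5, val_pct=0.25)
snapped = calculate_timestamps_sketch(
    df,
    train_pct=0.5,
    val_pct=0.25,
    preserve_periods=True,
    time_tolerance=pd.Timedelta(hours=2),
)
```

In this sketch, `plain` cuts in the middle of the third and fourth days, while `snapped` moves both cuts to the last observation of those days. Note that `time_tolerance` must be larger than the normal sampling interval here, since a gap equal to the tolerance is treated as a period boundary.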