Skip to content

[ENH] Implementing D2 data module layer for tslib models. #1836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 79 commits into
base: main
Choose a base branch
from

Conversation

PranavBhatP
Copy link
Contributor

@PranavBhatP PranavBhatP commented May 15, 2025

Description

This PR fixes issue #1833, implements a D2 for tslib.

This PR involves the following changes:

  • New dataset and datamodule for tslib - link
  • Implementation of TimeXer with new data module. - link
  • Base class implementation of a TslibBaseModel - link.
  • Tests for the tslib data module. - link
  • Example notebook - link
  • Restructure codebase to include a layers directory for module containing architectural deep learning layer classes.

Checklist

  • Linked issues (if existing)
  • Amended changelog for large changes (and added myself there as contributor)
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independent of commit, execute pre-commit run --all-files

Make sure to have fun coding!

@PranavBhatP PranavBhatP requested a review from yarnabrina as a code owner May 23, 2025 06:13
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Minor structure request: can you kindly make the attention etc modules in layers actual folders, in those private submodules with a single class each?

this completes the d2 pipeline for timexer, prediction has not been tested, also revert the loss dtype to nn.Module for now.
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@PranavBhatP
Copy link
Contributor Author

PranavBhatP commented May 25, 2025

FYI @agobbifbk @fkiraly @phoeenniixx. Rough Pipeline for tslib is complete. I have implemented TimeXer on this pipeline. Tests are pending. There are still some bugs, but training and prediction are working.

@fkiraly fkiraly moved this from PR in progress to PR under review in May - Sep 2025 mentee projects May 26, 2025
this change introduces sub-directories to provide a single module for a group of layers rather than dumping them in a single file with the same as the sub-directory
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

  • there is an empty file modules.py, can you remove this from the PR?
  • can you ensure the new base class has the "experimental" warning for now, similar to the v2 base model
  • could you put the base class in another private file, in base?

@PranavBhatP PranavBhatP requested a review from fkiraly May 27, 2025 16:30
@fkiraly fkiraly moved this from PR under review to PR in progress in May - Sep 2025 mentee projects May 28, 2025
@PranavBhatP
Copy link
Contributor Author

Hi @fkiraly @phoeenniixx

I went through the PR #1841, and considering that @phoeenniixx is implementing a v2 version of the test_all_estimators.py, this surely means that I will depend on these changes to implement a newer version test_timexer.py for this PR (currently the v1 version is in #1797)

Is it advisable to stack upon PR #1841? Or should I wait for a merge and work only on the independent tests for the tslib datamodule?

@fkiraly
Copy link
Collaborator

fkiraly commented May 29, 2025

(code formatting is failing)

@PranavBhatP
Copy link
Contributor Author

PranavBhatP commented May 30, 2025

Did code quality tests change @fkiraly? The pre commit tests are passing on my local repo, but here it is failing.

EDIT: Resolved.

@PranavBhatP
Copy link
Contributor Author

Hi @agobbifbk , continuing the discussion from the meet about the windowing, this is the code I was talking about
https://github.com/thuml/Time-Series-Library/blob/main/data_provider/data_loader.py

Specifically the implementation in the class Dataset_ETT_hour and Dataset_M4. Here are they are using fixed window sampling for the first one (Dataset_ETT_hour) and random window sampling ( Dataset_M4) for the second, this seems to be a difference but I might be mistaken. The reason they might be using different windowing is because of the forecasting task for which the dataset is used. (long-term or short-tem forecasting etc).

After going through the code deeper, I understand what you were saying, I don't think it matters what windowing we use as long as it matches with model input and makes it convenient to keep single one in the d2. But it would be nice if you could review the current implementation - link and some comments on whether we need separate windowing strategy on top of this.

Also FYI @phoeenniixx , since you were also involved in the discussion.

…pes between end_time and cutoff_time in _create_windows
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: PR in progress
Development

Successfully merging this pull request may close these issues.

5 participants