[ENH] Implementing TimeXer model from thuml #1797
Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##           main    #1797   +/- ##
=======================================
  Coverage      ?   86.95%
=======================================
  Files         ?       54
  Lines         ?     5958
  Branches      ?        0
=======================================
  Hits          ?     5181
  Misses        ?      777
  Partials      ?        0
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
@fkiraly the implementation of
FYI @agobbifbk, @benHeid
A couple of comments:
Hi @agobbifbk, thanks for the review!
Yes, I think I forgot to credit the original authors. Will do :).
I had similar questions about how this model would handle quantile predictions. The original architecture doesn't seem to handle this, so I just decided to patch it with this line of code (which might be a bad approach). I am not sure what changes I should make to fix this; could you help me here?
It is multi-output indeed; the _forecast_multi method is native to the
Sure, usually it is sufficient to increase the number of output channels. Suppose the model ends with something in a given shape: my suggestion is to start from the implementation of the quantile loss and check its definition (in DSIPTS there is a multi-output version of it, which just sums the contribution of each channel), and then play with the output shape of the model! Let me know if that is sufficient to finish the job :-)
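To make that suggestion concrete, here is a minimal sketch of a multi-output pinball (quantile) loss that sums the contribution of each output channel. The function name, signature, and shapes are illustrative assumptions, not the DSIPTS or PTF implementation:

```python
import torch

def multi_output_quantile_loss(y_pred, y_true, quantiles):
    # y_pred: (batch, horizon, n_vars, n_quantiles); y_true: (batch, horizon, n_vars)
    per_quantile = []
    for i, q in enumerate(quantiles):
        err = y_true - y_pred[..., i]
        # pinball loss: q * err when under-predicting, (q - 1) * err otherwise
        per_quantile.append(torch.max(q * err, (q - 1) * err))
    loss = torch.stack(per_quantile, dim=-1)  # (batch, horizon, n_vars, n_quantiles)
    # average over batch/horizon/quantiles, then sum each channel's contribution
    return loss.mean(dim=(0, 1, 3)).sum()

# usage with hypothetical sizes: 2 targets, 3 quantiles
y_true = torch.randn(4, 24, 2)
y_pred = torch.randn(4, 24, 2, 3)
print(multi_output_quantile_loss(y_pred, y_true, [0.1, 0.5, 0.9]))
```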
@agobbifbk, I've made changes to the output layer by adding the quantiles as a separate dimension in the
Nice!

```python
enc_out = torch.reshape(
    enc_out, (-1, n_vars, enc_out.shape[-2], enc_out.shape[-1])
)
```

This code is hard to understand; I suggest giving names to the variables (context_length?) so that the next person who reads this will immediately understand the shapes of those tensors. What do you think?

I still have a doubt on the quantile loss; this is more a general PTF question.

```python
prediction = self.transform_output(
    prediction=prediction, target_scale=x["target_scale"]
)
return self.to_network_output(prediction=prediction)
```

Also in this case, instead of using

Wrapping up: clean the code a little, ensure the output is in the correct form, try to train the model, and plot the outputs to see whether it is correctly producing the confidence intervals.
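For what it's worth, a minimal sketch of the suggested renaming, with hypothetical dimension names and sizes (these are assumptions, not the PR's actual values); the reshape itself is unchanged:

```python
import torch

# hypothetical sizes: batch_size=8, n_vars=3, n_patches=12, d_model=16
n_vars, n_patches, d_model = 3, 12, 16
enc_out = torch.randn(8 * n_vars, n_patches, d_model)

# the same reshape as above, but with the dimensions named for readability
enc_out = enc_out.reshape(-1, n_vars, n_patches, d_model)
# enc_out now reads as (batch_size, n_vars, n_patches, d_model)
```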
Earlier implementation erroneously included the target variable's past data in the training data via exog_data; the new implementation employs masking to ignore the target variable (a sketch of the idea follows below).
Use tensor outputs for single-target prediction instead of a list encapsulating targets for both single- and multi-target prediction.
All tests for the basic functionality have been implemented.
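A minimal sketch of the masking idea mentioned in the first item above; the tensor names, shapes, and target index are hypothetical:

```python
import torch

# hypothetical shapes: batch=32, context_length=96, n_features=8, target at index 0
exog_data = torch.randn(32, 96, 8)
target_idx = 0

# boolean mask that drops the target channel from the exogenous inputs,
# so the target's past values no longer leak in through exog_data
keep = torch.ones(exog_data.shape[-1], dtype=torch.bool)
keep[target_idx] = False
exog_only = exog_data[..., keep]  # shape (32, 96, 7): target channel excluded
```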
Nice! Looks good to me!
I also assume this tells us a lot about the testing framework structure at v1.
(well, at least you now understand it, @PranavBhatP?)
Yes, most of the current testing framework makes sense now. There are a few parts which can be improved. I'll update them under the relevant issue/PR in a while, if that's fine.
Absolutely! Feel free to branch off my PR or the current main. Just mention which PR you are branching off in the PR description.
The PR is ready to merge. I've added the remaining changes to the docstrings. The failing test is unrelated to the code in this PR.
Just one comment before closing: depending on the usage of the
I have another comment here:

```python
self.ex_embedding = DataEmbedding_inverted(
    self.hparams.context_length,
    self.hparams.d_model,
    self.hparams.embed_type,
    self.hparams.freq,
    self.hparams.dropout,
)
```

Here we are assuming we pass the freq (which is not used in
I think that the frequency parameter should be removed, and we need to take categorical data into account (in your implementation it seems that the categorical variables are not used).
Yes, I agree with your opinion on this, and I had thought about it before as well. My main concern with implementing categorical-variable support is whether the model will still perform as intended. I might be mistaken; can you please suggest a possible approach for this? I will try to implement it from my end as well. If it turns out well, this would be useful for future implementations, since the data module for tslib supports categorical variables.
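One possible approach, purely a sketch under my own assumptions rather than a confirmed design for this PR: give each categorical column its own `nn.Embedding` and concatenate the embeddings with the continuous exogenous channels before the inverted embedding. The class name and sizes below are hypothetical:

```python
import torch
from torch import nn

class CategoricalConcat(nn.Module):
    """Sketch: embed each categorical column and concatenate the embeddings
    with the continuous exogenous channels. Names and design are assumptions,
    not the eventual PTF implementation."""

    def __init__(self, cardinalities, emb_dim):
        super().__init__()
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, emb_dim) for card in cardinalities]
        )

    def forward(self, x_cont, x_cat):
        # x_cont: (batch, time, n_cont); x_cat: (batch, time, n_cat) integer codes
        embedded = [emb(x_cat[..., i]) for i, emb in enumerate(self.embeddings)]
        # each element of `embedded` has shape (batch, time, emb_dim)
        return torch.cat([x_cont, *embedded], dim=-1)

# usage with hypothetical sizes: 2 categorical columns with 5 and 12 levels
layer = CategoricalConcat(cardinalities=[5, 12], emb_dim=4)
x_cont = torch.randn(8, 96, 3)
x_cat = torch.randint(0, 5, (8, 96, 2))
out = layer(x_cont, x_cat)  # shape (8, 96, 3 + 2 * 4)
```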
Hi @agobbifbk,
I have added comments to explain the shape of the final output tensors better. Do let me know if more changes are required.
Since this change would probably affect all the future
OK, it seems reasonable to close this and open a new one. You wrote: `(batch_size, prediction_length, n_quantiles)`. It seems to me that there is a missing dimension (it should be 4-D, with one dimension for the number of output channels), shouldn't it?
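For concreteness, a minimal sketch of the 4-D shape being discussed, with hypothetical sizes:

```python
import torch

# hypothetical sizes to make the 4-D output shape concrete
batch_size, prediction_length, n_targets, n_quantiles = 4, 24, 2, 7
flat = torch.randn(batch_size, prediction_length, n_targets * n_quantiles)

# one dimension per output channel (target) and one per quantile
out = flat.reshape(batch_size, prediction_length, n_targets, n_quantiles)
```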
Maybe the language of the comment is confusing.
Hi @fkiraly, this PR is ready to be merged.
Description

This PR works on #1793 and aims to align and implement the TimeXer model within PTF's design.

Checklist
Pre-commit hooks installed via `pre-commit install`. To run hooks independent of commit, execute `pre-commit run --all-files`.
Make sure to have fun coding!