Predictions look random #8
Comments
Maybe you haven't trained the model enough. As I remember, there is a strange behavior: the model takes a long time to start converging. Here you use the first checkpoint, which leads to random predictions. Try evaluating with the saved model in the resources folder, or train yours longer. The strange behavior is reflected in the metric (see it on MLflow): after about a thousand backward steps, the metric begins to decrease and the model gets quite good results (not as good as the original paper, but not bad).
Thank you so much for the prompt response! I will try training the model longer to see how the performance evolves. I tried loading the model under the resources folder, but got the error `_pickle.UnpicklingError: invalid load key, 'v'.` when the following line of code in `infer.py` was executed: `tab_pfn.load_state_dict(th.load(infer_options.state_dict))`. I also made two changes to your code, which I want to check against my understanding.
I changed them to
Could you help check if my understanding is correct? Thanks a lot!
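For context on that unpickling error: `invalid load key, 'v'` from `torch.load` often means the file on disk is a Git LFS *pointer* (a small text file starting with `version https://git-lfs...`) rather than the real binary checkpoint. A stdlib-only sketch to diagnose this (the helper name is mine, not from the repo):

```python
def looks_like_lfs_pointer(path: str) -> bool:
    # Git LFS pointer files are tiny text files whose first line starts
    # with "version https://git-lfs.github.com/spec/v1"; the leading 'v'
    # is exactly the "invalid load key, 'v'" that pickle complains about.
    with open(path, "rb") as f:
        head = f.read(64)
    return head.startswith(b"version https://git-lfs")
```

If this returns `True` for the `.pt` file, running `git lfs pull` in the clone should fetch the actual weights.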
For the target mask, it needs to be diagonal: there is no relation between target observations, so one target only sees itself inside the transformer (as I understand the paper). For the starred expression within a slice, which Python version do you use? It may be a feature of a recent Python version. For the state dict loading that fails, let me check. Could you try the develop-sam branch? I may have made modifications, and I need to re-push the updated model state dict. Also, which PyTorch version do you use?
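A minimal sketch of that masking scheme (my own helper, not the repo's code), assuming a boolean mask where `True` means attention is allowed: every row attends to all training rows, and each target additionally attends only to itself, never to other targets.

```python
import torch as th


def tabpfn_attention_mask(n_train: int, n_test: int) -> th.Tensor:
    # Boolean (n, n) mask, True = attention allowed.
    n = n_train + n_test
    mask = th.zeros(n, n, dtype=th.bool)
    mask[:, :n_train] = True          # every row sees all training rows
    test_idx = th.arange(n_train, n)
    mask[test_idx, test_idx] = True   # diagonal: each target sees only itself
    return mask
```

The diagonal on the target block is what prevents information leaking between test observations.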
Thanks so much for the explanation! This is very helpful! I ran model_183295.pt on the develop-sam branch across all 30 test datasets in TabPFN and attached the results below. The mean accuracy is around 0.8. Are these numbers close to what you reproduced? Thanks! 0 balance-scale 0.891026
Yes, that looks like exactly the metrics I can reach.
Sure, thanks! I also ran into this error: `RuntimeError: normal expects std >= 0.0, but found std -inf` for the following line of code, `nn.init.normal_(module.weight, std=tnlu_float(1e-2, 10, 1e-8))` in `scm.py`, after about 20K iterations. Is this expected? I changed this line to `nn.init.normal_(module.weight, std=max(0, tnlu_float(1e-2, 10, 1e-8)))`.
No, it's not the expected behaviour of the TNLU; it may be a mistake on my side in its implementation. I will try to fix it by re-reading the paper (and also add unit tests for it!).
I think I have fixed it by:
What I saw during test execution was a numerical precision issue (getting -10.0001 when the lower bound is -10, for example); to avoid this, I explicitly clamp the truncated normal results to its bounds. Can you test it on your side?
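I can't see the actual commit from here, but a stdlib-only sketch of the idea (sample in log-space, truncate, then clamp the final value back into its bounds so floating-point overshoot like -10.0001 can never escape) might look like this; `tnlu_like` and its parameterization are my guesses, not the repo's implementation:

```python
import math
import random


def tnlu_like(min_val: float, max_val: float, min_clamp: float) -> float:
    # Draw in log-space so samples spread over several orders of magnitude.
    lo, hi = math.log(min_val), math.log(max_val)
    x = random.gauss((lo + hi) / 2.0, (hi - lo) / 4.0)
    # Truncate in log-space, then clamp again after exp() so rounding
    # error at the boundaries can never push the result out of range.
    x = min(max(x, lo), hi)
    return min(max(math.exp(x), max(min_val, min_clamp)), max_val)
```

With the final clamp, a call like `tnlu_like(1e-2, 10, 1e-8)` is guaranteed strictly positive and inside its bounds, which would rule out the `std = -inf` failure mode entirely.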
Thanks so much for the update! I tested it and found the loss quickly went to NaN after ~60k iterations. Did you observe a similar phenomenon on your side?
Sorry for the delay in answering. I haven't seen this numerical issue. Maybe my SCM implementation is not equal to what they did in the original paper: there are many subtleties that I have resolved arbitrarily. Can you share all the hyper-parameters you chose?
Hi, thanks so much for your response! I tried multiple runs and all of them got NaNs for the loss. I used the default hyper-parameters, i.e., `python -m tab_pfn.main --cuda train run_debug model_debug --batch-size 10`. Please let me know if I need to provide any additional information, thanks!
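For what it's worth, a common way to survive the occasional pathological synthetic (SCM) batch is to skip non-finite losses and clip gradients before each update. A sketch of that guard (the helper is mine, not part of tab_pfn):

```python
import torch as th


def guarded_step(model: th.nn.Module, loss: th.Tensor,
                 optimizer: th.optim.Optimizer, max_norm: float = 1.0) -> bool:
    # Skip the update entirely when the loss is NaN/inf, so a single bad
    # synthetic dataset cannot poison the weights with NaN gradients.
    if not th.isfinite(loss):
        optimizer.zero_grad()
        return False
    loss.backward()
    # Clip the global gradient norm to damp rare exploding batches.
    th.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad()
    return True
```

This doesn't fix the root cause (a sampler still occasionally producing extreme values), but it usually keeps the loss finite long enough to localize which batches misbehave.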
Hi,
Thanks a lot for sharing this repo! I trained the model and evaluated it on the balance-scale and mfeat-fourier datasets (the first two datasets evaluated in TabPFN). For both datasets, the model predicts all rows as one class. Moreover, the accuracy on the training set stays around 0.5. May I know if there are any configurations I need to take care of? This is how I train and test the model:
python -m tab_pfn.main train debug debug
python -m tab_pfn.main infer balance-scale debug/model_4095.pt debug_output --class-col target
Thanks!