Predictions look random #8

Open
xiyuanzh opened this issue Jul 24, 2024 · 11 comments

@xiyuanzh

xiyuanzh commented Jul 24, 2024

Hi,

Thanks a lot for sharing this repo! I trained the model and evaluated it on the balance-scale and mfeat-fourier datasets (the first two datasets evaluated in TabPFN). For both datasets, the model predicts all rows as one class. Moreover, the accuracy on the training set stays around 0.5. May I know if there are any configurations I need to take care of? This is how I train and test the model:

python -m tab_pfn.main train debug debug
python -m tab_pfn.main infer balance-scale debug/model_4095.pt debug_output --class-col target

Thanks!

@Ipsedo
Owner

Ipsedo commented Jul 25, 2024

Maybe you haven't trained the model enough. As I remember, there is a strange behavior: the model takes a long time to start converging. Here you are using the first checkpoint, which leads to random predictions.

Try evaluating with the saved model in the resources folder, or train yours for longer. The strange behavior is reflected in the metric (see it on MLflow): after a few thousand backward steps, the metric begins to decrease and the model reaches quite good results (not as good as the original paper, but not bad).
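For example, reusing the infer command from above (the state dict filename under resources/ is a placeholder here, to be replaced with whatever file is actually in the repo):

python -m tab_pfn.main infer balance-scale resources/<state_dict>.pt output_dir --class-col target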

@xiyuanzh
Author

Thank you so much for the prompt response! I will try training the model longer to see the performance.

I tried loading the model under the resources folder, but I got the error "_pickle.UnpicklingError: invalid load key, 'v'." when the following line of code in infer.py was executed: "tab_pfn.load_state_dict(th.load(infer_options.state_dict))".

I also made two changes to your code, and I want to check whether my understanding is correct.

  1. The original __get_tgt_mask() function masks all values except the diagonal, and the outputs from self.__trf_dec are NaNs. I changed the __get_tgt_mask() function to mask the values above the diagonal instead. More specifically:
def __get_tgt_mask(self, x_test: th.Tensor) -> th.Tensor:
    device = self.__get_device()
    sz = x_test.size(1)
    # -inf above the diagonal (masked), 0 on and below it (attended): a causal mask
    mask = th.triu(th.ones(sz, sz, device=device) * float('-inf'), diagonal=1)
    # repeat the mask for every batch element and attention head
    mask = mask.repeat(x_test.size(0) * self.__nheads, 1, 1)
    return mask
  2. The following two lines in scm.py show syntax errors:
x = outs_stacked[:, *self.__zx_nodes_idx].squeeze(-1)
y = outs_stacked[:, *self.__zy_node_idx].squeeze(-1)

I changed them to

x = outs_stacked[:, self.__zx_nodes_idx[0], self.__zx_nodes_idx[1]].squeeze(-1)
y = outs_stacked[:, self.__zy_node_idx[0], self.__zy_node_idx[1]].squeeze(-1)

Could you help check if my understanding is correct? Thanks a lot!

@Ipsedo
Owner

Ipsedo commented Jul 25, 2024

For the target mask, it needs to be a diagonal: there is no relation between target observations, so each target only sees itself inside the transformer (as I understand the paper).
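For illustration, a minimal sketch of such a diagonal-only mask, assuming the usual additive-mask convention (0 = attend, -inf = masked); the names below are illustrative, not the repo's actual API:

import torch as th

def diagonal_tgt_mask(batch: int, n_heads: int, sz: int, device: th.device) -> th.Tensor:
    # each target position attends only to itself: 0 on the diagonal, -inf elsewhere
    mask = th.full((sz, sz), float('-inf'), device=device)
    mask.fill_diagonal_(0.0)
    # one copy per batch element and attention head, as in the snippet above
    return mask.repeat(batch * n_heads, 1, 1)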

For the starred expression inside the indexing, which Python version do you use? It may be a feature of a recent Python version.
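(For reference, a toy sketch contrasting the two forms: to my understanding the starred form inside a subscript only parses on Python 3.11+ (PEP 646), while the explicit indexing also works on older versions. Shapes and names here are illustrative only.)

import torch as th

outs_stacked = th.randn(4, 3, 5, 1)
idx = (th.tensor([0, 1]), th.tensor([2, 4]))

# Python >= 3.11 only:
# x = outs_stacked[:, *idx].squeeze(-1)

# Equivalent explicit indexing, also valid on older Python versions:
x = outs_stacked[:, idx[0], idx[1]].squeeze(-1)  # shape (4, 2)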

For the state dict loading that fails, let me check. Could you try it on the develop-sam branch? I may have made modifications and might need to re-push the updated model state dict. Also, which PyTorch version do you use?

@xiyuanzh
Author

xiyuanzh commented Jul 26, 2024

Thanks so much for the explanation! This is very helpful! I ran model_183295.pt on the develop-sam branch across all 30 test datasets in TabPFN and attached the results below. The mean accuracy is around 0.8. Are these numbers close to what you reproduced? Thanks!

0 balance-scale 0.891026
1 mfeat-fourier 0.795000
2 breast-w 0.974212
3 mfeat-karhunen 0.959000
4 mfeat-morphological 0.724000
5 mfeat-zernike 0.818000
6 cmc 0.523098
7 credit-approval 0.843478
8 credit-g 0.758000
9 diabetes 0.736979
10 tic-tac-toe 0.736952
11 vehicle 0.747045
12 eucalyptus 0.668478
13 analcatdata_authorship 0.976190
14 analcatdata_dmft 0.218593
15 pc4 0.901235
16 pc3 0.891165
17 kc2 0.819923
18 pc1 0.927798
19 banknote-authentication 0.957746
20 blood-transfusion-service-center 0.840607
21 ilpd 0.676976
22 qsar-biodeg 0.995627
23 wdbc 0.788770
24 cylinder-bands 0.722222
25 dresses-sales 0.596000
26 MiceProtein 0.994444
27 car 0.731959
28 steel-plates-fault 0.944444
29 climate-model-simulation-crashes 0.853009
mean 0.800399

@Ipsedo
Owner

Ipsedo commented Jul 26, 2024

Yes, that looks exactly like the metrics I can reach.
If everything is okay on your side, you can close this issue. And if you like this implementation, don't hesitate to star and share it :)

@xiyuanzh
Author

Sure, thanks! I also ran into this error: "RuntimeError: normal expects std >= 0.0, but found std -inf" for the following line of code in scm.py, "nn.init.normal_(module.weight, std=tnlu_float(1e-2, 10, 1e-8))", after about 20K iterations. Is this expected? I changed this line to "nn.init.normal_(module.weight, std=max(0, tnlu_float(1e-2, 10, 1e-8)))".

@Ipsedo
Owner

Ipsedo commented Jul 28, 2024

No, it's not the expected behaviour of the TNLU; it may be a mistake on my side in its implementation. I will try to fix it by re-reading the paper (and also add unit tests for it!).
I will tell you when I have fixed it ;)

@Ipsedo
Owner

Ipsedo commented Jul 29, 2024

I think I have fixed it by:

  • using the right truncated normal formulas (from Wikipedia)
  • using them in the tnlu function

What I saw during test execution is a numerical precision issue (getting -10.0001 when the lower bound is -10, for example); to avoid this, I explicitly clamp the truncated normal results to its bounds.
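For illustration, a minimal sketch of that approach, assuming an inverse-CDF truncated normal (formulas as on Wikipedia) followed by an explicit clamp; the names below are illustrative and not the repo's actual tnlu API:

import torch as th

def truncated_normal(mu: float, sigma: float, lower: float, upper: float, size: int = 1) -> th.Tensor:
    normal = th.distributions.Normal(0.0, 1.0)
    # CDF values of the standardized bounds
    a = normal.cdf(th.tensor((lower - mu) / sigma))
    b = normal.cdf(th.tensor((upper - mu) / sigma))
    # sample uniformly between the two CDF values, then invert
    u = th.rand(size) * (b - a) + a
    x = mu + sigma * normal.icdf(u)
    # guard against floating-point precision pushing results just outside the bounds
    return x.clamp(lower, upper)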

Can you test it on your side?

@xiyuanzh
Author

Thanks so much for the update! I tested it and found the loss quickly went to NaN after ~60k iterations. Did you observe a similar phenomenon on your side?

@Ipsedo
Owner

Ipsedo commented Aug 1, 2024

Sorry for the delay in answering. I haven't seen this numerical issue.

Maybe my SCM implementation is not equal to what they did in the original paper: there are many subtleties that I have resolved arbitrarily.
Or maybe the default hyper-parameters in the main script are not good, causing this numerical issue. I will try to re-train it next week and see if I also get NaNs during training.

Can you share all the hyper-parameters you chose?

@xiyuanzh
Author

xiyuanzh commented Aug 6, 2024

Hi, thanks so much for your response! I tried multiple runs and they all got NaNs for the loss. I used the default hyper-parameters, i.e., "python -m tab_pfn.main --cuda train run_debug model_debug --batch-size 10". Please let me know if I need to provide any additional information, thanks!
