Predictions look random #8

Open
xiyuanzh opened this issue Jul 24, 2024 · 11 comments

@xiyuanzh

xiyuanzh commented Jul 24, 2024

Hi,

Thanks a lot for sharing this repo! I trained the model and evaluated it on the balance-scale and mfeat-fourier datasets (the first two datasets evaluated in TabPFN). For both datasets, the model predicts all rows as one class. Moreover, the accuracy on the training set stays around 0.5. May I know if there are any configurations I need to take care of? This is how I train and test the model:

python -m tab_pfn.main train debug debug
python -m tab_pfn.main infer balance-scale debug/model_4095.pt debug_output --class-col target

Thanks!

@Ipsedo
Owner

Ipsedo commented Jul 25, 2024

Maybe you haven't trained the model enough. As I remember, there is a strange behavior: the model takes a long time to start converging. Here you are using the first checkpoint, which leads to random predictions.

Try evaluating with the saved model in the resources folder, or train yours for longer. The strange behavior is reflected in the metric (see it on MLflow): after a few thousand backward steps, the metric begins to decrease and the model reaches quite good results (not as good as the original paper, but not bad).
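For example, reusing the infer command from above (the state dict filename under resources/ is a placeholder here, to be replaced with whatever file is actually in the repo):

python -m tab_pfn.main infer balance-scale resources/<state_dict>.pt output_dir --class-col target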

@xiyuanzh
Author

Thank you so much for the prompt response! I will try training the model longer to see the performance.

I tried loading the model under the resources folder, but I got the error "_pickle.UnpicklingError: invalid load key, 'v'." when the following line of code in infer.py was executed: "tab_pfn.load_state_dict(th.load(infer_options.state_dict))".

I also made two changes to your code, and I want to check whether my understanding is correct.

  1. The original __get_tgt_mask() function masks all values except the diagonal, and the outputs from self.__trf_dec are NaNs. I changed the __get_tgt_mask() function to mask the values above the diagonal instead. More specifically:
def __get_tgt_mask(self, x_test: th.Tensor) -> th.Tensor:
    device = self.__get_device()
    sz = x_test.size(1)
    # -inf above the diagonal (masked), 0 on and below it (attended): a causal mask
    mask = th.triu(th.ones(sz, sz, device=device) * float('-inf'), diagonal=1)
    # repeat the mask for every batch element and attention head
    mask = mask.repeat(x_test.size(0) * self.__nheads, 1, 1)
    return mask
  2. The following two lines in scm.py show syntax errors:
x = outs_stacked[:, *self.__zx_nodes_idx].squeeze(-1)
y = outs_stacked[:, *self.__zy_node_idx].squeeze(-1)

I changed them to

x = outs_stacked[:, self.__zx_nodes_idx[0], self.__zx_nodes_idx[1]].squeeze(-1)
y = outs_stacked[:, self.__zy_node_idx[0], self.__zy_node_idx[1]].squeeze(-1)

Could you help check if my understanding is correct? Thanks a lot!

@Ipsedo
Owner

Ipsedo commented Jul 25, 2024

For the target mask, it needs to be a diagonal: there is no relation between target observations, so each target only sees itself inside the transformer (as I understand the paper).
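For illustration, a minimal sketch of such a diagonal-only mask, assuming the usual additive-mask convention (0 = attend, -inf = masked); the names below are illustrative, not the repo's actual API:

import torch as th

def diagonal_tgt_mask(batch: int, n_heads: int, sz: int, device: th.device) -> th.Tensor:
    # each target position attends only to itself: 0 on the diagonal, -inf elsewhere
    mask = th.full((sz, sz), float('-inf'), device=device)
    mask.fill_diagonal_(0.0)
    # one copy per batch element and attention head, as in the snippet above
    return mask.repeat(batch * n_heads, 1, 1)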

For the starred expression inside the indexing, which Python version do you use? It may be a feature of a recent Python version.
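(For reference, a toy sketch contrasting the two forms: to my understanding the starred form inside a subscript only parses on Python 3.11+ (PEP 646), while the explicit indexing also works on older versions. Shapes and names here are illustrative only.)

import torch as th

outs_stacked = th.randn(4, 3, 5, 1)
idx = (th.tensor([0, 1]), th.tensor([2, 4]))

# Python >= 3.11 only:
# x = outs_stacked[:, *idx].squeeze(-1)

# Equivalent explicit indexing, also valid on older Python versions:
x = outs_stacked[:, idx[0], idx[1]].squeeze(-1)  # shape (4, 2)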

For the state dict loading that fails, let me check. Could you try it on the develop-sam branch? I may have made modifications and might need to re-push the updated model state dict. Also, which PyTorch version do you use?

@xiyuanzh
Author

xiyuanzh commented Jul 26, 2024

Thanks so much for the explanation! This is very helpful! I ran model_183295.pt on the develop-sam branch across all 30 test datasets in TabPFN and attached the results below. The mean accuracy is around 0.8. Are these numbers close to what you reproduced? Thanks!

0 balance-scale 0.891026
1 mfeat-fourier 0.795000
2 breast-w 0.974212
3 mfeat-karhunen 0.959000
4 mfeat-morphological 0.724000
5 mfeat-zernike 0.818000
6 cmc 0.523098
7 credit-approval 0.843478
8 credit-g 0.758000
9 diabetes 0.736979
10 tic-tac-toe 0.736952
11 vehicle 0.747045
12 eucalyptus 0.668478
13 analcatdata_authorship 0.976190
14 analcatdata_dmft 0.218593
15 pc4 0.901235
16 pc3 0.891165
17 kc2 0.819923
18 pc1 0.927798
19 banknote-authentication 0.957746
20 blood-transfusion-service-center 0.840607
21 ilpd 0.676976
22 qsar-biodeg 0.995627
23 wdbc 0.788770
24 cylinder-bands 0.722222
25 dresses-sales 0.596000
26 MiceProtein 0.994444
27 car 0.731959
28 steel-plates-fault 0.944444
29 climate-model-simulation-crashes 0.853009
mean 0.800399

@Ipsedo
Owner

Ipsedo commented Jul 26, 2024

Yes, that looks exactly like the metrics I can reach.
If everything is okay on your side, you can close this issue. And if you like this implementation, don't hesitate to star and share it :)

@xiyuanzh
Author

Sure, thanks! I also ran into this error: "RuntimeError: normal expects std >= 0.0, but found std -inf" for the following line of code in scm.py, "nn.init.normal_(module.weight, std=tnlu_float(1e-2, 10, 1e-8))", after about 20K iterations. Is this expected? I changed this line to "nn.init.normal_(module.weight, std=max(0, tnlu_float(1e-2, 10, 1e-8)))".

@Ipsedo
Owner

Ipsedo commented Jul 28, 2024

No, it's not the expected behaviour of the TNLU; it may be a mistake on my side in its implementation. I will try to fix it by re-reading the paper (and also add unit tests for it!).
I will tell you when I have fixed it ;)

@Ipsedo
Owner

Ipsedo commented Jul 29, 2024

I think I have fixed it by:

  • using the right truncated normal formulas (from Wikipedia)
  • using them in the tnlu function

What I saw during test execution is a numerical precision issue (getting -10.0001 when the lower bound is -10, for example); to avoid this, I explicitly clamp the truncated normal results to its bounds.
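For illustration, a minimal sketch of that approach, assuming an inverse-CDF truncated normal (formulas as on Wikipedia) followed by an explicit clamp; the names below are illustrative and not the repo's actual tnlu API:

import torch as th

def truncated_normal(mu: float, sigma: float, lower: float, upper: float, size: int = 1) -> th.Tensor:
    normal = th.distributions.Normal(0.0, 1.0)
    # CDF values of the standardized bounds
    a = normal.cdf(th.tensor((lower - mu) / sigma))
    b = normal.cdf(th.tensor((upper - mu) / sigma))
    # sample uniformly between the two CDF values, then invert
    u = th.rand(size) * (b - a) + a
    x = mu + sigma * normal.icdf(u)
    # guard against floating-point precision pushing results just outside the bounds
    return x.clamp(lower, upper)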

Can you test it on your side?

@xiyuanzh
Author

Thanks so much for the update! I tested it and found the loss quickly went to NaN after ~60k iterations. Did you observe a similar phenomenon on your side?

@Ipsedo
Owner

Ipsedo commented Aug 1, 2024

Sorry for the delay in answering. I haven't seen this numerical issue.

Maybe my SCM implementation is not equal to what they did in the original paper: there are many subtleties that I have resolved arbitrarily.
Or maybe the default hyper-parameters in the main script are not good, causing this numerical issue. I will try to re-train it next week and see if I also get NaNs during training.

Can you share all the hyper-parameters you chose?

@xiyuanzh
Author

xiyuanzh commented Aug 6, 2024

Hi, thanks so much for your response! I tried multiple runs and they all got NaNs for the loss. I used the default hyper-parameters, i.e., "python -m tab_pfn.main --cuda train run_debug model_debug --batch-size 10". Please let me know if I need to provide any additional information, thanks!
