-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hyper-parameter tuning for VDPWI #121
Comments
The old implementation was about 0.5 points off for Pearson's r on the test set -- now it's closer to 2. The biggest changes from the old impl to now would be using torchtext and PyTorch 0.4. The model code itself hasn't changed. |
I ran 216 tests on parameters including: decay=[0.99, 0.95], lr=[5E-4, 1E-4], batch_size=[8, 16], momentum=[0, 0.15, 0.05], rnn_hidden_dim=[128, 256, 512], epochs=[10,15,20]. In all cases, I use RMSProp for optimization according to the paper. The best result on test set for Pearson's r is 0.8707, which is 0.0077 lower than the original result. It's achieved under this param setting: --decay 0.95 --lr 0.0005 --optimizer rmsprop --momentum 0 --epochs 15 --batch-size 8 --rnn-hidden-dim 256. Among all the "nearly best" results(like 0.8678, 0.8667), they share some same params: --lr 5e-4, --batch-size 8, --epochs 15. Besides, I also ran some tests with SGD and Adam. Their performance are 1~2 points lower than RMSProp. |
Good results! |
I've updated the readme and sent PR. @Victor0118 |
I re-ran the best parameter setting (Pearson's r 0.8707) 80 times with different random seeds. The 95% confidence interval is: [0.8625, 0.8644]. Among these 80 results, the highest Pearson'r value is 0.8710 by setting the random seed as 723. The parameter setting is: --classifier vdpwi --lr 0.0005 --optimizer rmsprop --epochs 15 --momentum 0 --batch-size 8 --rnn-hidden-dim 256
I also run other parameters 10 times with different random seeds in case missing some potential good settings. There are two parameter sets achieving r value higher than 0.87: a) 0.8705 with 95% confidence interval [0.8621, 0.8667]; b)0.8702 with 95% confidence interval [0.8588, 0.8682]. @lintool To sum up, our best result improves 2 points after parameter tuning, and also very close to the result of the original torch implementation. |
According to @daemon - the VDPWI works https://github.com/castorini/Castor/tree/master/vdpwi
But the effectiveness is still below STOA because the hyper-parameters haven't been tuned yet.
The text was updated successfully, but these errors were encountered: