Update Evaluation Logic to Latest lm_eval (0.4.8) and Support Automatic Benchmark Evals w/o Validation Set #1348
base: main
Conversation
@@ -1307,6 +1445,14 @@ Text Generation arguments

- **eval_task_limit**: int
This is the only new argument in this PR. The updates elsewhere to this file are from running configs/gen_docs.py.
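For context, here is a minimal sketch of how a per-task example cap like this is typically forwarded to the harness. `simple_evaluate` and its `tasks`/`limit` parameters are real lm_eval 0.4.x API, but the wrapper function and the exact wiring are assumptions for illustration, not this PR's code:

```python
# Hypothetical wrapper, not the PR's implementation: it only illustrates
# how an eval_task_limit-style setting maps onto lm_eval's `limit` argument.
import lm_eval

def run_benchmark_evals(lm, eval_tasks, eval_task_limit=None):
    # limit=None evaluates every example; an int caps examples per task.
    results = lm_eval.simple_evaluate(
        model=lm,
        tasks=eval_tasks,
        limit=eval_task_limit,
    )
    return results["results"]
```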
@@ -27,7 +28,10 @@

import torch
import torch.nn.functional as F

from lm_eval.models.utils import chunks
Recent versions of lm_eval have changed the paths for many of these utility functions.
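As an illustration (not part of this PR's diff), a fallback import along these lines is one way to tolerate the relocation; it assumes `chunks` lived in `lm_eval.utils` in older releases of the harness:

```python
# Illustrative compatibility import, not this PR's code: `chunks` moved from
# lm_eval.utils (older releases) to lm_eval.models.utils (recent 0.4.x).
try:
    from lm_eval.models.utils import chunks  # lm_eval >= 0.4 path
except ImportError:
    from lm_eval.utils import chunks  # legacy path in older lm_eval releases
```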
Local unit test results: all tests run locally, with no unexpected failing tests.
Force-pushed from 3be6bc9 to 2841bb0
I'm training a model where I want to train on the entire dataset; I do not want to split it into train/val/test. I still want to evaluate on a set of benchmarks, one of which was introduced in a later version of lm_eval. This PR adds support for evaluating against the configured eval_tasks during training even when no validation split is defined, and updates to the latest version of lm_eval.
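A minimal sketch of that intended control flow, with illustrative function and argument names (only `eval_tasks` and `eval_task_limit` come from this PR): benchmark evals run whenever `eval_tasks` is configured, independent of whether a validation split exists.

```python
from typing import Callable, Iterable, Optional

def evaluate_step(
    eval_tasks: Optional[list],
    eval_task_limit: Optional[int],
    valid_data_iterator: Optional[Iterable],
    compute_validation_loss: Callable[[Iterable], float],
    run_eval_harness: Callable[..., dict],
) -> dict:
    """Periodic evaluation; benchmark evals no longer require a valid split."""
    results = {}

    # Validation-loss evaluation only happens when a validation split was built.
    if valid_data_iterator is not None:
        results["lm_loss"] = compute_validation_loss(valid_data_iterator)

    # Benchmark (lm_eval) evaluation runs whenever eval_tasks is configured,
    # regardless of how (or whether) the dataset was split.
    if eval_tasks:
        results.update(run_eval_harness(tasks=eval_tasks, limit=eval_task_limit))

    return results
```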