Commit 46b86fd

ragas: bump ragas version, pass old rubric in RubricsScore
Before ragas v0.2.11, RubricsScore.rubrics wasn't being applied properly, so this commit sets v0.2.11 as the minimum ragas version for this library. v0.2.11 also changed the prompt used for domain-specific knowledge evaluation with reference; the prompt from previous versions is now explicitly passed in.

Signed-off-by: Ali Maredia <[email protected]>
1 parent: 03afb6c
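
The fix follows a simple pinning pattern: pass an explicit rubrics mapping to RubricsScore rather than relying on the library default, which changed in ragas v0.2.11. Here is a minimal sketch of that pattern, assuming ragas>=0.2.11; the two rubric entries below are illustrative only, while the full five-entry mapping this commit pins appears in the diff further down:

# Minimal sketch of the pinning pattern; assumes ragas>=0.2.11.
# The rubric text here is illustrative only.
from ragas.metrics._domain_specific_rubrics import RubricsScore

pinned_rubrics = {
    "score1_description": "The response is incorrect or irrelevant.",
    "score5_description": "The response is fully accurate and detailed.",
}

# With rubrics passed explicitly, a future change to the library's
# default rubric prompt cannot silently alter scoring behavior.
metric = RubricsScore(rubrics=pinned_rubrics)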

2 files changed: +11 -4 lines

requirements.txt
Lines changed: 1 addition & 1 deletion

@@ -10,4 +10,4 @@ pandas
 pandas-stubs
 lm-eval>=0.4.4
 httpx
-ragas
+ragas>=0.2.11

src/instructlab/eval/ragas.py
Lines changed: 10 additions & 3 deletions

@@ -12,8 +12,8 @@
 from ragas.evaluation import EvaluationDataset, EvaluationResult, RunConfig, evaluate
 from ragas.metrics import Metric
 from ragas.metrics._domain_specific_rubrics import ( # the rubrics we must instantiate are located inside of a file marked as private
-    DEFAULT_WITH_REFERENCE_RUBRICS,
     RubricsScore,
+    SingleTurnPrompt,
 )
 
 # Local

@@ -22,6 +22,14 @@
 
 logger = setup_logger(__name__)
 
+OLD_DEFAULT_WITH_REFERENCE_RUBRICS = {
+    "score1_description": "The response is incorrect, irrelevant, or does not align with the ground truth.",
+    "score2_description": "The response partially matches the ground truth but includes significant errors, omissions, or irrelevant information.",
+    "score3_description": "The response generally aligns with the ground truth but may lack detail, clarity, or have minor inaccuracies.",
+    "score4_description": "The response is mostly accurate and aligns well with the ground truth, with only minor issues or missing details.",
+    "score5_description": "The response is fully accurate, aligns completely with the ground truth, and is clear and detailed.",
+}
+
 
 class Sample(TypedDict):
     """

@@ -256,9 +264,8 @@ def _generate_answers_from_model(
 
     @staticmethod
     def _get_metrics() -> List[Metric]:
-        # default set of metrics
         return [
             RubricsScore(
-                rubrics=DEFAULT_WITH_REFERENCE_RUBRICS,
+                rubrics=OLD_DEFAULT_WITH_REFERENCE_RUBRICS,
             )
         ]
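
A hedged end-to-end sketch of how the pinned metric would be consumed, assuming ragas>=0.2.11, that EvaluationDataset.from_list accepts dicts keyed by user_input/response/reference, and that OLD_DEFAULT_WITH_REFERENCE_RUBRICS is importable from the module patched above; the sample text is made up:

# Score one made-up sample against the pinned rubric.
# evaluate() and EvaluationDataset come from ragas.evaluation, exactly as
# imported in the diff above. A judge LLM is needed: pass llm=..., or ragas
# falls back to its default judge (an assumption about your environment).
from ragas.evaluation import EvaluationDataset, evaluate
from ragas.metrics._domain_specific_rubrics import RubricsScore

from instructlab.eval.ragas import OLD_DEFAULT_WITH_REFERENCE_RUBRICS

metric = RubricsScore(rubrics=OLD_DEFAULT_WITH_REFERENCE_RUBRICS)

dataset = EvaluationDataset.from_list(
    [
        {
            "user_input": "What is the capital of France?",
            "response": "Paris.",
            "reference": "The capital of France is Paris.",
        }
    ]
)

result = evaluate(dataset=dataset, metrics=[metric])
print(result)  # per-sample rubric scores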
