[v2] Refactor task specific scores #2016

Open · Tracked by #1791
Samoed opened this issue Feb 7, 2025 · 0 comments
Samoed commented Feb 7, 2025

TASK_TO_HF_DATASET = {
    "Core17InstructionRetrieval": ("jhu-clsp/core17-instructions-mteb", False),
    "Robust04InstructionRetrieval": ("jhu-clsp/robust04-instructions-mteb", False),
    "News21InstructionRetrieval": ("jhu-clsp/news21-instructions-mteb", False),
    "mFollowIR": ("jhu-clsp/mfollowir-parquet-mteb", True),
    "mFollowIRCrossLingual": (
        "jhu-clsp/mfollowir-cross-lingual-parquet-mteb",
        True,
    ),
}
hf_path, is_multilingual = TASK_TO_HF_DATASET[task_name]
if is_multilingual:
    # figure out which of the languages this is: ["zho", "rus", "fas"]
    # gather the changed_qrels for each, and store the keys as a check
    for lang in ["zho", "rus", "fas"]:
        config_name = f"qrel_diff-{lang}"
        changed_qrels = {
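
The lookup above hardcodes, inside the retrieval evaluator, which HF dataset holds the changed qrels for each instruction-following task. One possible direction for the refactor is to let each task declare its own qrel-diff source. The sketch below is hypothetical: QrelDiffSource and the idea of hanging it off the task are assumptions, not existing mteb API.

from dataclasses import dataclass


@dataclass(frozen=True)
class QrelDiffSource:
    # Where the "changed qrels" for a task live and which configs to load.
    hf_path: str
    is_multilingual: bool = False
    languages: tuple[str, ...] = ()

    def config_names(self) -> list[str]:
        # Multilingual tasks ship one qrel_diff config per language.
        if self.is_multilingual:
            return [f"qrel_diff-{lang}" for lang in self.languages]
        return ["qrel_diff"]


# Declared on the task (or its metadata) rather than in a module-level dict:
CORE17_SOURCE = QrelDiffSource("jhu-clsp/core17-instructions-mteb")
MFOLLOWIR_SOURCE = QrelDiffSource(
    "jhu-clsp/mfollowir-parquet-mteb",
    is_multilingual=True,
    languages=("zho", "rus", "fas"),
)

print(MFOLLOWIR_SOURCE.config_names())
# ['qrel_diff-zho', 'qrel_diff-rus', 'qrel_diff-fas']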

task_scores = {}
if task_name in ["NevIR"]:
    paired_score = paired_accuracy(qrels, results, scores)
    task_scores["paired_accuracy"] = paired_score
if task_name in ["InstructIR"]:
    robustness_at_10_score = robustness_at_10(qrels, results, scores)
    task_scores["robustness_at_10"] = robustness_at_10_score
if task_name in [
    "mFollowIR",
    "mFollowIRCrossLingual",
    "Robust04InstructionRetrieval",
    "Core17InstructionRetrieval",
    "News21InstructionRetrieval",
]:
    p_mrr_and_consolidated_scores = evaluate_p_mrr_change(
        results, qrels, task_name, k_values
    )
    task_scores.update(p_mrr_and_consolidated_scores)
if task_name in ["MindSmallReranking"]:
    take_max_over_subqueries = max_over_subqueries(qrels, results, k_values)
    task_scores.update(take_max_over_subqueries)
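
The task-name checks above could likewise be replaced by a registry mapping task names to scoring callables, so the evaluator itself no longer needs to know individual tasks. The sketch below is hypothetical: register_scorer, TASK_SPECIFIC_SCORERS and compute_task_specific_scores are illustrative names, and it assumes the existing helpers shown above (paired_accuracy, evaluate_p_mrr_change, ...) are importable.

from typing import Callable

# Hypothetical registry; the names here are illustrative, not existing mteb API.
TASK_SPECIFIC_SCORERS: dict[str, Callable[..., dict[str, float]]] = {}


def register_scorer(*task_names: str):
    """Register a scoring callable for one or more task names."""
    def decorator(fn):
        for name in task_names:
            TASK_SPECIFIC_SCORERS[name] = fn
        return fn
    return decorator


@register_scorer("NevIR")
def _nevir_scores(qrels, results, scores, **kwargs):
    # paired_accuracy is the existing helper called in the snippet above
    return {"paired_accuracy": paired_accuracy(qrels, results, scores)}


@register_scorer(
    "mFollowIR",
    "mFollowIRCrossLingual",
    "Robust04InstructionRetrieval",
    "Core17InstructionRetrieval",
    "News21InstructionRetrieval",
)
def _p_mrr_scores(qrels, results, scores, *, task_name, k_values, **kwargs):
    # evaluate_p_mrr_change is the existing helper called in the snippet above
    return evaluate_p_mrr_change(results, qrels, task_name, k_values)


def compute_task_specific_scores(task_name, qrels, results, scores, **kwargs):
    scorer = TASK_SPECIFIC_SCORERS.get(task_name)
    if scorer is None:
        return {}
    return scorer(qrels, results, scores, task_name=task_name, **kwargs)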

Samoed added the `v2` label on Feb 7, 2025
Samoed added this to the v2.0.0 milestone on Feb 7, 2025