This repository contains a benchmarking toolkit for evaluating Large Language Models (LLMs) on competitive programming tasks. The toolkit provides a standardized way to test your LLM's code generation capabilities across a diverse set of problems.
LiveCodeBench Pro evaluates LLMs on their ability to generate solutions for programming problems. The benchmark includes problems of varying difficulty levels from different competitive programming platforms.
- Python 3.12 or higher
- pip package manager
Install the required dependencies:

```bash
pip install -r requirements.txt
```
Create your own LLM class by extending the abstract `LLMInterface` class in `api_interface.py`. Your implementation needs to override the `call_llm` method.
Example:

```python
from api_interface import LLMInterface


class YourLLM(LLMInterface):
    def __init__(self):
        super().__init__()
        # Initialize your LLM client or resources here

    def call_llm(self, user_prompt: str):
        # Implement your logic to call your LLM with user_prompt.
        # Return a tuple containing (response_text, metadata).
        # Example:
        response = your_llm_client.generate(user_prompt)
        return response.text, response.metadata
```
You can use the `ExampleLLM` class as a reference, which shows how to integrate with OpenAI's API.
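For orientation, a minimal OpenAI-backed implementation might look roughly like the sketch below. This is an illustration rather than the actual `ExampleLLM` code: the class name, the model name, and the shape of the returned metadata are assumptions.

```python
from openai import OpenAI

from api_interface import LLMInterface


class MyOpenAILLM(LLMInterface):
    def __init__(self):
        super().__init__()
        # Reads the API key from the OPENAI_API_KEY environment variable.
        self.client = OpenAI()

    def call_llm(self, user_prompt: str):
        # Ask the model to solve the competitive programming problem.
        response = self.client.chat.completions.create(
            model="gpt-4o",  # assumed model name; adjust to whatever you are testing
            messages=[{"role": "user", "content": user_prompt}],
        )
        text = response.choices[0].message.content
        # The benchmark does not prescribe a metadata format; token usage is one reasonable choice.
        metadata = {"model": response.model, "usage": response.usage.model_dump()}
        return text, metadata
```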
Edit the `benchmark.py` file to use your LLM implementation:
```python
from your_module import YourLLM

# Replace the existing llm_instance assignment with your LLM class:
llm_instance = YourLLM()
```
Execute the benchmark script:

```bash
python benchmark.py
```
The script will:
- Load the LiveCodeBench-Pro dataset from Hugging Face
- Process each problem with your LLM
- Save the results to `benchmark_result.json` (a rough sketch of this flow is shown below)
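Conceptually, the script's flow is similar to the following sketch. It is illustrative only: `benchmark.py` already implements this for you, and the split name, dataset field name, and output record layout below are assumptions.

```python
import json

from datasets import load_dataset

from your_module import YourLLM

# Split and field names below are assumptions; check the dataset schema for the real ones.
dataset = load_dataset("anonymous1926/anonymous_dataset", split="test")
llm_instance = YourLLM()

results = []
for problem in dataset:
    prompt = problem["problem_statement"]  # hypothetical field name
    response_text, metadata = llm_instance.call_llm(prompt)
    results.append({"problem": prompt, "response": response_text, "metadata": metadata})

with open("benchmark_result.json", "w") as f:
    json.dump(results, f, indent=2)
```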
Send your `benchmark_result.json` file to [email protected] for evaluation.
Please include the following information in your submission:
- LLM name and version
- Any relevant configuration details (for example, prompting setup or decoding parameters)
- Contact information for receiving your results
`api_interface.py` defines the abstract interface for LLM integration (a rough sketch of the interface follows the list):

- `LLMInterface`: Abstract base class with methods for LLM interaction
- `ExampleLLM`: Example implementation using OpenAI's GPT-4o
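Based on the description above, the interface has roughly the following shape. This is a sketch, not the verbatim source; refer to `api_interface.py` for the exact definition.

```python
from abc import ABC, abstractmethod


class LLMInterface(ABC):
    """Abstract base class the benchmark uses to talk to an LLM."""

    @abstractmethod
    def call_llm(self, user_prompt: str):
        """Return a (response_text, metadata) tuple for the given prompt."""
        raise NotImplementedError
```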
`benchmark.py` is the main benchmarking script that:

- Loads the dataset
- Processes each problem through your LLM
- Collects and saves results
The benchmark uses the `anonymous1926/anonymous_dataset` dataset from Hugging Face, which contains competitive programming problems with varying difficulty levels.
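If you want to inspect the data before running the full benchmark, you can load it directly with the `datasets` library; the exact splits and column names are not documented here, so check the printed schema.

```python
from datasets import load_dataset

# Downloads the dataset from the Hugging Face Hub on first use.
dataset = load_dataset("anonymous1926/anonymous_dataset")

# Shows the available splits, column names, and row counts.
print(dataset)
```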
For questions or support, please contact us at [email protected].