Performance and Accuracy Benchmarks #53

Open
Tracked by #124
jamescho72 opened this issue Sep 29, 2024 · 1 comment
@jamescho72

Set up and run benchmarks against our continue.dev/ollama/granite environment.
Run baselines against our competitors: Deepseek2.5 (2.4B active and 21B active), codestral-mamba 7B, llama3-8B-instruct, and granite 8B instruct (128k context length).

Find a 100-line code example.
Ask chat to document it.
Measure latency (how long the response takes to complete).
Measure accuracy (how many lines of documentation were generated, and how many of them are correct, e.g. 9 of 10 lines correct).
Measure CPU consumption and memory consumption.
Automate and standardize the test as much as possible.
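
To make the steps above repeatable, here is a minimal sketch of a harness, not a finished tool. It assumes a local Ollama server on its default port and uses the non-streaming `/api/generate` endpoint. The prompt wording, the function names (`ollama_generate`, `time_documentation`, `accuracy`), and the choice of three runs per model are all illustrative assumptions. Correctness of each documentation line still has to be judged by a reviewer, and CPU/memory sampling of the server process is not covered here.

```python
import json
import statistics
import time
import urllib.request
from typing import Callable

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(model: str, prompt: str) -> str:
    """Send one non-streaming request to Ollama and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def time_documentation(
    model: str,
    code: str,
    generate: Callable[[str, str], str] = ollama_generate,
    runs: int = 3,
) -> dict:
    """Ask `model` to document `code` `runs` times and record wall-clock latency."""
    # Hypothetical prompt; use the same wording for every model under test.
    prompt = "Add documentation comments to the following code:\n\n" + code
    latencies = []
    output = ""
    for _ in range(runs):
        start = time.perf_counter()
        output = generate(model, prompt)
        latencies.append(time.perf_counter() - start)
    return {
        "model": model,
        "median_latency_s": statistics.median(latencies),
        # Non-empty lines in the last response; a reviewer still has to
        # separate documentation lines from echoed code.
        "output_lines": len([line for line in output.splitlines() if line.strip()]),
        "output": output,
    }


def accuracy(correct_lines: int, total_lines: int) -> float:
    """Fraction of generated documentation lines a reviewer judged correct."""
    return correct_lines / total_lines if total_lines else 0.0
```

To compare models, call `time_documentation` once per model tag shown by `ollama list`, always with the same 100-line sample, then record the reviewer's line counts with `accuracy(correct, total)`. The `generate` parameter is injectable so the harness can be checked without a running server.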

@harshmittalibm

I have put my initial findings here:

https://ibm.box.com/s/l69aksjokmnwdb6u2frpd715d6537pq8

So far it contains a latency comparison between the different models. I will update it with the latency of the documentation task and the accuracy of the generated documentation.
