- Introduction
- Installation
- Getting Started
- Project Structure
- How does it work?
- Results
- References and Links
This repository supports a blog post that helps users estimate costs for large-scale classification, embedding, or vision embedding tasks. It provides benchmarking tools for different GPU types, batch sizes, and inference methods, using michaelfeil/infinity and Inference Endpoints.
I considered a variety of factors:
- GPU type
- Infinity image type
- Varying batch sizes
- Varying VU (virtual user) counts
- Multiple architectures
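A sweep over these factors amounts to a simple grid of experiment configurations. A minimal sketch (the values below are illustrative, not the actual grids used in the notebooks):

```python
from itertools import product

# Hypothetical sweep values for illustration; the real grids live in the
# *-optimization.ipynb notebooks.
gpus = ["nvidia-l4", "nvidia-t4"]
batch_sizes = [16, 32, 64, 128, 256]
vus = [32, 64, 128, 256, 448]

# Every combination of GPU, batch size, and VU count is one experiment.
grid = list(product(gpus, batch_sizes, vus))
print(len(grid))  # 2 * 5 * 5 = 50 experiment configurations
```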
```shell
git clone https://github.com/datavistics/encoder-analysis.git
cd encoder-analysis
pip install -r requirements.txt
pip install jupyterlab
```
- Install k6 for your platform
- Make sure you have the ability to deploy an Inference Endpoint
- Run `jupyter lab`
- Choose your task [`classification`, `embedding`, `vision-embedding`]
- Run `<task>-optimization.ipynb` to get the best configuration
- Run `<task>-analysis.ipynb` to visualize the results
- Alternatively, run `<task>-analysis-gradio.ipynb` for more interactive results
- There are notebooks at the top level for convenience. It would probably be cleaner to put them in `./notebooks`, but adding that directory to the path is annoying, so I opted for user satisfaction over aesthetics.
  - `*-optimization.ipynb` - Used to generate and conduct the experiments
  - `*-analysis.ipynb` - Show the analysis in a clean, notebook-centric way
  - `*-analysis-gradio.ipynb` - Show the analysis in an interactive, Gradio-centric way
- `src` - I abstracted a fair amount of code here. I tried to keep the important details in the notebooks.
- `templates` - The k6 Jinja templates I use to generate each experiment
- `data`, `generated`, and `results` - Store non-version-controlled project files
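As a rough sketch of what the templating does: each experiment configuration is substituted into a k6 script template. The repo uses Jinja2 templates; the stand-in below uses stdlib `string.Template` instead, and the field names are made up for illustration:

```python
from string import Template

# Simplified stand-in for the repo's Jinja2 k6 templates.
# Field names ($vus, $duration, $endpoint_url) are illustrative.
k6_template = Template("""
import http from 'k6/http';
export const options = { vus: $vus, duration: '$duration' };
export default function () {
  http.post('$endpoint_url', JSON.stringify({ inputs: 'hello' }),
            { headers: { 'Content-Type': 'application/json' } });
}
""")

script = k6_template.substitute(
    vus=448,
    duration="60s",
    endpoint_url="https://example.endpoints.huggingface.cloud",
)
print(script)
```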
Each of the `*-optimization.ipynb` notebooks follows this structure:
```mermaid
flowchart TD;
    subgraph Benchmarking Server
    A[k6 Load Testing]
    D[Instance Config]
    end
    subgraph Inference Endpoint
    C[Container Running Infinity]
    E[Next Inference Endpoint]
    end
    D -->|Defines Test Parameters| A
    D -->|Deploys Inference Endpoint| E
    A -->|Sends Test Data| C
    C -->|Processes and Returns| A
```
- Define the benchmarking parameters (GPU, batch size, VUs, etc.)
- Deploy the inference server (Infinity on Hugging Face Inference Endpoints)
- Run k6 performance tests to evaluate speed, cost, and efficiency
- Store and visualize the results for optimization
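The cost side of the steps above boils down to simple arithmetic: total cost is the hours needed to process the workload times the endpoint's hourly price. A minimal sketch, with illustrative numbers rather than the repo's measurements:

```python
def cost_per_n(n_inferences, throughput_per_sec, hourly_price_usd):
    """Cost of running n_inferences at a sustained throughput on an
    endpoint billed at hourly_price_usd per hour."""
    hours_needed = n_inferences / (throughput_per_sec * 3600)
    return hours_needed * hourly_price_usd

# Illustrative numbers: 800 requests/s on a $0.80/hour GPU.
print(f"${cost_per_n(1e9, 800, 0.80):,.2f} per billion")  # → $277.78 per billion
```

This is why the optimization notebooks sweep batch size and VUs: higher sustained throughput on the same instance directly lowers the cost per billion inferences.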
Do check out these notebooks in nbviewer, as I put a lot of effort into making them interactive. Unfortunately, they look better in light mode because of the tables. But follow your heart.
- `classification-analysis-gradio.ipynb`
- `embedding-analysis-gradio.ipynb`
- `vision-embedding-analysis-gradio.ipynb`
For `lxyuan/distilbert-base-multilingual-cased-sentiments-student` on a dataset like `tyqiangz/multilingual-sentiments` (using the `text` column), we can do 1 billion classifications for only $253.82.

| GPU | Image | Batch Size | VUs | Min Cost |
|---|---|---|---|---|
| nvidia-l4 | default | 64 | 448 | $253.82 |
For `Alibaba-NLP/gte-modernbert-base` on a dataset like `sentence-transformers/trivia-qa-triplet` (using the `positive` column), we can do 1 billion embeddings for only $409.44.

| GPU | Batch Size | VUs | Min Cost |
|---|---|---|---|
| nvidia-l4 | 256 | 32 | $409.44 |
For `vidore/colqwen2-v1.0-merged` on a dataset like `openbmb/RLAIF-V-Dataset` (using the `image` column), we can do 1 billion ColBERT-style embeddings (late interaction) on images for $44,496.51.

| GPU | Batch Size | VUs | Min Cost |
|---|---|---|---|
| nvidia-l4 | 4 | 4 | $44,496.51 |
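For intuition, the cost-per-billion figures above can be inverted into an implied sustained throughput. The hourly rate below ($0.80/hour for an L4 endpoint) is an assumption for illustration only; check current Inference Endpoints pricing before relying on it:

```python
def implied_throughput(cost_per_billion_usd, hourly_price_usd):
    """Requests/second needed to hit a given cost per 1B inferences."""
    hours = cost_per_billion_usd / hourly_price_usd
    return 1e9 / (hours * 3600)

# Assumed (not verified) rate of $0.80/hour for nvidia-l4.
for task, cost in [("classification", 253.82),
                   ("embedding", 409.44),
                   ("vision-embedding", 44496.51)]:
    print(f"{task}: ~{implied_throughput(cost, 0.80):,.1f} req/s")
```

This makes the gap between tasks concrete: vision embedding is two to three orders of magnitude more expensive per item, so it needs proportionally fewer requests per second to saturate the same instance.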