- Introduction
- Installation
- Getting Started
- Project Structure
- How does it work?
- Results
- References and Links
This repository supports a blog post that helps users estimate costs for large-scale classification, embedding, or vision embedding tasks. It provides benchmarking tools for different GPU types, batch sizes, and inference methods, using michaelfeil/infinity and Inference Endpoints.
I considered a variety of factors:
- GPU type
- Infinity image type
- Varying batch sizes
- Varying VU (virtual user) counts
- Multiple architectures
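A sweep over these factors amounts to a simple grid of experiment configurations. A minimal sketch (the values below are illustrative, not the actual grids used in the notebooks):

```python
from itertools import product

# Hypothetical sweep values for illustration; the real grids live in the
# *-optimization.ipynb notebooks.
gpus = ["nvidia-l4", "nvidia-t4"]
batch_sizes = [16, 32, 64, 128, 256]
vus = [32, 64, 128, 256, 448]

# Every combination of GPU, batch size, and VU count is one experiment.
grid = list(product(gpus, batch_sizes, vus))
print(len(grid))  # 2 * 5 * 5 = 50 experiment configurations
```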
```shell
git clone https://github.com/datavistics/encoder-analysis.git
cd encoder-analysis
pip install -r requirements.txt
pip install jupyterlab
```
- Install k6 for your platform
- Make sure you have the ability to deploy an Inference Endpoint
- Run `jupyter lab`
- Choose your task [`classification`, `embedding`, `vision-embedding`]
- Run `<task>-optimization.ipynb` to get the best configuration
- Run `<task>-analysis.ipynb` to visualize the results
- Alternatively, run `<task>-analysis-gradio.ipynb` for more interactive results
- There are notebooks at the top level for convenience. It would probably be cleaner to put them in `./notebooks`, but adding that directory to the path is annoying, so I opted for user satisfaction over aesthetics.
  - `*-optimization.ipynb` - Used to generate and conduct the experiments
  - `*-analysis.ipynb` - Show the analysis in a clean, notebook-centric way
  - `*-analysis-gradio.ipynb` - Show the analysis in an interactive, Gradio-centric way
- `src` - I abstracted a fair amount of code here. I tried to keep the important details in the notebooks.
- `templates` - The k6 Jinja templates I use to generate each experiment
- `data`, `generated`, and `results` - Store non-version-controlled project files
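As a rough sketch of what the templating does: each experiment configuration is substituted into a k6 script template. The repo uses Jinja2 templates; the stand-in below uses stdlib `string.Template` instead, and the field names are made up for illustration:

```python
from string import Template

# Simplified stand-in for the repo's Jinja2 k6 templates.
# Field names ($vus, $duration, $endpoint_url) are illustrative.
k6_template = Template("""
import http from 'k6/http';
export const options = { vus: $vus, duration: '$duration' };
export default function () {
  http.post('$endpoint_url', JSON.stringify({ inputs: 'hello' }),
            { headers: { 'Content-Type': 'application/json' } });
}
""")

script = k6_template.substitute(
    vus=448,
    duration="60s",
    endpoint_url="https://example.endpoints.huggingface.cloud",
)
print(script)
```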
Each of the `*-optimization.ipynb` notebooks follows this structure:
```mermaid
flowchart TD;
    subgraph Benchmarking Server
    A[k6 Load Testing]
    D[Instance Config]
    end
    subgraph Inference Endpoint
    C[Container Running Infinity]
    E[Next Inference Endpoint]
    end
    D -->|Defines Test Parameters| A
    D -->|Deploys Inference Endpoint| E
    A -->|Sends Test Data| C
    C -->|Processes and Returns| A
```
- Define the benchmarking parameters (GPU, batch size, VUs, etc.)
- Deploy the inference server (Infinity on Hugging Face Inference Endpoints)
- Run k6 performance tests to evaluate speed, cost, and efficiency
- Store and visualize the results for optimization
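The cost side of the steps above boils down to simple arithmetic: total cost is the hours needed to process the workload times the endpoint's hourly price. A minimal sketch, with illustrative numbers rather than the repo's measurements:

```python
def cost_per_n(n_inferences, throughput_per_sec, hourly_price_usd):
    """Cost of running n_inferences at a sustained throughput on an
    endpoint billed at hourly_price_usd per hour."""
    hours_needed = n_inferences / (throughput_per_sec * 3600)
    return hours_needed * hourly_price_usd

# Illustrative numbers: 800 requests/s on a $0.80/hour GPU.
print(f"${cost_per_n(1e9, 800, 0.80):,.2f} per billion")  # → $277.78 per billion
```

This is why the optimization notebooks sweep batch size and VUs: higher sustained throughput on the same instance directly lowers the cost per billion inferences.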
Do check out these notebooks in nbviewer, as I put a lot of effort into making them interactive. Unfortunately, they look better in light mode because of the tables. But follow your heart.
- `classification-analysis-gradio.ipynb`
- `embedding-analysis-gradio.ipynb`
- `vision-embedding-analysis-gradio.ipynb`
For `lxyuan/distilbert-base-multilingual-cased-sentiments-student` on a dataset like `tyqiangz/multilingual-sentiments` (using the `text` column), we can do 1 billion classifications for only $253.82.

| GPU | Image | Batch Size | VUs | Min Cost |
|---|---|---|---|---|
| nvidia-l4 | default | 64 | 448 | $253.82 |
For `Alibaba-NLP/gte-modernbert-base` on a dataset like `sentence-transformers/trivia-qa-triplet` (using the `positive` column), we can do 1 billion embeddings for only $409.44.

| GPU | Batch Size | VUs | Min Cost |
|---|---|---|---|
| nvidia-l4 | 256 | 32 | $409.44 |
For `vidore/colqwen2-v1.0-merged` on a dataset like `openbmb/RLAIF-V-Dataset` (using the `image` column), we can do 1 billion ColBERT-style embeddings (late interaction) on images for $44,496.51.

| GPU | Batch Size | VUs | Min Cost |
|---|---|---|---|
| nvidia-l4 | 4 | 4 | $44,496.51 |
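For intuition, the cost-per-billion figures above can be inverted into an implied sustained throughput. The hourly rate below ($0.80/hour for an L4 endpoint) is an assumption for illustration only; check current Inference Endpoints pricing before relying on it:

```python
def implied_throughput(cost_per_billion_usd, hourly_price_usd):
    """Requests/second needed to hit a given cost per 1B inferences."""
    hours = cost_per_billion_usd / hourly_price_usd
    return 1e9 / (hours * 3600)

# Assumed (not verified) rate of $0.80/hour for nvidia-l4.
for task, cost in [("classification", 253.82),
                   ("embedding", 409.44),
                   ("vision-embedding", 44496.51)]:
    print(f"{task}: ~{implied_throughput(cost, 0.80):,.1f} req/s")
```

This makes the gap between tasks concrete: vision embedding is two to three orders of magnitude more expensive per item, so it needs proportionally fewer requests per second to saturate the same instance.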