IntelliPerf: LLM-Powered Autonomous GPU Performance Engineer

Important

This project is intended for research purposes only and is provided by AMD Research and Advanced Development team. This is not a product. Use it at your own risk and discretion.

Overview

IntelliPerf is an automated performance engineering framework that addresses the complex challenge of GPU kernel optimization. Manual optimization requires deep domain expertise and is time-consuming, error-prone, and resource-intensive. IntelliPerf systematizes this workflow by orchestrating a comprehensive toolchain that automatically profiles applications using rocprofiler-compute, identifies high-level bottlenecks with Guided Tuning, pinpoints specific source code lines using Omniprobe, generates optimized code through Large Language Models (LLMs), and validates results using Accordo for correctness and performance. Built on a modular "formula-driven" architecture, it targets specific bottlenecks like bank conflicts, memory access patterns, and atomic contention through a sophisticated multi-stage optimization loop that includes profiling, analysis, code generation, and automated validation.

Key Features

AI-Powered Optimization: Generates optimized code using LLMs with iterative feedback for performance improvements
Precise Analysis: Pinpoints performance issues down to specific source code lines using compiler-based instrumentation
Automated Validation: Validates both correctness and performance improvements through runtime comparison
Comprehensive Coverage: Supports multiple bottleneck types (bank conflicts, memory access, atomic contention)
CI/CD Integration: Seamless workflow integration with automated pull request generation
Extensible Architecture: Formula-driven design for easy addition of new optimization targets

Installation

Quick Start with Containers

We provide both Apptainer and Docker images for easy setup:

Using Apptainer

./apptainer/build.sh
./apptainer/run.sh

Using Docker

./docker/build.sh
./docker/run.sh

Or use our prebuilt Docker image:

docker pull audacioussw/intelliperf:latest
export LLM_GATEWAY_KEY="your_api_key_here"
docker run -it --rm --device=/dev/kfd --device=/dev/dri --group-add video -e LLM_GATEWAY_KEY="$LLM_GATEWAY_KEY" audacioussw/intelliperf

For baremetal installation

Install Additional Dependencies:

# ROCm dependencies
apt-get install -y rocm-llvm-dev libzstd-dev

# KernelDB dependencies
apt-get install -y libdwarf-dev

# Omniperf dependencies
apt-get install -y locales
locale-gen en_US.UTF-8

Installation from Source

Note

Due to the complex dependency chain, IntelliPerf currently supports development mode installation only. Future versions will support standard pip installation.

Clone the Repository:

git clone [email protected]:AMDResearch/intelliperf.git
cd intelliperf

Install IntelliPerf:
```
pip install -e .
```
Install Dependencies:
```
python3 scripts/install_tool.py --all
```

Environment Variables

Set the following environment variable for AI-powered optimization:

export LLM_GATEWAY_KEY="your_api_key_here"

Required for bank conflicts, memory access patterns, and atomic contention optimization. The AI-powered optimization supports various language models and providers through the --provider and --model command line arguments. The key should be the backend key for the specified provider.

Supported GPUs

IntelliPerf currently supports:

MI300X

Note

IntelliPerf may work on other AMD GPUs with ROCm compatibility, but has only been tested on MI300X.

Usage

IntelliPerf can be used to analyze and optimize your GPU applications:

intelliperf [options] -- <profile_cmd>

Examples

# Optimize bank conflicts in a HIP application
intelliperf -b ~/rocBLAS/build.sh -f bankConflict -- ~/rocBLAS/build/bin/rocblas_gemm

# Diagnose a Triton application
intelliperf -- python3 gemm.py

Command Line Options

Option	Description
`-h, --help`	Show help message and exit
`-v, --verbose`	Increase verbosity level (e.g., -v, -vv, -vvv)
`-b, --build_command`	Command to build your application
`-i, --instrument_command`	Command to build your application with instrument
`-p, --project_directory`	Directory containing your codebase
`-f, --formula`	Optimization formula to use (bankConflict, memoryAccess, atomicContention, diagnoseOnly)
`--top_n`	Control top-n kernels in diagnoseOnly mode (default: 10)
`--num_attempts`	Control optimization attempts (default: 10)
`-o, --output_file`	Path to output file
`-t, --accordo_absolute_tolerance`	Validation tolerance
`-m, --model`	Specify the model to use for optimization (default: gpt-4o)
`-r, --provider`	Specify the provider to use for optimization (default: openai)
`-l, --in_place`	Modify source files in place during optimization (default: creates backups)

Note

IntelliPerf copies the entire project directory to a temporary location. Make sure your project doesn't include any temporary CMake files if you pass the project_directory flag.

Documentation

IntelliPerf Technical Paper - Detailed technical overview of the IntelliPerf framework
Running Examples
AMD Developer Cloud Setup Guide - Step-by-step instructions for setting up IntelliPerf on AMD Developer Cloud GPU droplets

Citation

If you use IntelliPerf or discuss our work in your research, please always cite our work:

@software{   Awad:2025:ILP,
  author        = {Muhammad Awad and Cole Ramos and Keith Lowery},
  title         = {IntelliPerf: {LLM}-Powered Autonomous {GPU} Performance Engineer},
  year          = 2025,
  month         = jul,
  doi           = {10.5281/zenodo.15845118},
  url           = {https://github.com/AMDResearch/intelliperf},
  code          = {https://github.com/AMDResearch/intelliperf}
}

You can also use the CITATION.cff file in the repository root for automatic citation generation.

Contributing

We welcome contributions! Please see our Contributing Guide for details on how to set up your development environment and contribute to the project.

Support

For support, please:

Open an issue
Contact the development team

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 103 Commits
.github		.github
apptainer		apptainer
docker		docker
docs		docs
examples		examples
external/guided-tuning		external/guided-tuning
images		images
scripts		scripts
src		src
tests		tests
.clang-format		.clang-format
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

IntelliPerf: LLM-Powered Autonomous GPU Performance Engineer

Overview

Key Features

Installation

Quick Start with Containers

Using Apptainer

Using Docker

For baremetal installation

Installation from Source

Environment Variables

Supported GPUs

Usage

Examples

Command Line Options

Documentation

Citation

Contributing

Support

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

License

AMDResearch/intelliperf

Folders and files

Latest commit

History

Repository files navigation

IntelliPerf: LLM-Powered Autonomous GPU Performance Engineer

Overview

Key Features

Installation

Quick Start with Containers

Using Apptainer

Using Docker

For baremetal installation

Installation from Source

Environment Variables

Supported GPUs

Usage

Examples

Command Line Options

Documentation

Citation

Contributing

Support

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages