Important
This project is intended for research purposes only and is provided by the AMD Research and Advanced Development team. This is not a product. Use it at your own risk and discretion.
IntelliPerf is an automated performance engineering framework that addresses the complex challenge of GPU kernel optimization. Manual optimization requires deep domain expertise and is time-consuming, error-prone, and resource-intensive. IntelliPerf systematizes this workflow by orchestrating a comprehensive toolchain that automatically profiles applications using rocprofiler-compute, identifies high-level bottlenecks with Guided Tuning, pinpoints specific source code lines using Omniprobe, generates optimized code through Large Language Models (LLMs), and validates results using Accordo for correctness and performance. Built on a modular "formula-driven" architecture, it targets specific bottlenecks like bank conflicts, memory access patterns, and atomic contention through a sophisticated multi-stage optimization loop that includes profiling, analysis, code generation, and automated validation.
- AI-Powered Optimization: Generates optimized code using LLMs with iterative feedback for performance improvements
- Precise Analysis: Pinpoints performance issues down to specific source code lines using compiler-based instrumentation
- Automated Validation: Validates both correctness and performance improvements through runtime comparison
- Comprehensive Coverage: Supports multiple bottleneck types (bank conflicts, memory access, atomic contention)
- CI/CD Integration: Seamless workflow integration with automated pull request generation
- Extensible Architecture: Formula-driven design for easy addition of new optimization targets
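The generate-and-validate loop described above can be sketched roughly as follows. This is a minimal illustration, not IntelliPerf's implementation: `Candidate`, `within_tolerance`, `optimize`, and the `propose` callback are hypothetical names, the `atol` default is a placeholder, and returning the first passing candidate is an assumption. Only the default of 10 attempts comes from the options documented in this README.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    """Hypothetical record: kernel source plus its measured outputs and runtime."""
    source: str
    outputs: Sequence[float]
    runtime_ms: float


def within_tolerance(reference: Sequence[float], candidate: Sequence[float], atol: float) -> bool:
    """True when both sequences have equal length and match element-wise within atol."""
    return len(reference) == len(candidate) and all(
        abs(r - c) <= atol for r, c in zip(reference, candidate)
    )


def optimize(
    baseline: Candidate,
    propose: Callable[[str, str], Candidate],  # stands in for the LLM + rebuild + rerun step
    num_attempts: int = 10,
    atol: float = 1e-6,  # placeholder value, not a documented default
) -> Optional[Candidate]:
    """Try up to num_attempts rewrites; return the first that is correct and faster."""
    feedback = ""
    for attempt in range(1, num_attempts + 1):
        candidate = propose(baseline.source, feedback)
        if not within_tolerance(baseline.outputs, candidate.outputs, atol):
            # Correctness check failed: feed the reason into the next attempt.
            feedback = f"attempt {attempt}: outputs differ from the baseline"
            continue
        if candidate.runtime_ms >= baseline.runtime_ms:
            # Correct but no speedup: also reported back as feedback.
            feedback = f"attempt {attempt}: correct but not faster"
            continue
        return candidate
    return None
```

In this sketch, `propose` receives the baseline source and the feedback from the previous failed attempt, which mirrors the iterative feedback mentioned in the feature list. A candidate is accepted only when it passes both the correctness check and the performance check.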
We provide both Apptainer and Docker images for easy setup:
./apptainer/build.sh
./apptainer/run.sh
./docker/build.sh
./docker/run.sh
- Install Additional Dependencies:
# ROCm dependencies
apt-get install -y rocm-llvm-dev libzstd-dev
# KernelDB dependencies
apt-get install -y libdwarf-dev
# Omniperf dependencies
apt-get install -y locales
locale-gen en_US.UTF-8
Note
Due to the complex dependency chain, IntelliPerf currently supports development mode installation only. Future versions will support standard pip installation.
- Clone the Repository:
git clone git@github.com:AMDResearch/intelliperf.git
cd intelliperf
- Install IntelliPerf:
pip install -e .
- Install Dependencies:
python3 scripts/install_tool.py --all
Set the following environment variable for AI-powered optimization:
export LLM_GATEWAY_KEY="your_api_key_here"
This key is required for the bank conflict, memory access pattern, and atomic contention optimizations. AI-powered optimization supports various language models and providers, selected through the --provider and --model command line arguments. The key must be the backend key for the specified provider.
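A wrapper script may want to fail early when the key is missing, before any profiling starts. The following is a minimal sketch of such a check; the helper name `llm_gateway_key` is hypothetical and is not part of IntelliPerf.

```python
import os
from typing import Mapping, Optional


def llm_gateway_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the LLM_GATEWAY_KEY value, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("LLM_GATEWAY_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LLM_GATEWAY_KEY is not set; export it before running an optimization formula"
        )
    return key
```

Passing a mapping explicitly, for example `llm_gateway_key({"LLM_GATEWAY_KEY": "abc"})`, makes the check easy to exercise without touching the real environment.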
IntelliPerf currently supports:
- MI300X
Note
IntelliPerf may work on other AMD GPUs with ROCm compatibility, but has only been tested on MI300X.
IntelliPerf can be used to analyze and optimize your GPU applications:
intelliperf [options] -- <profile_cmd>
# Optimize bank conflicts in a HIP application
intelliperf -b ~/rocBLAS/build.sh -f bankConflict -- ~/rocBLAS/build/bin/rocblas_gemm
# Diagnose a Triton application
intelliperf -- python3 gemm.py
| Option | Description |
|---|---|
| -h, --help | Show help message and exit |
| -v, --verbose | Increase verbosity level (e.g., -v, -vv, -vvv) |
| -b, --build_command | Command to build your application |
| -i, --instrument_command | Command to build your application with instrumentation |
| -p, --project_directory | Directory containing your codebase |
| -f, --formula | Optimization formula to use (bankConflict, memoryAccess, atomicContention, diagnoseOnly) |
| --top_n | Control top-n kernels in diagnoseOnly mode (default: 10) |
| --num_attempts | Control optimization attempts (default: 10) |
| -o, --output_file | Path to output file |
| -t, --accordo_absolute_tolerance | Absolute tolerance for Accordo validation |
| -m, --model | Specify the model to use for optimization (default: gpt-4o) |
| -r, --provider | Specify the provider to use for optimization (default: openai) |
| -l, --in_place | Modify source files in place during optimization (default: creates backups) |
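When IntelliPerf is driven from a script, the options above can be assembled programmatically. The sketch below builds an argument list using only flags listed in the table; the function name `intelliperf_command` and the example paths are hypothetical, and the code does not launch anything itself.

```python
import shlex
from typing import List, Optional, Sequence


def intelliperf_command(
    profile_cmd: Sequence[str],
    formula: Optional[str] = None,
    build_command: Optional[str] = None,
    num_attempts: Optional[int] = None,
    output_file: Optional[str] = None,
    verbosity: int = 0,
) -> List[str]:
    """Build an `intelliperf [options] -- <profile_cmd>` argument list."""
    cmd = ["intelliperf"]
    if verbosity > 0:
        cmd.append("-" + "v" * verbosity)  # -v, -vv, -vvv
    if build_command:
        cmd += ["--build_command", build_command]
    if formula:
        cmd += ["--formula", formula]
    if num_attempts is not None:
        cmd += ["--num_attempts", str(num_attempts)]
    if output_file:
        cmd += ["--output_file", output_file]
    # Everything after "--" is the command that IntelliPerf profiles.
    return cmd + ["--", *profile_cmd]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = intelliperf_command(
        ["./build/bin/app"],
        formula="bankConflict",
        build_command="./build.sh",
        num_attempts=5,
        verbosity=2,
    )
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to launch it
```

Keeping the command as a list rather than a single string avoids shell quoting issues when the profiled command contains spaces or special characters.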
- IntelliPerf Technical Paper - Detailed technical overview of the IntelliPerf framework
- Running Examples
We welcome contributions! Please see our Contributing Guide for details on how to set up your development environment and contribute to the project.
For support, please:
- Open an issue
- Contact the development team
This project is licensed under the MIT License - see the LICENSE file for details.