This repository provides scripts and tools for evaluating the performance of decompilation processes using both traditional decompilers and large language models (LLMs). It is used in the paper "DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios".
LLVM 18, installed from the LLVM Debian/Ubuntu nightly packages.
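One common way to install it on Debian/Ubuntu is the official llvm.sh helper from https://apt.llvm.org (shown here as a convenience; adjust for your distribution):
# Install LLVM 18 from the LLVM nightly apt repository
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 18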
To begin, clone the oss-fuzz project:
git clone https://github.com/google/oss-fuzz.git
Then we modify the base-builder Dockerfile to include bear and clang-extract to support function extraction.
# Download prebuilt clang-extract
wget 'https://seafile.vul337.team:8443/f/1f11e8c4a8eb46dcb981/?dl=1' -O oss-fuzz/infra/base-images/base-builder/clang-extract.tar.gz
# Add bear and clang-extract to base-builder Dockerfile
cd oss-fuzz
git checkout 4bca88f3a369679336485181961db305161fe240
git apply ../oss-fuzz-patch/*.diff
Then we build the Docker images.
python infra/helper.py build_image base-builder --cache --pull
python infra/helper.py build_image base-runner --cache --no-pull
Then we compile the dummy library for linking with the fuzzer.
docker run -it --rm -w /work -v $(pwd):/work gcr.io/oss-fuzz-base/base-builder bash -c "clang dummy.c -o libfunction.so -O2 -fPIC -shared && clang ld.c -o ld.so -shared -fPIC -O2"
This repository includes a patched llvm-cov binary. It is identical to the llvm-cov bundled with the official LLVM apt packages, but contains a binary-level patch that prevents formatting counters in the output. The patch is shown below:
.text:00000000000A15D8 cmp r13d, 3
.text:00000000000A15DC nop ; Keypatch modified this from:
.text:00000000000A15DC ; jg short loc_A1608
.text:00000000000A15DC ; Keypatch padded NOP to next boundary: 1 bytes
.text:00000000000A15DD nop
.text:00000000000A15DE lea rax, [rbx+10h]
.text:00000000000A15E2 mov [rbx], rax
.text:00000000000A15E5 mov rcx, [rsp+88h+src]
The default configuration file is located at config.yaml, containing:
- oss_fuzz_path: The path to the oss-fuzz project.
- decompilers: A list of decompilers to be evaluated.
- opts: A list of optimization levels to be evaluated.
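A minimal config.yaml might look like the following (the values are illustrative, not defaults shipped with the repository):
oss_fuzz_path: ./oss-fuzz
decompilers:
  - hexrays
  - ghidra
opts:
  - O0
  - O1
  - O2
  - O3
  - Os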
Many scripts accept a --config parameter to specify the configuration file.
python extract_functions.py
Optionally, extract only selected projects, for example with 96 workers:
python3 extract_functions.py --worker-count 96 --project file,libprotobuf-mutator
Initially, execute the fuzzers to collect covered functions, including their names and corresponding files. Coverage information is recorded in {oss_fuzz_path}/build/stats/{project}/{fuzzer}_result.json.
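The exact schema of this JSON file is produced by the extraction pipeline and is not documented here; a hypothetical entry (field names assumed purely for illustration) might look like:
{"function_name": "png_read_info", "file_name": "pngread.c"}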
For each function covered by the fuzzer, use clang and clang-extract to extract functions together with their external dependencies from each project, storing them in {oss_fuzz_path}/functions/{project}.
To compile the extracted functions, ensure that LLVM and Clang are installed on your system. Specify the libclang library path in the LIBCLANG_PATH environment variable, for example export LIBCLANG_PATH=/usr/lib/llvm-18/lib/libclang-18.so.1, adjusting it to match your installation path.
Set the oss_fuzz_path in config.yaml and the desired output path, then execute the following commands:
export LIBCLANG_PATH="/usr/lib/llvm-18/lib/libclang-18.so.1"
export dataset_path=path/to/the/dataset
python compile_ossfuzz.py --output $dataset_path
This script organizes all functions into a dataset in the datasets format. It compiles these functions using clang, applying optimization levels from O0 to Os.
The resulting binaries are stored in $dataset_path/binary.
The dataset containing the metadata is located in $dataset_path/compiled_ds. The metadata includes the function name, the prologue for the function (macros, structure definitions), the address of the target function to be decompiled, and the path to the binary file.
The dataset that acts as the ground truth for evaluation is stored in $dataset_path/eval. It contains the function name, the prologue for the function (macros, structure definitions), and the original source code. The columns in this dataset are a subset of the columns in the compiled_ds dataset.
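If the dataset is stored in the Hugging Face datasets on-disk format (an assumption based on the datasets formatting mentioned above), the columns can be inspected with a quick one-liner:
# Print the column names of the compiled dataset (assumes the HF datasets library is installed)
python -c "from datasets import load_from_disk; print(load_from_disk('$dataset_path/compiled_ds').column_names)"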
This section outlines the scripts used for decompilation, utilizing both traditional decompilers and large language models (LLMs).
We utilize a decompiler-service, hosted on a server, to perform scalable decompilation.
cd decompiler-service
pip install -r requirements.txt
Then we need to provide the necessary binaries and licenses for the decompilers. Hex-Rays, Binary Ninja, Dewolf, and other commercial decompilers each require a valid license. Refer to decompiler-service/README.md for more information.
Build the decompiler images with the following command:
enabled_decompilers="--with-angr --with-ghidra --with-recstudio --with-reko --with-retdec --with-binja --with-dewolf --with-hexrays --with-mlm"
python manage.py $enabled_decompilers build
To start the decompiler service, run:
python manage.py $enabled_decompilers start
We use a dedicated client named declient to interact with the decompiler-service. Install the client with:
pip install -e ./decompiler-service/src/declient
To warm up the decompiler service (necessary each time the service is restarted), run:
python decompiler-service/scripts/test_decompile_async.py
This should return a successful response from the decompiler-service, and the result will be stored in ./my_task_queue.json.
To obtain decompiled code from traditional decompilers (make sure the decompiler-service is running and warmed up), execute:
# use hexrays to decompile
python decompile.py --base-dataset-path $dataset_path --output $dataset_path/decompiled_ds_hexrays --decompilers hexrays
# use ghidra to decompile
python decompile.py --base-dataset-path $dataset_path --output $dataset_path/decompiled_ds_ghidra --decompilers ghidra
# or use both hexrays and ghidra to decompile simultaneously
python decompile.py --base-dataset-path $dataset_path --output $dataset_path/decompiled_ds_ghidra_hexrays --decompilers ghidra,hexrays
- base-dataset-path: Path to the dataset from the previous compilation step; it should contain compiled_ds and binary.
- output: Path where the decompiled code dataset will be stored.
This script interfaces with a server hosting six traditional decompilers, such as Hex-Rays, to request decompiled code asynchronously.
To refine the decompiled code with an LLM, run:
python refine.py --dataset $dataset_path/decompiled_ds_hexrays --model gpt-4o-mini --output-file $dataset_path/gpt-4o-mini.jsonl --concurrency 30
Then merge the LLM-refined output and the traditional decompiler outputs into a single dataset:
python merge.py --base-dataset-path $dataset_path/ --decompiled-datasets $dataset_path/gpt-4o-mini.jsonl $dataset_path/decompiled_ds_ghidra/ $dataset_path/decompiled_ds_hexrays/ --output $dataset_path/decompiled_ds
This section describes the evaluation of decompiled code.
Before evaluation, integrate all decompiler outputs, including those from LLMs, into a single dataset saved at ./decompiled_ds_all. Then, execute:
python evaluate_rsr.py --decompiled-dataset $dataset_path/decompiled_ds --decompilers hexrays
Enable the debug parameter to print error messages for specific data. This script recompiles the specified decompiler outputs in Docker, applies fixes, and reports success rates across different optimization levels. Successfully compiled functions are stored as shared libraries in {oss_fuzz_path}/build/challenges for further evaluation.
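Assuming the debug option is exposed as a standard --debug flag (hypothetical; check the script's --help), enabling it would look like:
python evaluate_rsr.py --decompiled-dataset $dataset_path/decompiled_ds --decompilers hexrays --debug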
To assess coverage differences before and after replacing the original functions with decompiled code, run:
python evaluate_cer.py --dataset $dataset_path/decompiled_ds
This script generates coverage reports for each function by linking the reference (base) shared object and the decompiled function's shared object separately.
Finally, evaluate code quality. Before running, set the model endpoint URL (OPENAI_BASE_URL) and API key (OPENAI_API_KEY) as environment variables, for example:
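# Example values; any OpenAI-compatible endpoint should work
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="sk-..."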
python code_quality.py --run --model your_model --dataset ./decompiled_ds_all --output your_output_path
This script conducts an LLM arena evaluation across 12 dimensions, computing Elo scores to assess code quality. The output path will contain all scoring information in PKL files. Use the rate parameter instead of run to calculate Elo scores for the different aspects and overall performance, as sketched below.
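Assuming rate mirrors the --run flag shown above (a hypothetical invocation, not confirmed by this README), the scoring step would be:
python code_quality.py --rate --model your_model --dataset ./decompiled_ds_all --output your_output_path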
If this work is helpful for your research, please consider citing the following BibTeX entry.
@misc{gao2025decompilebenchcomprehensivebenchmarkevaluating,
title={DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios},
author={Zeyu Gao and Yuxin Cui and Hao Wang and Siliang Qin and Yuanda Wang and Bolun Zhang and Chao Zhang},
year={2025},
eprint={2505.11340},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2505.11340},
}