This repository hosts the codes and data for the paper "Fix the Tests: Augmenting LLMs to Repair Test Cases with Static Collector and Neural Reranker", accepted in in the 35th IEEE International Symposium on Software Reliability Engineering (ISSRE 2024).
The work introduces SynTeR for automated test repair using Large Language Models, where it augments LLM's capability by constructing precisely related contexts with static analysis and reranking.
- SynTeR: A novel contexts-aware test repair approach based on LLMs. Running script:
run_update_ctx.py
. - NaiveLLM (baseline): A naive test repair approach without contexts based on LLMs. Running script:
run_update_woctx.py
. - CEPROT (baseline): A SOTA test repair approach without contexts based on CodeT5, which we replicate elsewhere. In this repository, we only provide the results of CEPROT.
dataset
: The benchmark dataset used for evaluating SynTeR against baselines.logs
: Logs when running scripts.outputs
: The output results of approaches and evaluation.retriever
: The source codes of retriever for SynTeR, which is used to retrieve related contexts.utils
: The source codes of utilities for SynTeR, which provides abilities of parser, formatter configs and etc.README.md
: This file.- Jupyter Notebooks:
dataset_setup.ipynb
: The notebook for setting up projects in the datasets (Before running SynTeR, all the projects should be downloaded locally).human_eval.ipynb
: The notebook for assisting human evaluation.
- Running Scripts:
run_setup.py
: The script for setting up test_part dataset (download the repos).run_update_ctx.py
: The script for running SynTeR.run_update_woctx.py
: The script for running NaiveLLM.run_evaluate.py
: The script for evaluation (RQ1) on CodeBLEU, DiffBLEU, and Accuracy.
Be aware of the important notices at the end of this README before testing SynTeR.
-
Download and Load the docker image with prepared environments.
# assume that you have downloaded the image [synter-ubuntu.tar] form DOI: 10.5281/zenodo.13370573 docker load -i <local-path-to>/synter-ubuntu.tar docker run -it --name synter-demo synter-ubuntu /bin/bash
-
Notes: all the repositories will be saved at
/root/SynPTCEvo4J/repos
as already configured inutils/configs.py
.
- Move to the workspace.
cd /root/SynTeR && conda activate synter
- Add LLM configuarations. The default LLM used is GPT4 from OPENAI.
- utils/configs: your LLM API keys.
- [optional]utils/llm.py: if you want to use another OPENAI model or custom LLMs compatible with OPENAI API(such as DeepseekCoder), you should also modify the model here.
- Run SynTeR. By default, the script will repair the test cases in
dataset/test_part.json
, where we provide the motivation example fromAlluxio/alluxio
demonstrated in our paper.python run_update_ctx.py
- Run NaiveLLM (optional). By default, the script will also repair the test cases in
dataset/test_part.json
.python run_update_woctx.py
- Note: If you want to add test samples, you can refer to
dataset/test.json
. For every sample, SynTeR will check whether the repo exists atREPO_BASE=/root/SynPTCEvo4J/repos
, unloaded repos for the test datafile will be automatically cloned when you run SynTeR. You can also mannual clone them bydataset_setup.ipynb
andrun_setup.py
, which is more recommended since large repo may fail to clone.
Be aware of the important notices at the end of this README before testing SynTeR.
-
Environment requirements. It is recommended to run SynTeR in a Linux(Ubuntu) OS. Our approach can be applied on most modern machines (RAM capacity: >=32GB), where the most CPU-costing task is running the
bge-reranker-v2-m3
reranker model locally. -
Creat and move to the workspace.
mkdir SynTeR && cd SynTeR
-
Python environment setup. We recommend using miniconda to setup a virtual environment
conda create -n synter_venv python=3.11 conda activate synter_venv pip install -r requirements.txt
-
Install ClangFormat.
sudo apt install clang-format # check installation clang-format --version
-
Download the Reranker Model. We use bge-reranker-v2-m3 for reranking. The model should be downloaded to your local path, which should be consistent with the path in
utils/configs.py
.
By default, the scripts will repair the test cases in dataset/test_part.json
, where we provide the motivation example demonstrated in our paper.
-
Customize your configuration. Add your custom configs in
utils/configs.py
:- REPO_BASE: the path to store all the repos (should have enough space).
- RERANKER_MODEL_PATH: the bge-reranker-v2-m3 download path (huggingface repo).
- OPENAI_API_KEY: your own OPEANAI API keys.
- LANGCHAIN_API_KEY: add it if you want to use LangSmith for LLM tracing.
-
Run SynTeR: when SynTeR repairs a test case in the datafile, the corresponding repo will be automatically cloned into REPO_BASE configured in
utils/configs.py
.python run_update_ctx.py
The results will be saved in
outputs/SynTeR/test_part_all_ctx_wot.json
. The log is generated inlogs/run_update_ctx.log
(we also retain backups as reference). -
Run NaiveLLM (optional)
python run_update_woctx.py
The results will be saved in
outputs/NaiveLLM/test_part_woctx.json
. The log is inlogs/run_update_woctx.log
(we also retain backups for reference). -
Note: If you want to add test samples, you can refer to
dataset/test.json
. For every sample, SynTeR will check whether the repo exists atREPO_BASE=/root/SynPTCEvo4J/repos
, unloaded repos for the test datafile will be automatically cloned when you run SynTeR. You can also mannual clone them bydataset_setup.ipynb
andrun_setup.py
, which is more recommended since large repo may fail to clone.
Due to the limitation of multilsp, the language server sometimes may not be correctly closed as expected. Therefor, sometimes you may encounter Internal Errors when running SynTeR for multilsp. Just rerun the script until you get the expected results. For above issues, remember to check whether there are zombie processes using language servers after running the script of SynTeR.
# check
ps aux | grep 'language_servers' | grep -v grep
# kill
ps aux | grep 'language_servers' | grep -v grep | awk '{print $2}' | xargs kill