DocAgent is a system designed to generate high-quality, context-aware docstrings for Python codebases using a multi-agent approach and hierarchical processing.
If you use DocAgent in your research, please cite our paper:
```bibtex
@misc{yang2025docagent,
      title={DocAgent: A Multi-Agent System for Automated Code Documentation Generation},
      author={Dayu Yang and Antoine Simoulin and Xin Qian and Xiaoyi Liu and Yuwei Cao and Zhaopu Teng and Grey Yang},
      year={2025},
      eprint={2504.08725},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}
```
You can find the paper on arXiv: https://arxiv.org/abs/2504.08725
## Table of Contents

- Motivation
- Methodology
- Installation
- Components
- Configuration
- Usage
- Running the Evaluation System
- Optional: Using a Local LLM
## Motivation

High-quality docstrings are crucial for code readability, usability, and maintainability, especially in large repositories. They should explain a component's purpose, parameters, return values, exceptions, and usage within the broader context. Current LLMs often struggle with this, producing superficial or redundant comments and failing to capture essential context or rationale. DocAgent aims to address these limitations by generating informative, concise, and contextually aware docstrings.
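To make the target concrete, here is a hypothetical helper (`load_cache` is our own illustration, not code from DocAgent) carrying the kind of docstring described above — one covering purpose, parameters, returns, exceptions, and the surrounding rationale:

```python
def load_cache(path: str, max_age_seconds: int = 3600) -> dict:
    """Load a cached lookup table, rebuilding it if the file is stale.

    Beyond a plain file read, this helper encodes the project-wide rule
    that caches older than ``max_age_seconds`` must never be served.

    Args:
        path: Filesystem location of the JSON cache file.
        max_age_seconds: Maximum allowed age before the cache is rebuilt.

    Returns:
        The cached mapping, freshly rebuilt if the file was stale or missing.

    Raises:
        ValueError: If ``max_age_seconds`` is negative.
    """
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must be non-negative")
    return {}  # stub body; only the docstring matters for this example
```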
## Methodology

DocAgent employs two key strategies:
- Hierarchical Traversal: Processes code components by analyzing dependencies, starting with files having fewer dependencies. This builds a documented foundation before tackling more complex code, addressing the challenge of documenting context that itself lacks documentation.
- Agentic System: Utilizes a team of specialized agents (`Reader`, `Searcher`, `Writer`, `Verifier`) coordinated by an `Orchestrator`. This system gathers context (internal and external), drafts docstrings according to standards, and verifies their quality in an iterative process.
For more details on the agentic framework, see the Agent Component README.
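As a minimal sketch of the hierarchical traversal (the dependency map here is invented for illustration; DocAgent's real dependency analysis is more involved), Python's standard `graphlib` can produce the required documentation order:

```python
from graphlib import TopologicalSorter

# Hypothetical map: each file -> the set of files it depends on.
deps = {
    "utils.py": set(),
    "models.py": {"utils.py"},
    "api.py": {"models.py", "utils.py"},
}

# static_order() yields dependencies before dependents, so files with no
# dependencies are documented first; later files can then draw on the
# already-written docstrings of their dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['utils.py', 'models.py', 'api.py']
```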
## Installation

- Clone the repository:

  ```bash
  git clone <repository_url>
  cd DocAgent
  ```

- Install the necessary dependencies. It's recommended to use a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # or activate a conda environment instead
  pip install -e .
  ```

  Note: For optional features like development tools, web UI components, or specific hardware support (e.g., CUDA), refer to the comments in `setup.py` and install extras as needed (e.g., `pip install -e ".[dev,web]"`).
## Components

DocAgent is composed of several key parts:
- Core Agent Framework: Implements the multi-agent system (Reader, Searcher, Writer, Verifier, Orchestrator) responsible for the generation logic.
- Docstring Evaluator: Provides tools for evaluating docstring quality, primarily focusing on completeness based on static code analysis (AST). Note: evaluation is run separately; see its README.
- Generation Web UI: A web interface for configuring, running, and monitoring the docstring generation process in real-time.
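To illustrate the completeness idea behind the evaluator (a simplified sketch, not the evaluator's actual code), Python's `ast` module can measure static docstring coverage:

```python
import ast

def docstring_coverage(source: str) -> float:
    """Return the fraction of functions/classes in `source` with a docstring."""
    tree = ast.parse(source)
    targets = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    nodes = [n for n in ast.walk(tree) if isinstance(n, targets)]
    if not nodes:
        return 1.0  # nothing to document
    documented = sum(1 for n in nodes if ast.get_docstring(n) is not None)
    return documented / len(nodes)

sample = '''
def documented():
    """Has a docstring."""

def undocumented():
    pass
'''
print(docstring_coverage(sample))  # 0.5
```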
## Configuration

Before running DocAgent, you must create a configuration file named `config/agent_config.yaml`. This file specifies crucial parameters for the agents, such as the LLM endpoints, API keys (if required), model names, and generation settings.

- Copy the example: An example configuration file is provided at `config/example_config.yaml`. Copy it to `config/agent_config.yaml`:

  ```bash
  cp config/example_config.yaml config/agent_config.yaml
  ```

- Edit the configuration: Open `config/agent_config.yaml` in a text editor and modify the settings according to your environment and requirements. Pay close attention to the LLM provider, model selection, and any necessary API credentials.
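After editing, a quick sanity check can confirm your configuration names the settings you care about. The key names below (`provider`, `model_name`, `api_key`) are taken from the settings discussed elsewhere in this README and may not match your config's exact schema — treat this as a sketch, and check `config/example_config.yaml` for the real layout:

```python
# Hypothetical shape of config/agent_config.yaml, mirrored as a Python
# dict for illustration only.
config = {
    "provider": "openai",    # which LLM backend to use
    "model_name": "gpt-4o",  # placeholder model identifier
    "api_key": "",           # credential, if the provider requires one
}

required = {"provider", "model_name", "api_key"}
missing = required - config.keys()
assert not missing, f"missing settings: {missing}"
print("config names all required settings")
```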
## Usage

You can run the docstring generation process using either the command line or the web UI.
### 1. Command Line Interface (CLI)
This is the primary method for running the generation process directly.
```bash
# Example: Run on a test repo (remove existing docstrings first if desired)
./test/tool/remove_docstrings.sh data/raw_test_repo
python generate_docstrings.py --repo-path data/raw_test_repo
```
Use `python generate_docstrings.py --help` to see available options, such as specifying different configurations or test modes.
### 2. Generation Web UI
The web UI provides a graphical interface to configure, run, and monitor the process.
- Note: when entering the repository path, always provide the complete absolute path.
```bash
# Launch the web UI server
python run_web_ui.py --host 0.0.0.0 --port 5000
```
Then, access the UI in your web browser, typically at http://localhost:5000. If running the server remotely, you might need to set up SSH tunneling (see the instructions below or the Web UI README).
Basic SSH tunneling (if running the server remotely):

```bash
# In your local terminal
ssh -L 5000:localhost:5000 <your_remote_username>@<your_remote_host>
# Then access http://localhost:5000 in your local browser
```
## Running the Evaluation System

DocAgent includes a separate web-based interface for evaluating the quality of generated docstrings.
### 1. Running Locally
To run the evaluation system on your local machine:
```bash
python src/web_eval/app.py
```

Then, access the evaluation UI in your web browser at http://localhost:5001.
### 2. Running on a Remote Server
To run the evaluation system on a remote server:
```bash
python src/web_eval/app.py --host 0.0.0.0 --port 5001
```
Then, set up SSH tunneling to access the remote server from your local machine:
```bash
ssh -L 5001:localhost:5001 <your_remote_username>@<your_remote_host>
```

Once the tunnel is established, access the evaluation UI in your local web browser at http://localhost:5001.
## Optional: Using a Local LLM

If you prefer to use a local LLM (e.g., one hosted via Hugging Face), you can configure DocAgent to interact with it via an API endpoint.
1. Serve the local LLM: Use a tool like `vllm` to serve your model. A convenience script is provided:

   ```bash
   # Ensure vllm is installed: pip install vllm
   bash tool/serve_local_llm.sh
   ```

   This script will likely start an OpenAI-compatible API server (check the script details). Note the URL where the model is served (e.g., `http://localhost:8000/v1`).

2. Configure DocAgent: Update your `config/agent_config.yaml` to point to the local LLM API endpoint. You'll typically need to set:

   - The `provider` to `openai` (if using an OpenAI-compatible server like vllm's default).
   - The `api_base` or equivalent URL parameter to your local server address (e.g., `http://localhost:8000/v1`).
   - The `model_name` to the appropriate identifier for your local model.
   - The `api_key` to `None` or an empty string if no key is required by your local server.

3. Run DocAgent: Run the generation process as usual (CLI or Web UI). DocAgent will now send requests to your local LLM.
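To double-check the endpoint settings independently of DocAgent, you can hand-build the kind of request an OpenAI-compatible server (such as vllm's) expects. This sketch only constructs the request — the model name is a placeholder, and the commented-out lines actually send it once your server is up:

```python
import json
from urllib import request

API_BASE = "http://localhost:8000/v1"  # should match api_base in agent_config.yaml
MODEL = "my-local-model"               # placeholder; use your served model's identifier

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Write a one-line docstring for len()."}],
}

req = request.Request(
    f"{API_BASE}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
print(req.full_url)  # http://localhost:8000/v1/chat/completions

# Requires the local server to be running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```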
## License

This project is licensed under the MIT License - see the LICENSE file for details.