IBL GitHub Bot

Uses GPT-4 to automatically generate pytest-based tests for a GitHub repository.

A CLI is exposed at ibl_github_bot/__main__.py.

$ python -m ibl_github_bot --help
Usage: python -m ibl_github_bot [OPTIONS]

Options:
  --repo TEXT             Repository to clone. Must be of the format
                          username/reponame. eg. ibleducation/ibl-ai-github-
                          bot
  --branch TEXT           Branch to clone repository from.
  -f, --file TEXT         Target file in repository to test. Defaults to all
                          files. You can pass multiple files with -f file1 -f
                          file2
  --cleanup               Delete cloned repository after test generation.
  --github-token TEXT     Github token used to authenticate and clone
                          repository. Token must have write access to the
                          repository.
  --github-username TEXT  Username associated with the github token
  --help                  Show this message and exit.

For example:

$ python -m ibl_github_bot --repo ibleducation/ibl-ai-bot-app --branch slack --cleanup -f ibl_ai_bot/views.py
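If you drive the bot from another script (for example a scheduled job), the same command line can be assembled in Python. The sketch below uses only options listed in the help output; the helper name `build_bot_command` is hypothetical and not part of ibl_github_bot. It leaves the token and username to the GH_TOKEN and GH_USERNAME environment variables rather than passing them as flags.

```python
import shlex
import sys


def build_bot_command(repo, branch=None, files=(), cleanup=False):
    """Hypothetical helper: build an argv list for `python -m ibl_github_bot`.

    Only flags shown in the CLI's --help output are used. Credentials are
    expected to come from GH_TOKEN / GH_USERNAME, not from the command line.
    """
    cmd = [sys.executable, "-m", "ibl_github_bot", "--repo", repo]
    if branch:
        cmd += ["--branch", branch]
    for path in files:
        # The CLI takes -f once per target file: -f file1 -f file2
        cmd += ["-f", path]
    if cleanup:
        cmd.append("--cleanup")
    return cmd


if __name__ == "__main__":
    cmd = build_bot_command(
        "ibleducation/ibl-ai-bot-app",
        branch="slack",
        files=["ibl_ai_bot/views.py"],
        cleanup=True,
    )
    # Print a shell-safe version; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building an argument list instead of a single string avoids shell quoting problems when file paths contain spaces.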

Important

You may export your GitHub token as an environment variable named GH_TOKEN, or place it in a .env file in the current working directory.

The bot creates a new branch on the specified repository containing the generated tests, and opens a pull request from that branch.

Warning

Do not blindly merge the pull requests the bot creates. Always check out the pull request and run the tests first.

Environment Variables

Place the following variables in a .env file in the current working directory, or export them as system environment variables:

  1. GH_TOKEN: This stores a valid GitHub token to be used by the bot to pull repositories, push commits and create pull requests.
  2. GH_USERNAME: The GitHub username associated with the GitHub token.
  3. OPENAI_API_KEY: A valid OpenAI API key with GPT-4 access.

Below is a sample .env file:

OPENAI_API_KEY=sk....
GH_USERNAME=username
GH_TOKEN=gh-............
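The lookup described above can be approximated with the standard library alone. This is a minimal sketch, not the bot's own loader: it assumes simple KEY=value lines, and it assumes that variables already exported in the environment take precedence over the .env file (this README does not say which source wins). The function names `read_dotenv` and `resolve_settings` are hypothetical.

```python
import os
from pathlib import Path

REQUIRED_VARS = ("GH_TOKEN", "GH_USERNAME", "OPENAI_API_KEY")


def read_dotenv(path=".env"):
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quote characters from the value, if any.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_settings(path=".env", environ=None):
    """Merge .env values with the environment (assumption: environment wins)."""
    environ = os.environ if environ is None else environ
    settings = read_dotenv(path)
    for name in REQUIRED_VARS:
        if name in environ:
            settings[name] = environ[name]
    # Treat absent or empty values as missing.
    missing = [name for name in REQUIRED_VARS if not settings.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return settings
```

Failing early with the names of the missing variables is cheaper than discovering a bad token after the repository has already been cloned.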

Configuration

The bot can load configuration from the target repository to alter its behaviour.

The configuration can specify the programming language, the frameworks used, the testing library, and module dependencies. It is read from an ibl_test_config.yaml file in the root directory of the repository. Below is a sample configuration file:

exclude:                    # project wide excludes
  - "tests"
test_library: pytest
frameworks: 
  - Django
  - Djangorestframework
modules:                     # configurations for specific modules/directories.
  directory1:
    depends_on:
      - directory2
      - directory3
    exclude:
      - "*.txt"
      - "tests.py"
  directory2:
    depends_on:
      - directory3
      - directory4
    exclude:
      - "templates"

The exclude entry lists files or directories to ignore. At the top level it applies to the whole project; under a module it applies only to that module.
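One way to read these exclude rules is sketched below. It is an illustration built on assumptions, not the bot's matching code: each entry is treated as a shell-style glob (Python's fnmatch) compared against every path component, top-level entries apply to the whole project, and module entries apply only to paths under that module's directory. The function name `is_excluded` is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_excluded(rel_path, config):
    """Return True if rel_path (relative to the repo root) should be skipped."""
    parts = PurePosixPath(rel_path).parts

    # Project-wide excludes: match any path component, e.g. "tests" or "*.txt".
    for pattern in config.get("exclude", []):
        if any(fnmatch(part, pattern) for part in parts):
            return True

    # Module excludes: only apply below the module's own directory.
    for module, settings in (config.get("modules") or {}).items():
        module_parts = PurePosixPath(module).parts
        if parts[: len(module_parts)] != module_parts:
            continue
        inner = parts[len(module_parts):]
        for pattern in (settings or {}).get("exclude", []):
            if any(fnmatch(part, pattern) for part in inner):
                return True
    return False


# The sample configuration, written as the dict a YAML parser would return.
SAMPLE_CONFIG = {
    "exclude": ["tests"],
    "test_library": "pytest",
    "frameworks": ["Django", "Djangorestframework"],
    "modules": {
        "directory1": {"depends_on": ["directory2", "directory3"],
                       "exclude": ["*.txt", "tests.py"]},
        "directory2": {"depends_on": ["directory3", "directory4"],
                       "exclude": ["templates"]},
    },
}

if __name__ == "__main__":
    for path in ["directory1/notes.txt", "directory1/views.py",
                 "directory2/templates/base.html", "directory2/notes.txt",
                 "tests/test_views.py"]:
        print(path, is_excluded(path, SAMPLE_CONFIG))
```

Under this reading, directory1/notes.txt is skipped while directory2/notes.txt is kept, because "*.txt" is only excluded inside directory1.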

Setting module dependencies (depends_on) appropriately can substantially reduce LLM costs and context size, leading to better performance. However, incorrect dependency relationships can be detrimental.
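The sketch below shows how depends_on could narrow the context sent to the LLM: only the target module and the modules it declares are collected. This README does not state whether dependencies are followed transitively (directory1 to directory2 to directory4), so both readings sit behind a flag; the function name `context_modules` and its `transitive` parameter are hypothetical.

```python
from collections import deque


def context_modules(config, module, transitive=False):
    """Return the module plus its declared dependencies, in discovery order."""
    modules = config.get("modules") or {}
    ordered = [module]
    seen = {module}
    queue = deque([module])
    while queue:
        current = queue.popleft()
        for dep in (modules.get(current) or {}).get("depends_on", []):
            if dep in seen:
                continue
            seen.add(dep)
            ordered.append(dep)
            # Only expand a dependency's own depends_on in transitive mode.
            if transitive:
                queue.append(dep)
    return ordered


CONFIG = {
    "modules": {
        "directory1": {"depends_on": ["directory2", "directory3"]},
        "directory2": {"depends_on": ["directory3", "directory4"]},
    }
}

if __name__ == "__main__":
    print(context_modules(CONFIG, "directory1"))
    print(context_modules(CONFIG, "directory1", transitive=True))
```

With the sample configuration, directory1 pulls in directory2 and directory3; the transitive reading also adds directory4. A module with no entry under modules, such as directory3, contributes only itself.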

When the repository does not provide a configuration file, the following default configuration is used instead:

exclude:
    - .git
    - __pycache__
    - tests
    - migrations
    - requirements
test_library: pytest
frameworks:
    - django
    - djangorestframework
language: python
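The fallback rule can be sketched as follows. It assumes the file sits at the repository root and that a repository file replaces the default wholesale rather than being merged with it; "used instead" suggests this, but the merge behaviour is otherwise unstated. The `parse` argument stands in for a YAML parser such as PyYAML's `yaml.safe_load`, which is an assumption and not a documented dependency of the bot; `resolve_config` is a hypothetical name.

```python
import copy
from pathlib import Path

CONFIG_FILENAME = "ibl_test_config.yaml"

# The documented default configuration, as a YAML parser would return it.
DEFAULT_CONFIG = {
    "exclude": [".git", "__pycache__", "tests", "migrations", "requirements"],
    "test_library": "pytest",
    "frameworks": ["django", "djangorestframework"],
    "language": "python",
}


def resolve_config(repo_root, parse):
    """Return the repository's configuration, or the default when none exists.

    `parse` turns the file's text into a dict (for example a YAML loader).
    """
    config_path = Path(repo_root) / CONFIG_FILENAME
    if not config_path.is_file():
        # Deep copy so callers cannot mutate the shared default.
        return copy.deepcopy(DEFAULT_CONFIG)
    return parse(config_path.read_text())
```

Passing the parser in keeps the sketch free of third-party imports; in a real setup you would call something like `resolve_config(repo_dir, yaml.safe_load)`.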

Tips for Best Results

  1. If the tests depend on lesser-known projects (e.g. private apps in separate repositories), it is best to manually write sample tests from which the LLM can learn how to build the required fixtures and how those external dependencies are used.
  2. This tool works best for monorepos, where the LLM can understand the entire project scope at once.
  3. Provide an ibl_test_config.yaml file in the root directory of the repository for optimal performance. Ensure that all entries are correct and that the dependency relationships you specify are exhaustive.
  4. For very small projects, dependency relationships can be ignored in favor of loading the entire project as context.
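For tip 1, a hand-written sample test might look like the following. Everything in it is hypothetical: `FakeGradebookClient` stands in for a client from a private package the LLM has never seen, and `passing_users` is defined inline only so the sketch is self-contained (in a real repository it would be imported from the project). The fixture shows once how to build the external dependency, so generated tests can reuse that pattern instead of guessing the private API.

```python
import pytest


class FakeGradebookClient:
    """Hypothetical stand-in for a client from a private, separately hosted app."""

    def __init__(self, grades):
        self._grades = dict(grades)

    def get_grade(self, user_id):
        return self._grades.get(user_id)


def passing_users(client, user_ids, threshold=50):
    """Hypothetical project code under test that depends on the private client."""
    return [uid for uid in user_ids if (client.get_grade(uid) or 0) >= threshold]


@pytest.fixture
def gradebook_client():
    # Build the external dependency explicitly so the LLM can copy this pattern.
    return FakeGradebookClient({"alice": 80, "bob": 30})


def test_passing_users_filters_by_threshold(gradebook_client):
    assert passing_users(gradebook_client, ["alice", "bob"]) == ["alice"]


def test_passing_users_treats_missing_grade_as_failing(gradebook_client):
    assert passing_users(gradebook_client, ["carol"]) == []
```

Keeping such a file in the repository's tests directory gives the LLM a concrete, working example of the fixture setup alongside the code it is asked to test.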

Limitations

  1. If the project depends on a lesser-known package that is essential to testing it, some generated tests may be incorrect.
  2. Merging the generated code without review is not recommended. Always check out the test branch created by the bot, run the tests, and fix them where necessary.