Commit

add mkdocs

rchan26 committed Jun 28, 2024
1 parent c4687ca commit 2babbcc
Showing 39 changed files with 480 additions and 343 deletions.
52 changes: 26 additions & 26 deletions .github/workflows/ci.yaml
@@ -58,29 +58,29 @@ jobs:
- name: Upload coverage report
uses: codecov/codecov-action@v4  # version pin assumed; it was obfuscated in the page

# dist:
# name: Distribution build
# runs-on: ubuntu-latest
# needs: [pre-commit]

# steps:
# - uses: actions/checkout@v4
# with:
# fetch-depth: 0

# - name: Build sdist and wheel
# run: pipx run build

# - uses: actions/upload-artifact@v4
# with:
# path: dist

# - name: Check products
# run: pipx run twine check dist/*

# - uses: pypa/gh-action-pypi-publish@release/v1  # ref assumed; it was obfuscated in the page
# if: github.event_name == 'release' && github.event.action == 'published'
# with:
# # Remember to generate this and set it in "GitHub Secrets"
# user: __token__
# password: ${{ secrets.PYPI_API_TOKEN }}
docs:
  needs: [pre-commit, pytest]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3

    - uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    # Set a weekly cache id: without this step env.cache_id is empty and
    # the cache key below never rotates (this follows the mkdocs-material docs)
    - name: Set cache id
      run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV

    - name: Apply mkdocs cache
      uses: actions/cache@v3
      with:
        key: mkdocs-material-${{ env.cache_id }}
        path: .cache
        restore-keys: |
          mkdocs-material-

    - name: Install doc dependencies via poetry
      run: |
        pip install poetry
        poetry install --with dev

    - name: Build docs with gh-deploy --force
      run: |
        poetry run mkdocs gh-deploy --force
45 changes: 22 additions & 23 deletions README.md
@@ -1,26 +1,22 @@
# prompto

[![Actions Status][actions-badge]][actions-link]
[![Codecov Status][codecov-badge]][codecov-link]
[![PyPI version][pypi-version]][pypi-link]
[![PyPI platforms][pypi-platforms]][pypi-link]

`prompto` derives from the Italian word "_pronto_" which means "_ready_" and could also mean "_I prompt_" in Italian (if "_promptare_" were a verb meaning "_to prompt_").

`prompto` is a Python library which facilitates the running of LLM experiments stored as jsonl files. It automates querying API endpoints and logs progress asynchronously. The library is designed to be extensible and can be used to query different models.

## Available APIs and Models

The library supports querying several APIs and models. The following APIs are currently supported:
- [OpenAI](docs/models.md#openai) (`"openai"`)
- [Azure OpenAI](docs/models.md#azure-openai) (`"azure-openai"`)
- [Gemini](docs/models.md#gemini) (`"gemini"`)
- [Vertex AI](docs/models.md#vertex-ai) (`"vertexai"`)
- [Huggingface text-generation-inference](docs/models.md#huggingface-text-generation-inference) (`"huggingface-tgi"`)
- [Ollama](docs/models.md#ollama) (`"ollama"`)
- [A simple Quart API](docs/models.md#quart-api) for running models from [`transformers`](https://github.com/huggingface/transformers) locally (`"quart"`)

Our aim for `prompto` is to support more APIs and models in the future and to make it easy to add new ones to the library. We welcome contributions: we have a [contribution guide](docs/contribution.md) and a [guide on how to add new APIs and models](docs/add_new_api.md) in the [docs](docs/).
* [OpenAI](./docs/openai.md) (`"openai"`)
* [Azure OpenAI](./docs/azure_openai.md) (`"azure-openai"`)
* [Gemini](./docs/gemini.md) (`"gemini"`)
* [Vertex AI](./docs/vertexai.md) (`"vertexai"`)
* [Huggingface text-generation-inference](./docs/huggingface_tgi.md) (`"huggingface-tgi"`)
* [Ollama](./docs/ollama.md) (`"ollama"`)
* [A simple Quart API](./docs/quart.md) for running models from [`transformers`](https://github.com/huggingface/transformers) locally (`"quart"`)

Our aim for `prompto` is to support more APIs and models in the future and to make it easy to add new ones to the library. We welcome contributions: we have a [contribution guide](./docs/contribution.md) and a [guide on how to add new APIs and models](./docs/add_new_api.md) in the [docs](./docs/README.md).

## Installation

@@ -44,9 +40,10 @@ You might also want to set up a development environment for the library.
## Getting Started

The library has functionality to process experiments and to run a pipeline which continually looks for new experiment jsonl files in the input folder. Everything starts with defining a **pipeline data folder** which contains:
- `input` folder: contains the jsonl files with the experiments
- `output` folder: where the results of the experiments will be stored. When an experiment is run, a folder is created within the output folder with the experiment name (as defined in the jsonl file but removing the `.jsonl` extension) and the results and logs for the experiment are stored there
- `media` folder: which contains the media files for the experiments. These files must be within folders of the same experiment name (as defined in the jsonl file but removing the `.jsonl` extension)

* `input` folder: contains the jsonl files with the experiments
* `output` folder: where the results of the experiments will be stored. When an experiment is run, a folder is created within the output folder with the experiment name (as defined in the jsonl file but removing the `.jsonl` extension) and the results and logs for the experiment are stored there
* `media` folder: which contains the media files for the experiments. These files must be within folders of the same experiment name (as defined in the jsonl file but removing the `.jsonl` extension)

When using the library, you simply pass in the folder you would like to use as the pipeline data folder and the library will take care of the rest.
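
For illustration, a minimal sketch of creating such a folder with `pathlib` (the top-level name `data` is just an assumption matching the example commands below):

```python
from pathlib import Path

# create the pipeline data folder and its three subfolders;
# "data" is an arbitrary choice of name
data_folder = Path("data")
for subfolder in ("input", "output", "media"):
    (data_folder / subfolder).mkdir(parents=True, exist_ok=True)
```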

@@ -82,10 +79,11 @@

```
prompto_run_experiment --file data/input/openai.jsonl --max-queries 30
```

This will:

1. Create subfolders in the `data` folder (in particular, it will create `media` (`data/media`) and `output` (`data/output`) folders)
2. Create a folder in the `output` folder with the name of the experiment (the file name without the `.jsonl` extension - in this case, `openai`)
3. Move the `openai.jsonl` file to the `output/openai` folder (and add a timestamp of when the input file was created to that file)
4. Start running the experiment, sending requests to the OpenAI API asynchronously at the rate specified in this command (30 queries a minute, so requests are sent every 2 seconds) - the default is 10 queries per minute
5. Results will be stored in a "completed" jsonl file in the output folder (which is also timestamped)
6. Logs will be printed out to the console and also stored in a log file (which is also timestamped)
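
For illustration, here is a hypothetical line of `data/input/openai.jsonl`, written out via Python. The field names are assumptions based on the `"api"`, `"model_name"` and `"parameters"` keys described elsewhere in these docs; the exact schema is documented in the experiment file documentation.

```python
import json

# a hypothetical experiment line -- field names are assumptions based on
# the "api", "model_name" and "parameters" keys described in these docs
line = {
    "prompt": "What is the capital of France?",
    "api": "openai",
    "model_name": "gpt-3.5-turbo",
    "parameters": {"temperature": 0.7},
}

# each line of the jsonl file is one JSON object like this
print(json.dumps(line))
```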

@@ -146,9 +144,10 @@ The completed experiment file will contain the responses from the Gemini API
## Using the Library in Python

The library has a few key classes:
- [`Settings`](src/prompto/settings.py): this defines the settings of the experiment pipeline which stores the paths to the relevant data folders and the parameters for the pipeline.
- [`Experiment`](src/prompto/experiment.py): this defines all the variables related to a _single_ experiment. An 'experiment' here is defined by a particular JSONL file which contains the data/prompts for each experiment. Each line in this file is a particular input to the LLM which we will obtain a response for. An experiment can be processed by calling the `Experiment.process()` method which will query the model and store the results in the output folder.
- [`ExperimentPipeline`](src/prompto/experiment_pipeline.py): this is the main class for running the full pipeline. The pipeline can be run using the `ExperimentPipeline.run()` method which will continually check the input folder for new experiments to process.
- [`AsyncBaseAPI`](src/prompto/base.py): this is the base class for querying all APIs. Each API/model should inherit from this class and implement the `async_query` method which will (asynchronously) query the model's API and return the response. When running an experiment, the `Experiment` class will call this method for each prompt to send requests asynchronously.

When a new model is added, you must add it to the [`API`](src/prompto/apis/__init__.py) dictionary which is in the `apis` module. This dictionary should map the model name to the class of the model.
* [`Settings`](./src/prompto/settings.py): this defines the settings of the experiment pipeline which stores the paths to the relevant data folders and the parameters for the pipeline.
* [`Experiment`](./src/prompto/experiment.py): this defines all the variables related to a _single_ experiment. An 'experiment' here is defined by a particular JSONL file which contains the data/prompts for each experiment. Each line in this file is a particular input to the LLM which we will obtain a response for. An experiment can be processed by calling the `Experiment.process()` method which will query the model and store the results in the output folder.
* [`ExperimentPipeline`](./src/prompto/experiment_pipeline.py): this is the main class for running the full pipeline. The pipeline can be run using the `ExperimentPipeline.run()` method which will continually check the input folder for new experiments to process.
* [`AsyncBaseAPI`](./src/prompto/apis/base.py): this is the base class for querying all APIs. Each API/model should inherit from this class and implement the `async_query` method which will (asynchronously) query the model's API and return the response. When running an experiment, the `Experiment` class will call this method for each prompt to send requests asynchronously.

When a new model is added, you must add it to the [`API`](./src/prompto/apis/__init__.py) dictionary which is in the `apis` module. This dictionary should map the model name to the class of the model.
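
To make the flow concrete, here is a minimal sketch of driving these classes from Python, assuming constructor arguments named `data_folder` and `max_queries` (hypothetical; check the class docstrings for the actual signatures):

```python
from prompto.settings import Settings
from prompto.experiment import Experiment

# hypothetical argument names -- check the Settings docstring for the
# real signature
settings = Settings(data_folder="data", max_queries=30)

# the experiment file is data/input/openai.jsonl
experiment = Experiment("openai.jsonl", settings=settings)

# query the model and write results/logs to data/output/openai/
experiment.process()
```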
7 changes: 4 additions & 3 deletions docs/README.md
@@ -2,13 +2,14 @@

## Getting Started

* [Quickstart](../README.md#getting-started)
* [Installation](../README.md#installation)
* [Examples](../examples)
* [Quickstart](./../README.md#getting-started)
* [Installation](./../README.md#installation)
* [Examples](./../examples/README.md)

## Using `prompto`

* [Setting up an experiment file](./experiment_file.md)
* [Configuring environment variables](./environment_variables.md)
* [prompto Pipeline and running experiments](./pipeline.md)
* [prompto commands](./commands.md)
* [Specifying rate limits](./rate_limits.md)
5 changes: 5 additions & 0 deletions docs/about.md
@@ -0,0 +1,5 @@
# About

`prompto` is a Python library written by the [Research Engineering Team (REG)](https://www.turing.ac.uk/work-turing/research/research-engineering-group) at the [Alan Turing Institute](https://www.turing.ac.uk/). It was originally written by [Ryan Chan](https://github.com/rchan26), [Federico Nanni](https://github.com/fedenanni) and [Evelina Gabasova](https://github.com/evelinag).

The library is designed to facilitate the running of language model experiments stored as jsonl files. It automates querying API endpoints and logs progress asynchronously. It is extensible and can be used to query different models.
14 changes: 14 additions & 0 deletions docs/add_new_api.md
@@ -1 +1,15 @@
# Instructions to add new API/model

The `prompto` library supports querying multiple LLM API endpoints asynchronously (see [available APIs](./../README.md#available-apis-and-models) and the [model docs](./models.md)). However, the list of available APIs is far from complete! As we don't have access to every API available, we need your help to implement them and we welcome contributions to the library! It might also be the case that an API has been implemented but needs to be updated or improved.

In this document, we aim to capture some key steps to add a new API/model to the library. We hope that this will develop into a helpful guide.

For a guide to contributing to the library in general, see our [contribution guide](./contribution.md). If you have any suggestions or corrections, please feel free to contribute!

## The `prompto` library structure

## Asynchronous querying

## The `AsyncBaseAPI` class
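
A minimal sketch of the pattern described in the README, with an assumed `async_query` signature (the real one may differ):

```python
from prompto.apis.base import AsyncBaseAPI


class MyNewAPI(AsyncBaseAPI):
    # Hypothetical sketch: the real signature of async_query may differ.
    # The README only requires that it (asynchronously) queries the
    # model's API and returns the response.
    async def async_query(self, prompt_dict: dict) -> dict:
        prompt = prompt_dict["prompt"]
        response = f"echo: {prompt}"  # replace with a real API call
        prompt_dict["response"] = response
        return prompt_dict
```

The new class then needs to be registered in the `API` dictionary in the `apis` module so the pipeline can route lines with the new `"api"` identifier to it.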

## Implementing 'checks'
22 changes: 22 additions & 0 deletions docs/azure_openai.md
@@ -0,0 +1,22 @@
## Azure OpenAI

**Environment variables**:

* `AZURE_OPENAI_API_KEY`: the API key for the Azure OpenAI API
* `AZURE_OPENAI_API_ENDPOINT`: the endpoint for the Azure OpenAI API
* `AZURE_OPENAI_API_VERSION`: the version of the Azure OpenAI API

**Model-specific environment variables**:

As described in the [model-specific environment variables](./environment_variables.md#model-specific-environment-variables) section, you can set model-specific environment variables for different models in Azure OpenAI by appending the model name to the environment variable name. For example, if `"model_name": "prompto_model"` is specified in the `prompt_dict`, the following model-specific environment variables can be used:

* `AZURE_OPENAI_API_KEY_prompto_model`
* `AZURE_OPENAI_API_ENDPOINT_prompto_model`
* `AZURE_OPENAI_API_VERSION_prompto_model`

**Required environment variables**:

For any given `prompt_dict`, the following environment variables are required:

* One of `AZURE_OPENAI_API_KEY` or `AZURE_OPENAI_API_KEY_model_name`
* One of `AZURE_OPENAI_API_ENDPOINT` or `AZURE_OPENAI_API_ENDPOINT_model_name`
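
For illustration, setting these from Python (the values below are placeholders, and the API version shown is just an example):

```python
import os

# placeholder values -- substitute your own key, endpoint and version
os.environ["AZURE_OPENAI_API_KEY"] = "<your-api-key>"
os.environ["AZURE_OPENAI_API_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_API_VERSION"] = "2023-05-15"  # example version

# model-specific overrides for "model_name": "prompto_model"
os.environ["AZURE_OPENAI_API_KEY_prompto_model"] = "<key-for-this-model>"
os.environ["AZURE_OPENAI_API_ENDPOINT_prompto_model"] = "https://<other-resource>.openai.azure.com/"
```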
23 changes: 12 additions & 11 deletions docs/commands.md
@@ -1,12 +1,12 @@
# Commands

- [Running an experiment file](#running-an-experiment-file)
- [Running the pipeline](#running-the-pipeline)
- [Run checks on an experiment file](#run-checks-on-an-experiment-file)
- [Create judge file](#create-judge-file)
- [Obtain missing results jsonl file](#obtain-missing-results-jsonl-file)
- [Convert images to correct form](#convert-images-to-correct-form)
- [Start up Quart server](#start-up-quart-server)
* [Running an experiment file](#running-an-experiment-file)
* [Running the pipeline](#running-the-pipeline)
* [Run checks on an experiment file](#run-checks-on-an-experiment-file)
* [Create judge file](#create-judge-file)
* [Obtain missing results jsonl file](#obtain-missing-results-jsonl-file)
* [Convert images to correct form](#convert-images-to-correct-form)
* [Start up Quart server](#start-up-quart-server)

## Running an experiment file

@@ -73,10 +73,11 @@

```
prompto_create_judge \
```

In `judge`, you must have two files:
- `template.txt`: this is the template file which contains the prompts and the responses to be scored. The responses should be replaced with the placeholders `{INPUT_PROMPT}` and `{OUTPUT_RESPONSE}`.
- `settings.json`: this is the settings JSON file which contains the settings for the judge(s). The keys are judge identifiers and the values contain the "api", "model_name" and "parameters" keys which specify the LLM to use as a judge (see the [experiment file documentation](experiment_file.md) for more details on these keys).

See for example [this judge example](../examples/data/data/judge) which contains example template and settings files.
* `template.txt`: this is the template file which contains the prompts and the responses to be scored. The responses should be replaced with the placeholders `{INPUT_PROMPT}` and `{OUTPUT_RESPONSE}`.
* `settings.json`: this is the settings JSON file which contains the settings for the judge(s). The keys are judge identifiers and the values contain the "api", "model_name" and "parameters" keys which specify the LLM to use as a judge (see the [experiment file documentation](experiment_file.md) for more details on these keys).

See for example [this judge example](./../examples/data/data/judge) which contains example template and settings files.

The judge specified with the `--judge` flag should be a key in the `settings.json` file in the judge location. You can create different judge files using different LLMs as judge by specifying a different judge identifier from the keys in the `settings.json` file.
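
As a concrete (hypothetical) illustration of the `settings.json` structure, written out as a Python dict: the judge identifier and model name below are made up, and only the `"api"`, `"model_name"` and `"parameters"` keys come from the description above.

```python
# a hypothetical judge/settings.json, shown as a Python dict
judge_settings = {
    "my-gemini-judge": {             # judge identifier, passed via --judge
        "api": "gemini",             # which API to use for judging
        "model_name": "gemini-pro",  # made-up example model name
        "parameters": {"temperature": 0.0},
    }
}
```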

@@ -103,7 +104,7 @@ prompto_convert_images --folder images

## Start up Quart server

As described in the [Quart API model documentation](models.md#quart-api), we have implemented a simple [Quart API](../src/prompto/apis/quart/quart_api.py) that can be used to query a text-generation model from the [Huggingface model hub](https://huggingface.co/models) using the Huggingface `transformers` library. To start up the Quart server, you can use the `prompto_start_quart_server` command along with the Huggingface model name. To see all arguments of this command, run `prompto_start_quart_server --help`.
As described in the [Quart API model documentation](./quart.md), we have implemented a simple [Quart API](./../src/prompto/apis/quart/quart_api.py) that can be used to query a text-generation model from the [Huggingface model hub](https://huggingface.co/models) using the Huggingface `transformers` library. To start up the Quart server, you can use the `prompto_start_quart_server` command along with the Huggingface model name. To see all arguments of this command, run `prompto_start_quart_server --help`.

To start up the Quart server with [`vicgalle/gpt2-open-instruct-v1`](https://huggingface.co/vicgalle/gpt2-open-instruct-v1) at `"http://localhost:8000"`, pass the model name to the `prompto_start_quart_server` command (run `prompto_start_quart_server --help` for the full set of arguments).