This project aims to show that DALLE-3's image output can be substantially improved through prompt optimization. The end goal is to demonstrate this using mathematical, AI-based, and human-feedback metrics.
It requires some setup to get VMAF working with FFmpeg, so that quality metrics can gauge how image output varies with the optimizations supplied in this repository.
Through unit testing, the goal is to show that DALLE-3 is not fully optimized and can be pushed further, backed by 11 different statistical metrics as well as AI-determined quality increases.
- Detailed metrics with standardization to help assess relative image quality
- Detailed instructions on how to run your own evals.
- Premade prompt enhancers that can be used with ChatGPT if you lack an API key or API gpt-4 access.
- Premade prompt enhancers that utilize gpt-3.5-turbo-16k if cost is a limiting factor.
- To run your own prompt enhancer, follow the guide below, then execute `[python-distro] src/easy_prompt_enhancer/prompt_enhancer.py "Make a pretty cat"`, substituting whatever base prompt you like.
- Python 3.9.x (invoke it through `pyenv` if you have multiple versions of Python installed; more details on setting that up appear later).
- Install pyenv using `[system-package-manager] install pyenv`.
- Run `pyenv local 3.9.x` or just `pyenv local 3.9`. If that version is not installed yet, `pyenv install` it first; installing a new Python version may take some time.
- Set up a `venv` (with Python 3.9.x active, run `python -m venv venv`).
- Check your Python version by running `python --version`.
- If it reads `3.9.x`, you're good to go.
- Install the requirements using `python -m pip install -r requirements.txt`.
- You should now be able to run `main.py`, as well as generate your own prompts as mentioned above.
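The version check from the setup steps can also be done from inside Python. The following is a minimal sketch, assuming the project targets the 3.9 series as stated above; the helper name `is_supported_python` is hypothetical and not part of this repository.

```python
import sys


def is_supported_python(version_info=None, required=(3, 9)):
    """Return True if the major.minor version matches `required`.

    `version_info` defaults to the running interpreter's version; passing a
    tuple such as (3, 9, 18) makes the check easy to exercise by hand.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if is_supported_python():
        print("Python version OK")
    else:
        # sys.version starts with the dotted version number, e.g. "3.7.17 ..."
        print(f"Expected Python 3.9.x, found {sys.version.split()[0]}")
```

Running this with the interpreter inside your venv confirms that the venv was created from the intended Python version.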
This program takes a base prompt, generates an image from it, and saves the output. The prompt is then optimized in various ways (mostly through the gpt-4-0314 model via the API), and the resulting "improved" images are saved as well.
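The flow described above can be sketched as follows. This is an illustration only, not the repository's actual implementation: `run_comparison`, the two callables it takes, and the output file names are all hypothetical. The image generator and the prompt enhancer are passed in as functions so the sketch runs without any API access.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Union


def run_comparison(
    base_prompt: str,
    generate_image: Callable[[str], bytes],   # hypothetical: prompt -> image bytes
    enhance_prompt: Callable[[str], str],     # hypothetical: prompt -> improved prompt
    out_dir: Union[str, Path],
) -> Dict[str, object]:
    """Save one image for the base prompt and one for the enhanced prompt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Step 1: generate and save the image for the unmodified prompt.
    base_path = out / "base.png"
    base_path.write_bytes(generate_image(base_prompt))

    # Step 2: optimize the prompt, then generate and save the new image.
    improved_prompt = enhance_prompt(base_prompt)
    improved_path = out / "improved.png"
    improved_path.write_bytes(generate_image(improved_prompt))

    return {
        "base": base_path,
        "improved": improved_path,
        "improved_prompt": improved_prompt,
    }


if __name__ == "__main__":
    # Stand-in callables: real ones would call an image model and a chat model.
    result = run_comparison(
        "Make a pretty cat",
        generate_image=lambda prompt: prompt.encode("utf-8"),
        enhance_prompt=lambda prompt: prompt + ", highly detailed",
        out_dir=tempfile.mkdtemp(),
    )
    print(result["improved_prompt"])
```

Keeping both saved files side by side is what makes the later quality comparison possible, since the metrics need a base image and an "improved" image for the same starting prompt.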
This project requires only a few dependencies, but setting up FFmpeg with VMAF support can be tricky. It requires a few steps.
First and foremost, create a `venv` using your preferred Python version from the root directory, source it, and then install dependencies using `[python-distro] -m pip install -r requirements.txt`.
- Deactivate your current venv by running `deactivate` if it is still active (as indicated by `(venv)` in your terminal prompt). If the command is not recognized, your venv is already deactivated or venv was not set up properly.
- Run `git clone https://github.com/Netflix/vmaf.git` in a directory of your choosing, preferably within your project directory, perhaps within a venv. Wherever you put it, remember the path.
- Navigate to your vmaf directory and `cd` into it.
- `cd` into `libvmaf` and then build `libvmaf`, as explained in the steps below.
- Install pyenv via `[system-package-manager] install pyenv`.
- Run `pyenv install 3.7.17` (the minimum needed to get libvmaf working).
- Run `pyenv local 3.7.17`.
- Since you're in the libvmaf directory, create a venv there (run `python --version` first to check that you're using 3.7.17).
- Once you have confirmed the version, run `python -m venv venv`.
- Run `source venv/bin/activate`.
- Run `python -m pip install meson`.
- Run `[package-manager] install nasm ninja doxygen`.
- Run `meson build --buildtype release`.
- Run `ninja -vC build`.
- Run `ninja -vC build test` to check that all 13 tests pass (as of 2023-10-26).
- Once the tests pass (build issues may arise along the way), proceed to the next step.
- You will have to build FFmpeg yourself. Run the following command in your desired directory: `git clone https://github.com/FFmpeg/FFmpeg.git`
- Navigate to the directory where you cloned FFmpeg and run the following commands sequentially: `./configure --enable-libvmaf`, then `make -j4`, then `make install`.
- If everything goes well, after a build time of approximately 5-10 minutes on a decent computer, you should have VMAF support enabled.
- To test this, run the following command, replacing `[path-to-ffmpeg-binary]` with the actual path to your FFmpeg binary: `[path-to-ffmpeg-binary] -filters | grep vmaf`. You should see something like `libvmaf` in the output, likely near the bottom.
- For macOS users, especially if using Homebrew, the typical path might be: `/usr/local/bin/FFmpeg/ffmpeg -filters | grep vmaf`
- If you encounter permissions errors, navigate to the location of your FFmpeg binary (regardless of its location) and run: `chmod -R +x $(pwd)`
- Once VMAF support is verified, use the following parameters in your program: `--ffmpeg-location="/usr/local/bin/FFmpeg/ffmpeg"`
- Remember to adjust the paths based on your system!
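The `-filters | grep vmaf` check above can also be scripted. The sketch below is one possible way to do it from Python and is not taken from this repository: the function names are hypothetical. It also shows how a VMAF comparison command could be assembled, assuming the usual libvmaf filter convention in which the first input is the image under test and the second is the reference.

```python
import subprocess
from typing import List


def output_lists_libvmaf(filters_output: str) -> bool:
    """Return True if any line of `ffmpeg -filters` output names libvmaf."""
    # Each filter line is whitespace-separated: flags, name, I/O types, description.
    return any("libvmaf" in line.split() for line in filters_output.splitlines())


def ffmpeg_has_libvmaf(ffmpeg_path: str) -> bool:
    """Run `<ffmpeg_path> -filters` and look for the libvmaf filter."""
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-filters"],
        capture_output=True,
        text=True,
        check=True,
    )
    return output_lists_libvmaf(result.stdout)


def build_vmaf_command(ffmpeg_path: str, distorted: str, reference: str) -> List[str]:
    """Build an FFmpeg command that scores `distorted` against `reference`."""
    return [
        ffmpeg_path,
        "-i", distorted,     # first input: the image being evaluated
        "-i", reference,     # second input: the reference image
        "-lavfi", "libvmaf",
        "-f", "null", "-",   # discard the output; the score goes to the log
    ]
```

For example, `ffmpeg_has_libvmaf("/usr/local/bin/FFmpeg/ffmpeg")` should return `True` once the build steps above have succeeded, provided that path matches your system. The list returned by `build_vmaf_command` can be passed directly to `subprocess.run`.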
Now, please enjoy running the evaluations and observing any quality increases, and feel free to share compelling datasets with the community. If you want to contribute as a collaborator, contact me directly at [email protected] with "DALLE 3 EVALUATION REPO" as the subject line for a faster response.
TBA:
- Better chat framework
- More customizability
- Incorporation of EasyGPT-3.5?
- Easier system for automatically generating images, grabbing them, and arranging them so they may be tested.
- Streamlined unit testing support.
- Add more metrics.
- Fine-tune the standardized formula.
- Incorporate human-sampled feedback.
- Use iterative prompting techniques.
- Explore cheaper models and compare results.
- Modularize better!