diff --git a/9-image-apps/README.md b/9-image-apps/README.md new file mode 100644 index 000000000..fc9c5597e --- /dev/null +++ b/9-image-apps/README.md @@ -0,0 +1,473 @@ +# Building an Image Generation Application + +There's more to LLMs than text generation. It's also possible to generate images from text descriptions. Having images as a modality can be highly useful in a number of areas, including MedTech, architecture, tourism, and game development. In this chapter we will look into two of the most popular image generation models, DALL-E and Midjourney. + +## Introduction + +In this lesson, we will cover: + +- Image generation and why it's useful. +- DALL-E and Midjourney: what they are and how they work. +- How you would build an image generation app. + +## Learning Goals + +After completing this lesson, you will be able to: + +- Build an image generation application. +- Define boundaries for your application with meta prompts. +- Work with DALL-E and Midjourney. + +## Why build an image generation application? + +Image generation applications are a great way to explore the capabilities of Generative AI. For example, they can be used for: + +- **Image editing and synthesis**. You can generate images for a variety of use cases, such as image editing and image synthesis. + +- **Applied to a variety of industries**. They can also be used to generate images for a variety of industries like MedTech, tourism, game development and more. + +## Scenario: Edu4All + +In this lesson, we will continue to work with our startup, Edu4All. The students will create images for their assessments. Exactly which images is up to them: they could be illustrations for their own fairytale, a new character for their story, or pictures that help them visualize their ideas and concepts. 
+ +Here's what Edu4All's students could generate, for example, if they're working in class on monuments: + +![Edu4All startup, class on monuments, Eiffel Tower](startup.png) + +using a prompt like + +> "Dog next to Eiffel Tower in early morning sunlight" + +## What are DALL-E and Midjourney? + +[DALL-E](https://openai.com/dall-e-2) and [Midjourney](https://www.midjourney.com/) are two of the most popular image generation models. Both let you generate images from prompts. + +### DALL-E + +Let's start with DALL-E, which is a Generative AI model that generates images from text descriptions. + +> [DALL-E is a combination of two models, CLIP and diffused attention](https://towardsdatascience.com/openais-dall-e-and-clip-101-a-brief-introduction-3a4367280d4e). + +- **CLIP** is a model that generates embeddings, which are numerical representations of data, from images and text. + +- **Diffused attention** is a model that generates images from embeddings. DALL-E is trained on a dataset of images and text and can be used to generate images from text descriptions. For example, DALL-E can be used to generate images of a cat in a hat, or a dog with a mohawk. + +### Midjourney + +Midjourney works in a similar way to DALL-E: it generates images from text prompts. It can also be used with prompts like “a cat in a hat” or “a dog with a mohawk”. + + + +![Image generated by Midjourney, mechanical pigeon](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Rupert_Breheny_mechanical_dove_eca144e7-476d-4976-821d-a49c408e4f36.png/440px-Rupert_Breheny_mechanical_dove_eca144e7-476d-4976-821d-a49c408e4f36.png) +*Image credit: Wikipedia, image generated by Midjourney* + +## How do DALL-E and Midjourney work? + +First, [DALL-E](https://arxiv.org/pdf/2102.12092.pdf). DALL-E is a Generative AI model built on an *autoregressive transformer*. 
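
To make the term concrete: "autoregressive" means the output is built one element at a time, and each new element is chosen based on everything generated so far. The toy sketch below illustrates only that loop. `generate_sequence` and `toy_next_element` are hypothetical names invented for this example, and the arithmetic rule is a stand-in for a trained neural network; it is not how DALL-E itself is implemented.

```python
def generate_sequence(next_element, prompt_elements, total_length):
    """Autoregressive loop: every new element depends on all previous ones."""
    sequence = list(prompt_elements)  # start from the (encoded) text prompt
    while len(sequence) < total_length:
        sequence.append(next_element(sequence))
    return sequence


def toy_next_element(previous):
    """Stand-in for a trained model: a made-up rule over the history so far."""
    return (sum(previous) + len(previous)) % 8


# Two made-up "prompt" values, followed by four generated ones.
print(generate_sequence(toy_next_element, [3, 1], total_length=6))
```

In a real model, the rule in `toy_next_element` would be replaced by a network that outputs a probability distribution over possible next elements, and one of them would be sampled.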
+ +An *autoregressive transformer* generates an image from a text description one element at a time. In the DALL-E paper these elements are image tokens rather than raw pixels, and each new token is predicted from the text and all the tokens generated so far. Each prediction passes through multiple layers of the neural network, and the process repeats until the image is complete. + +With this process, DALL-E controls attributes, objects, characteristics, and more in the image it generates. DALL-E 2 and 3, however, offer more control over the generated image. + +## Building your first image generation application + +So what does it take to build an image generation application? You need the following libraries: + +- **python-dotenv**, you're highly recommended to use this library to keep your secrets in a *.env* file away from the code. +- **openai**, this library is what you will use to interact with the OpenAI API. +- **pillow**, to work with images in Python. +- **requests**, to help you make HTTP requests. + +1. Create a file *.env* with the following content: + + ```text + AZURE_OPENAI_ENDPOINT= + AZURE_OPENAI_KEY= + ``` + + Locate this information in Azure Portal for your resource in the "Keys and Endpoint" section. + +1. Collect the above libraries in a file called *requirements.txt* like so: + + ```text + python-dotenv + openai<1.0 + pillow + requests + ``` + +1. Next, create a virtual environment and install the libraries: + + ```bash + python3 -m venv venv + source venv/bin/activate + pip install -r requirements.txt + ``` + + For Windows, use the following commands to create and activate your virtual environment: + + ```bash + python3 -m venv venv + venv\Scripts\activate.bat + ``` + +1. 
Add the following code in file called *app.py*: + + ```python + import openai + import os + import requests + from PIL import Image + import dotenv + + # import dotenv + dotenv.load_dotenv() + + # Get endpoint and key from environment variables + openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT'] + openai.api_key = os.environ['AZURE_OPENAI_KEY'] + + # Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only) + openai.api_version = '2023-06-01-preview' + openai.api_type = 'azure' + + + try: + # Create an image by using the image generation API + generation_response = openai.Image.create( + prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here + size='1024x1024', + n=2, + temperature=0, + ) + # Set the directory for the stored image + image_dir = os.path.join(os.curdir, 'images') + + # If the directory doesn't exist, create it + if not os.path.isdir(image_dir): + os.mkdir(image_dir) + + # Initialize the image path (note the filetype should be png) + image_path = os.path.join(image_dir, 'generated_image.png') + + # Retrieve the generated image + image_url = generation_response["data"][0]["url"] # extract image URL from response + generated_image = requests.get(image_url).content # download the image + with open(image_path, "wb") as image_file: + image_file.write(generated_image) + + # Display the image in the default image viewer + image = Image.open(image_path) + image.show() + + # catch exceptions + except openai.error.InvalidRequestError as err: + print(err) + + ``` + +Let's explain this code: + +- First, we import the libraries we need, including the OpenAI library, the dotenv library, the requests library, and the Pillow library. + + ```python + import openai + import os + import requests + from PIL import Image + import dotenv + ``` + +- Next, we load the environment variables from the *.env* file. 
+ + ```python + # import dotenv + dotenv.load_dotenv() + ``` + +- After that, we set the endpoint, key for the OpenAI API, version and type. + + ```python + # Get endpoint and key from environment variables + openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT'] + openai.api_key = os.environ['AZURE_OPENAI_KEY'] + + # add version and type, Azure specific + openai.api_version = '2023-06-01-preview' + openai.api_type = 'azure' + ``` + +- Next, we generate the image: + + ```python + # Create an image by using the image generation API + generation_response = openai.Image.create( + prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here + size='1024x1024', + n=2, + temperature=0, + ) + ``` + + The above code responds with a JSON object that contains the URL of the generated image. We can use the URL to download the image and save it to a file. + +- Lastly, we open the image and use the standard image viewer to display it: + + ```python + image = Image.open(image_path) + image.show() + ``` + +### More details on generating the image + +Let's look at the code that generates the image in more detail: + +```python +generation_response = openai.Image.create( + prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here + size='1024x1024', + n=2, + temperature=0, + ) +``` + +- **prompt**, is the text prompt that is used to generate the image. In this case, we're using the prompt "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils". +- **size**, is the size of the image that is generated. In this case, we're generating an image that is 1024x1024 pixels. +- **n**, is the number of images that are generated. In this case, we're generating two images. +- **temperature**, is a parameter that controls the randomness of the output of a Generative AI model. 
The temperature is a value between 0 and 1, where 0 means that the output is deterministic and 1 means that the output is random. The default value is 0.7. + +There are more things you can do with images, which we will cover in the next section. + +## Additional capabilities of image generation + +You've seen so far how we were able to generate an image using a few lines of Python. However, there are more things you can do with images. + +You can also do the following: + +- **Perform edits**. By providing an existing image, a mask, and a prompt, you can alter an image. For example, you can add something to a portion of an image. Imagine our bunny image: you can add a hat to the bunny. You do that by providing the image, a mask (identifying the area to change), and a text prompt that says what should be done. + + ```python + response = openai.Image.create_edit( + image=open("base_image.png", "rb"), + mask=open("mask.png", "rb"), + prompt="An image of a rabbit with a hat on its head.", + n=1, + size="1024x1024" + ) + image_url = response['data'][0]['url'] + ``` + + The base image would only contain the rabbit but the final image would have the hat on the rabbit. + +- **Create variations**. The idea is that you take an existing image and ask for variations of it. To create a variation, you provide an image and use code like so: + + ```python + response = openai.Image.create_variation( + image=open("bunny-lollipop.png", "rb"), + n=1, + size="1024x1024" + ) + image_url = response['data'][0]['url'] + ``` + + > Note: this is only supported on OpenAI. + +## Temperature + +Temperature is a parameter that controls the randomness of the output of a Generative AI model. The temperature is a value between 0 and 1, where 0 means that the output is deterministic and 1 means that the output is random. The default value is 0.7. 
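
As a rough intuition for what "controls the randomness" means, here is a minimal sketch of the general idea behind temperature in sampling-based generators: candidate scores are divided by the temperature before being turned into probabilities, so low values concentrate the choice on the top candidate. The function name `sample_with_temperature` and the scores are made up for illustration, and this is an assumption about the general technique, not a description of how the image service applies the parameter internally.

```python
import math
import random


def sample_with_temperature(scores, temperature, rng):
    """Pick an index from `scores`; lower temperature means less randomness."""
    if temperature <= 0:
        # Treat 0 as fully deterministic: always take the highest score.
        return max(range(len(scores)), key=lambda i: scores[i])
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


rng = random.Random(0)
scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate outputs
for t in (0, 0.1, 1.0):
    picks = [sample_with_temperature(scores, t, rng) for _ in range(1000)]
    # Count how often each candidate was chosen at this temperature.
    print(t, [picks.count(i) for i in range(len(scores))])
```

At temperature 0 the first candidate is always chosen, while at 1.0 the other candidates get picked a noticeable share of the time.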
+ +Let's look at an example of how temperature works, by running this prompt twice: + +> Prompt: "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils" + +![Bunny on a horse holding a lollipop, version 1](./v1-generated_image.png) + +Now let's run that same prompt just to see that we won't get the same image twice: + +![Bunny on a horse holding a lollipop, version 2](./v2-generated_image.png) + +As you can see, the images are similar, but not the same. Here is the call that produced them; note that it doesn't set a temperature: + +```python + generation_response = openai.Image.create( + prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here + size='1024x1024', + n=2 + ) +``` + +### Changing the temperature + +So let's try to make the response more deterministic. We could observe from the two images we generated that in the first image, there's a bunny and in the second image, there's a horse, so the images vary greatly. + +Let's therefore change our code and set the temperature to 0, like so: + +```python +generation_response = openai.Image.create( + prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here + size='1024x1024', + n=2, + temperature=0 + ) +``` + +Now when you run this code, you get these two images: + +- ![Temperature 0, v1](./v1-0temp-generated_image.png) +- ![Temperature 0, v2](./v2-0temp-generated_image.png) + +Here you can clearly see how the images resemble each other more. + +## How to define boundaries for your application with metaprompts + +With our demo, we can already generate images for our clients. However, we need to create some boundaries for our application. + +For example, we don't want to generate images that are not safe for work, or that are not appropriate for children. + +We can do this with *metaprompts*. Metaprompts are text prompts that are used to control the output of a Generative AI model. 
For example, we can use metaprompts to ensure that the generated images are safe for work or appropriate for children. + +### How does it work? + +Now, how do meta prompts work? + +Meta prompts are text prompts that are used to control the output of a Generative AI model. They are positioned before the user's text prompt and embedded in the application, so that the meta prompt and the prompt input are sent to the model together as a single text prompt. + +One example of a meta prompt would be the following: + +```text +You are an assistant designer that creates images for children. + +The image needs to be safe for work and appropriate for children. + +The image needs to be in color. + +The image needs to be in landscape orientation. + +The image needs to be in a 16:9 aspect ratio. + +Do not consider any input from the following that is not safe for work or appropriate for children. + +(Input) + +``` + +Now, let's see how we can use meta prompts in our demo. + +```python +disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult" + +meta_prompt = f"""You are an assistant designer that creates images for children. + +The image needs to be safe for work and appropriate for children. + +The image needs to be in color. + +The image needs to be in landscape orientation. + +The image needs to be in a 16:9 aspect ratio. + +Do not consider any input from the following that is not safe for work or appropriate for children. +{disallow_list} +""" + +prompt = f"""{meta_prompt} +Create an image of a bunny on a horse, holding a lollipop""" + +# TODO add request to generate image +``` + +From the above prompt, you can see how every image being created takes the metaprompt into account. + +## Assignment - let's enable students + +We introduced Edu4All in the beginning of this lesson. 
Now it's time to enable the students to generate images for their assessments. + +The students will create images containing monuments for their assessments. Exactly which monuments is up to the students. They are asked to use their creativity in this task to place these monuments in different contexts. + +## Solution + +Here's one possible solution: + +```python +import openai +import os +import requests +from PIL import Image +import dotenv + +# Load environment variables from the .env file +dotenv.load_dotenv() + +# Get endpoint and key from environment variables +openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT'] +openai.api_key = os.environ['AZURE_OPENAI_KEY'] + +# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only) +openai.api_version = '2023-06-01-preview' +openai.api_type = 'azure' + +disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult" + +meta_prompt = f"""You are an assistant designer that creates images for children. + +The image needs to be safe for work and appropriate for children. + +The image needs to be in color. + +The image needs to be in landscape orientation. + +The image needs to be in a 16:9 aspect ratio. + +Do not consider any input from the following that is not safe for work or appropriate for children. +{disallow_list}""" + +prompt = f"""{meta_prompt} +Generate monument of the Arc of Triumph in Paris, France, in the evening light with a small child holding a Teddy looks on. 
+""" + +try: + # Create an image by using the image generation API + generation_response = openai.Image.create( + prompt=prompt, # Enter your prompt text here + size='1024x1024', + n=2, + temperature=0, + ) + # Set the directory for the stored image + image_dir = os.path.join(os.curdir, 'images') + + # If the directory doesn't exist, create it + if not os.path.isdir(image_dir): + os.mkdir(image_dir) + + # Initialize the image path (note the filetype should be png) + image_path = os.path.join(image_dir, 'generated_image.png') + + # Retrieve the generated image + image_url = generation_response["data"][0]["url"] # extract image URL from response + generated_image = requests.get(image_url).content # download the image + with open(image_path, "wb") as image_file: + image_file.write(generated_image) + + # Display the image in the default image viewer + image = Image.open(image_path) + image.show() + +# catch exceptions +except openai.error.InvalidRequestError as err: + print(err) +``` + +## Extra resources + +- [DALL-E](https://arxiv.org/pdf/2102.12092.pdf) + +- [OpenAI's DALL-E and CLIP 101: A Brief Introduction](https://towardsdatascience.com/openais-dall-e-and-clip-101-a-brief-introduction-3a4367280d4e) + +- [OpenAI's DALL-E](https://openai.com/blog/dall-e/) + +- [OpenAI's CLIP](https://openai.com/blog/clip/) + +- [OpenAI's CLIP paper](https://arxiv.org/pdf/2103.00020.pdf) + + \ No newline at end of file diff --git a/9-image-apps/app-variation.py b/9-image-apps/app-variation.py new file mode 100644 index 000000000..423a43035 --- /dev/null +++ b/9-image-apps/app-variation.py @@ -0,0 +1,45 @@ +import openai +import os +import requests +from PIL import Image +import dotenv + +# import dotenv +dotenv.load_dotenv() + +# Get endpoint and key from environment variables +openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT'] +openai.api_key = os.environ['AZURE_OPENAI_KEY'] + +# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only) 
+openai.api_version = '2023-06-01-preview' +openai.api_type = 'azure' + +image_dir = os.path.join(os.curdir, 'images') + +# Initialize the image path (note the filetype should be png) +image_path = os.path.join(image_dir, 'generated_image.png') + +# ---creating variation below--- +try: + print("LOG creating variation") + response = openai.Image.create_variation( + image=open("generated_image.png", "rb"), + n=1, + size="1024x1024" + ) + + image_path = os.path.join(image_dir, 'generated_variation.png') + + image_url = response['data'][0]['url'] + + print("LOG downloading image") + generated_image = requests.get(image_url).content # download the image + with open(image_path, "wb") as image_file: + image_file.write(generated_image) + + # Display the image in the default image viewer + image = Image.open(image_path) + image.show() +except openai.error.InvalidRequestError as err: + print(err) \ No newline at end of file diff --git a/9-image-apps/app.py b/9-image-apps/app.py new file mode 100644 index 000000000..9fac3dfc0 --- /dev/null +++ b/9-image-apps/app.py @@ -0,0 +1,69 @@ +import openai +import os +import requests +from PIL import Image +import dotenv + +# import dotenv +dotenv.load_dotenv() + +# Get endpoint and key from environment variables +openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT'] +openai.api_key = os.environ['AZURE_OPENAI_KEY'] + +# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only) +openai.api_version = '2023-06-01-preview' +openai.api_type = 'azure' + + +try: + # Create an image by using the image generation API + generation_response = openai.Image.create( + prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here + size='1024x1024', + n=2, + temperature=0, + ) + # Set the directory for the stored image + image_dir = os.path.join(os.curdir, 'images') + + # If the directory doesn't exist, create it + if not os.path.isdir(image_dir): + 
os.mkdir(image_dir) + + # Initialize the image path (note the filetype should be png) + image_path = os.path.join(image_dir, 'generated_image.png') + + # Retrieve the generated image + image_url = generation_response["data"][0]["url"] # extract image URL from response + generated_image = requests.get(image_url).content # download the image + with open(image_path, "wb") as image_file: + image_file.write(generated_image) + + # Display the image in the default image viewer + image = Image.open(image_path) + image.show() + +# catch exceptions +except openai.error.InvalidRequestError as err: + print(err) + +# ---creating variation below--- + +response = openai.Image.create_variation( + image=open(image_path, "rb"), + n=1, + size="1024x1024" +) + +image_path = os.path.join(image_dir, 'generated_variation.png') + +image_url = response['data'][0]['url'] + +generated_image = requests.get(image_url).content # download the image +with open(image_path, "wb") as image_file: + image_file.write(generated_image) + +# Display the image in the default image viewer +image = Image.open(image_path) +image.show() \ No newline at end of file diff --git a/9-image-apps/generated_image.png b/9-image-apps/generated_image.png new file mode 100644 index 000000000..181260cfe Binary files /dev/null and b/9-image-apps/generated_image.png differ diff --git a/9-image-apps/images/generated_image.png b/9-image-apps/images/generated_image.png new file mode 100644 index 000000000..3fc41eb12 Binary files /dev/null and b/9-image-apps/images/generated_image.png differ diff --git a/9-image-apps/notebook-azureopenai.ipynb b/9-image-apps/notebook-azureopenai.ipynb new file mode 100644 index 000000000..bf6109c97 --- /dev/null +++ b/9-image-apps/notebook-azureopenai.ipynb @@ -0,0 +1,378 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Building an Image Generation Application \n", + "\n", + "There's more to LLMs than text generation. 
It's also possible to generate images from text descriptions. Having images as a modality can be highly useful in a number of areas from MedTech, architecture, tourism, game development and more. In this chapter we will look into the two most popular image generation models, DALL-E and Midjourney.\n", + "\n", + "## Introduction \n", + "\n", + "In this lesson, we will cover:\n", + "\n", + "- Image generation and why it's useful.\n", + "- DALL-E and Midjourney,what they are and how they work.\n", + "- How you would build an image generation app.\n", + "\n", + "## Learning Goals \n", + "\n", + "After completing this lesson, you will be able to:\n", + "\n", + "- Build an image generation application.\n", + "- Define boundaries for your application with meta prompts. \n", + "- Work with DALL-E and Midjourney.\n", + "\n", + "## Why build an image generation application?\n", + "\n", + "Image generation applications are a great way to explore the capabilities of Generative AI. They can be used for, for example: \n", + "\n", + "- **Image editing and synthesis**. You can generate images for a variety of use cases, such as image editing and image synthesis. \n", + "\n", + "- **Applied to a variety of industries**. They can also be used to generate images for a variety of industries like Medtech, Tourism, Game development and more. \n", + "\n", + "## Scenario: Edu4All \n", + "\n", + "As part of this lesson, we will continue to work with our startup, Edu4All, in this lesson. The students will create images for their assessments, exactly what images is up to the students, but it could be illustrations for their own fairytale or create a new character for their story or help them to visualize their ideas and concepts. 
\n", + "\n", + "Here's what Edu4All's students could generate for example if they're working on class on monuments:\n", + "\n", + "![Edu4All startup, class on monuments, Eifel Tower](startup.png)\n", + "\n", + "using a prompt like \n", + "\n", + "> \"Dog next to Eifel Tower in early morning sunlight\"\n", + "\n", + "## What is DALL-E and Midjourney? \n", + "\n", + "[DALL-E](https://openai.com/dall-e-2) and [Midjourney](https://www.midjourney.com/) are two of the most popular image generation models, they allow you using prompts to generate images.\n", + "\n", + "### DALL-E\n", + "\n", + "Let's start with DALL-E, which is a Generative AI model that generates images from text descriptions. \n", + "\n", + "> [DALL-E is a combination of two models, CLIP and diffused attention](https://towardsdatascience.com/openais-dall-e-and-clip-101-a-brief-introduction-3a4367280d4e). \n", + "\n", + "- **CLIP**, is a model that generates embeddings, which are numerical representations of data, from images and text. \n", + "\n", + "- **Diffused attention**, is a model that generates images from embeddings. DALL-E is trained on a dataset of images and text and can be used to generate images from text descriptions. For example, DALL-E can be used to generate images of a cat in a hat, or a dog with a mohawk. \n", + "\n", + "### Midjourney\n", + " \n", + "Midjourney works in a similar way to DALL-E, it generates images from text prompts. Midjourney, can also be used to generate images using prompts like “a cat in a hat”, or a “dog with a mohawk”. 
\n", + "\n", + " \n", + "\n", + "![Image generated by Midjourney, mechanical pigeon](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Rupert_Breheny_mechanical_dove_eca144e7-476d-4976-821d-a49c408e4f36.png/440px-Rupert_Breheny_mechanical_dove_eca144e7-476d-4976-821d-a49c408e4f36.png)\n", + "*Image cred Wikipedia, image generated by Midjourney*\n", + "\n", + "## How does DALL-E and Midjourney Work \n", + "\n", + "First, [DALL-E](https://arxiv.org/pdf/2102.12092.pdf). DALL-E is a Generative AI model based on the transformer architecture with an *autoregressive transformer*. \n", + "\n", + "An *autogressive transformer* defines how a model generates images from text descriptions, it generates one pixel at a time, and then uses the generated pixels to generate the next pixel. Passing through multiple layers in a neural network, until the image is complete. \n", + "\n", + "With this process, DALL-E, controls attributes, objects, characteristics, and more in the image it generates. However, DALL-E 2 and 3 have more control over the generated image, " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building your first image generation application\n", + "\n", + "So what does it take to build an image generation application? You need the following libraries:\n", + "\n", + "- **python-dotenv**, you're highly recommended to use this library to keep your secrets in a *.env* file away from the code.\n", + "- **openai**, this library is what you will use to interact with the OpenAI API.\n", + "- **pillow**, to work with images in Python.\n", + "- **requests**, to help you make HTTP requests.\n", + "\n", + "\n", + "1. Create a file *.env* with the following content:\n", + "\n", + " ```text\n", + " AZURE_OPENAI_ENDPOINT=\n", + " AZURE_OPENAI_KEY=\n", + " ```\n", + "\n", + " Locate this information in Azure Portal for your resource in the \"Keys and Endpoint\" section." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "1. Collect the above libraries in a file called *requirements.txt* like so:\n", + "\n", + " ```text\n", + " python-dotenv\n", + " openai\n", + " pillow\n", + " requests\n", + " ```\n", + "\n", + "\n", + "1. Next, create virtual environment and install the libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "'source' is not recognized as an internal or external command,\n", + "operable program or batch file.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: python-dotenv in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (0.20.0)\n", + "Requirement already satisfied: openai in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (0.28.1)\n", + "Requirement already satisfied: pillow in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (9.5.0)\n", + "Requirement already satisfied: requests in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (2.31.0)\n", + "Requirement already satisfied: tqdm in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from openai) (4.66.1)\n", + "Requirement already satisfied: aiohttp in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages 
(from openai) (3.8.6)\n", + "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from requests) (3.3)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from requests) (2.0.3)\n", + "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from requests) (3.1.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from requests) (2023.5.7)\n", + "Requirement already satisfied: multidict<7.0,>=4.5 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from aiohttp->openai) (6.0.4)\n", + "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from aiohttp->openai) (4.0.3)\n", + "Requirement already satisfied: yarl<2.0,>=1.0 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from aiohttp->openai) (1.9.2)\n", + "Requirement already satisfied: aiosignal>=1.1.2 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from aiohttp->openai) (1.3.1)\n", + "Requirement already satisfied: frozenlist>=1.1.1 
in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from aiohttp->openai) (1.4.0)\n", + "Requirement already satisfied: attrs>=17.3.0 in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from aiohttp->openai) (23.1.0)\n", + "Requirement already satisfied: colorama in c:\\users\\chnoring\\appdata\\local\\packages\\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\\localcache\\local-packages\\python310\\site-packages (from tqdm->openai) (0.4.5)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "[notice] A new release of pip is available: 23.0.1 -> 23.3.1\n", + "[notice] To update, run: C:\\Users\\chnoring\\AppData\\Local\\Microsoft\\WindowsApps\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\python.exe -m pip install --upgrade pip\n" + ] + } + ], + "source": [ + "# create virtual env\n", + "! python3 -m venv venv\n", + "# activate environment\n", + "! source venv/bin/activate\n", + "# install libraries\n", + "# pip install -r requirements.txt, if using a requirements.txt file \n", + "! pip install python-dotenv openai pillow requests" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> [!NOTE]\n", + "> For Windows, use the following commands to create and activate your virtual environment:\n", + "\n", + " ```bash\n", + " python3 -m venv venv\n", + " venv\\Scripts\\activate.bat\n", + " ````\n", + "\n", + "1. 
Add the following code in a file called *app.py*:\n", + "\n", + " ```python\n", + " import openai\n", + " import os\n", + " import requests\n", + " from PIL import Image\n", + " import dotenv\n", + " \n", + " # Load environment variables from the .env file\n", + " dotenv.load_dotenv()\n", + " \n", + " # Get endpoint and key from environment variables\n", + " openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']\n", + " openai.api_key = os.environ['AZURE_OPENAI_KEY']\n", + " \n", + " # Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only)\n", + " openai.api_version = '2023-06-01-preview'\n", + " openai.api_type = 'azure'\n", + " \n", + " \n", + " try:\n", + "     # Create an image by using the image generation API\n", + "     generation_response = openai.Image.create(\n", + "         prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here\n", + "         size='1024x1024',\n", + "         n=2,\n", + "         temperature=0,\n", + "     )\n", + "     # Set the directory for the stored image\n", + "     image_dir = os.path.join(os.curdir, 'images')\n", + " \n", + "     # If the directory doesn't exist, create it\n", + "     if not os.path.isdir(image_dir):\n", + "         os.mkdir(image_dir)\n", + " \n", + "     # Initialize the image path (note the filetype should be png)\n", + "     image_path = os.path.join(image_dir, 'generated_image.png')\n", + " \n", + "     # Retrieve the first generated image\n", + "     image_url = generation_response[\"data\"][0][\"url\"] # extract the first image's URL from the response\n", + "     generated_image = requests.get(image_url).content # download the image\n", + "     with open(image_path, \"wb\") as image_file:\n", + "         image_file.write(generated_image)\n", + " \n", + "     # Display the image in the default image viewer\n", + "     image = Image.open(image_path)\n", + "     image.show()\n", + " \n", + " # Catch invalid requests (for example, a rejected prompt) and print the error\n", + " except openai.error.InvalidRequestError as err:\n", + "     print(err)\n", + "\n", + " ```\n", + "\n", + "Let's explain this code:\n", + "\n", + "- First, 
we import the libraries we need, including the OpenAI library, the dotenv library, the requests library, and the Pillow library.\n", + "\n", + " ```python\n", + " import openai\n", + " import os\n", + " import requests\n", + " from PIL import Image\n", + " import dotenv\n", + " ```\n", + "\n", + "- Next, we load the environment variables from the *.env* file.\n", + "\n", + " ```python\n", + " # Load environment variables from the .env file\n", + " dotenv.load_dotenv()\n", + " ```\n", + "\n", + "- After that, we set the endpoint and key for the OpenAI API, along with the API version and type.\n", + "\n", + " ```python\n", + " # Get endpoint and key from environment variables\n", + " openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']\n", + " openai.api_key = os.environ['AZURE_OPENAI_KEY']\n", + "\n", + " # add version and type, Azure specific\n", + " openai.api_version = '2023-06-01-preview'\n", + " openai.api_type = 'azure'\n", + " ```\n", + "\n", + "- Next, we generate the image:\n", + "\n", + " ```python\n", + " # Create an image by using the image generation API\n", + " generation_response = openai.Image.create(\n", + "     prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here\n", + "     size='1024x1024',\n", + "     n=2,\n", + "     temperature=0,\n", + " )\n", + " ```\n", + "\n", + " The call above returns a JSON object that contains the URL of the generated image. 
We can use the URL to download the image and save it to a file.\n", + "\n", + "- Lastly, we open the image and use the standard image viewer to display it:\n", + "\n", + " ```python\n", + " image = Image.open(image_path)\n", + " image.show()\n", + " ```\n", + " \n", + "### More details on generating the image\n", + "\n", + "Let's look at the code that generates the image in more detail:\n", + "\n", + "```python\n", + "generation_response = openai.Image.create(\n", + "    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils', # Enter your prompt text here\n", + "    size='1024x1024',\n", + "    n=2,\n", + "    temperature=0,\n", + ")\n", + "```\n", + "\n", + "- **prompt**, is the text prompt that is used to generate the image. In this case, we're using the prompt \"Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils\".\n", + "- **size**, is the size of the image that is generated. In this case, we're generating an image that is 1024x1024 pixels.\n", + "- **n**, is the number of images that are generated. In this case, we're generating two images.\n", + "- **temperature**, is a parameter that controls the randomness of the output of a Generative AI model. The temperature is a value between 0 and 1, where 0 means that the output is deterministic and 1 means that the output is random. The default value is 0.7.\n", + "\n", + "There are more things you can do with images, which we will cover in the next section.\n", + "\n", + "## Additional capabilities of image generation\n", + "\n", + "So far, you've seen how to generate an image with a few lines of Python. However, there's more you can do with images.\n", + "\n", + "You can also do the following:\n", + "\n", + "- **Perform edits**. By providing an existing image, a mask, and a prompt, you can alter an image. For example, you can add something to a portion of an image. Take our bunny image: you can add a hat to the bunny. 
You do that by providing the image, a mask (identifying the area to change), and a text prompt that describes the result you want.\n", + "\n", + " ```python\n", + " response = openai.Image.create_edit(\n", + "     image=open(\"base_image.png\", \"rb\"),\n", + "     mask=open(\"mask.png\", \"rb\"),\n", + "     prompt=\"An image of a rabbit with a hat on its head.\",\n", + "     n=1,\n", + "     size=\"1024x1024\"\n", + " )\n", + " image_url = response['data'][0]['url']\n", + " ```\n", + "\n", + " The base image would contain only the rabbit, while the final image would show the rabbit wearing the hat.\n", + " \n", + "- **Create variations**. The idea is that you take an existing image and ask for variations of it. To create a variation, you provide only an image (no text prompt is needed) and call the API like so:\n", + "\n", + " ```python\n", + " response = openai.Image.create_variation(\n", + "     image=open(\"bunny-lollipop.png\", \"rb\"),\n", + "     n=1,\n", + "     size=\"1024x1024\"\n", + " )\n", + " image_url = response['data'][0]['url']\n", + " ```\n", + " \n", + " > Note: this is only supported on OpenAI." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.11" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/9-image-apps/requirements.txt b/9-image-apps/requirements.txt new file mode 100644 index 000000000..e7b71ae04 --- /dev/null +++ b/9-image-apps/requirements.txt @@ -0,0 +1,4 @@ +python-dotenv +openai +pillow +requests \ No newline at end of file diff --git a/9-image-apps/startup.png b/9-image-apps/startup.png new file mode 100644 index 000000000..60f5f2274 Binary files /dev/null and b/9-image-apps/startup.png differ diff --git 
a/9-image-apps/v1-0temp-generated_image.png b/9-image-apps/v1-0temp-generated_image.png new file mode 100644 index 000000000..501598cd6 Binary files /dev/null and b/9-image-apps/v1-0temp-generated_image.png differ diff --git a/9-image-apps/v1-generated_image.png b/9-image-apps/v1-generated_image.png new file mode 100644 index 000000000..056efab06 Binary files /dev/null and b/9-image-apps/v1-generated_image.png differ diff --git a/9-image-apps/v1-low-temp-generated_image.png b/9-image-apps/v1-low-temp-generated_image.png new file mode 100644 index 000000000..13ee80596 Binary files /dev/null and b/9-image-apps/v1-low-temp-generated_image.png differ diff --git a/9-image-apps/v2-0temp-generated_image.png b/9-image-apps/v2-0temp-generated_image.png new file mode 100644 index 000000000..445829aa8 Binary files /dev/null and b/9-image-apps/v2-0temp-generated_image.png differ diff --git a/9-image-apps/v2-generated_image.png b/9-image-apps/v2-generated_image.png new file mode 100644 index 000000000..29ef6f0a4 Binary files /dev/null and b/9-image-apps/v2-generated_image.png differ diff --git a/9-image-apps/v2-low-temp-generated_image.png b/9-image-apps/v2-low-temp-generated_image.png new file mode 100644 index 000000000..704b31cd9 Binary files /dev/null and b/9-image-apps/v2-low-temp-generated_image.png differ