A toy project that illustrates how to download a tomato / not-tomato image dataset from Google Images, using Fast.AI for preprocessing.
...
In this notebook we download images from a list of URLs that we previously scraped from Google Images.
The URLs are stored in .txt files (one per class), with one URL per line, i.e. separated by '\n' characters.
You can learn how to build such a list by following this notebook here:
Or by following this article (in French) here:
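Before downloading, it can help to sanity-check a URL file. The sketch below is not part of the original workflow: `read_urls` is a hypothetical helper that assumes one URL per line, skips blank lines, and drops duplicate URLs while keeping the original order.

```python
from pathlib import Path

def read_urls(txt_path):
    """Return the unique, non-empty lines of a URL file, in their original order.

    Hypothetical helper for illustration; assumes one URL per line.
    """
    lines = Path(txt_path).read_text().split("\n")
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        # Skip blank lines and URLs we have already seen
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

For example, `len(read_urls("tomato_urls.txt"))` gives the number of distinct URLs in the tomato list.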
from pathlib import Path
import os
# Get current directory
p = Path('.')
# Provide path to the .txt files with the url inside
tomato_urls_txt = p/"tomato_urls.txt"
pepper_urls_txt = p/"pepper_urls.txt"
# Create the directories that will store the images.
# The images are split into two folders because there are two classes.
os.makedirs(p/"tomatoes", exist_ok=True)
os.makedirs(p/"peppers", exist_ok=True)
# Import the helper function used to download the images.
from fastai.vision import download_images
print(" - Downloading Tomato Images - ")
download_images(urls=tomato_urls_txt,
                dest=p/'tomatoes')
print(" - Downloading Pepper Images - ")
download_images(urls=pepper_urls_txt,
                dest=p/'peppers')
We want images that can be opened and that are in RGB format, so we verify each image one by one and discard the unsuitable ones.
from fastai.vision import verify_images
classes = ['tomatoes', 'peppers']
for c in classes:
    print(c)
    path_to_class_folder = p/c
    # Verify that the images have the correct properties for training
    verify_images(path_to_class_folder,
                  delete=True, img_format=f'{c} %d')
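Because the verification step deletes unsuitable files, it is worth checking how many images remain in each class folder afterwards. The following is a minimal sketch using only the standard library; `count_files` and `count_per_class` are hypothetical helpers, not part of Fast.AI, and they assume the class folders already exist.

```python
from pathlib import Path

def count_files(folder):
    """Count the regular files directly inside `folder` (sub-folders are not visited)."""
    return sum(1 for f in Path(folder).iterdir() if f.is_file())

def count_per_class(root, class_names):
    """Map each class name to the number of files left in its folder under `root`."""
    return {name: count_files(Path(root) / name) for name in class_names}
```

With the names used in this notebook, `count_per_class(p, classes)` returns a dictionary keyed by 'tomatoes' and 'peppers'. A large imbalance between the two counts may be a sign that one of the URL lists needs to be extended.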
The images are now ready for whatever you want to do with them,
for example preprocessing them and feeding them to a neural network, as done here: