Image recognition and classification based on Convolutional Neural Networks to identify up to 8 classes of animals.
The data comes from Google's Quick, Draw! dataset, a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw! The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located.
For the purpose of the Image Recognition project, I decided to focus on animals, and selected up to 8 classes. I extracted 10,000 images of each class, using the simplified drawing files (.ndjson) that had been made available on the Google Cloud Platform. These simplified files are vectors without the timing information, positioned and scaled into a 256x256 region, which I resized to 28x28 (the standard MNIST image size) using the function vector_to_raster by @HalfdanJ (googlecreativelab/quickdraw-dataset#19).
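To make the conversion step more concrete, here is a minimal sketch of turning one simplified record into a 28x28 raster. It is not the `vector_to_raster` function referenced above: `rasterize_strokes` is a hypothetical NumPy-only stand-in that samples points along each stroke segment, and the example record is made up. It assumes the simplified format stores each stroke as a pair of x and y coordinate lists under the `drawing` key, with coordinates in the 0 to 255 range.

```python
import json

import numpy as np


def rasterize_strokes(strokes, side=28, source_side=256, samples_per_segment=32):
    """Draw polyline strokes on a side x side grid (0 = background, 255 = ink).

    Hypothetical helper: a crude rasterizer without line thickness or
    anti-aliasing, assuming coordinates lie in [0, source_side - 1].
    """
    image = np.zeros((side, side), dtype=np.uint8)
    scale = (side - 1) / (source_side - 1)
    t = np.linspace(0.0, 1.0, samples_per_segment)
    for xs, ys in strokes:
        xs = np.asarray(xs, dtype=float) * scale
        ys = np.asarray(ys, dtype=float) * scale
        if len(xs) == 1:
            # A stroke with a single point is drawn as one pixel
            image[int(round(ys[0])), int(round(xs[0]))] = 255
            continue
        for i in range(len(xs) - 1):
            # Sample points along the segment and mark the nearest pixels
            px = np.rint(xs[i] + t * (xs[i + 1] - xs[i])).astype(int)
            py = np.rint(ys[i] + t * (ys[i + 1] - ys[i])).astype(int)
            image[py, px] = 255
    return image


# Made-up record with two strokes forming an X across the 256x256 region
line = '{"word": "cat", "drawing": [[[0, 255], [0, 255]], [[0, 255], [255, 0]]]}'
record = json.loads(line)
image = rasterize_strokes(record["drawing"])
```

In a real pipeline each line of an .ndjson file would be parsed this way, and the resulting arrays stacked per class.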
The convolutional network is made of the following layers:
- A convolution layer with 32 filters and patches of size 5x5
- A max pooling layer of size 2x2
- A convolution layer with 128 filters and patches of size 3x3
- A max pooling layer of size 2x2
- A dropout layer with a rate of 40%
- A flatten layer
- A fully connected layer with 128 units and rectifier (ReLU) activation function
- A fully connected layer with 50 units and rectifier (ReLU) activation function
- An output layer (fully connected layer) with 8 classes and softmax activation function
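from tensorflow.keras import layers, models

small_side = 28  # images are resized to 28x28
nb_classes = 8   # number of animal classes (up to 8)

convnet = models.Sequential()
# Two convolution + max pooling stages, then dropout for regularization
convnet.add(layers.Conv2D(32, (5, 5), activation='relu', input_shape=(small_side, small_side, 1)))
convnet.add(layers.MaxPooling2D((2, 2)))
convnet.add(layers.Conv2D(128, (3, 3), activation='relu'))
convnet.add(layers.MaxPooling2D((2, 2)))
convnet.add(layers.Dropout(0.4))
# Classifier head: two dense ReLU layers, then a softmax over the classes
convnet.add(layers.Flatten())
convnet.add(layers.Dense(128, activation='relu'))
convnet.add(layers.Dense(50, activation='relu'))
convnet.add(layers.Dense(nb_classes, activation='softmax'))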
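As a sanity check on the network above, the feature-map sizes and parameter counts can be worked out by hand. The sketch below assumes a 28x28 single-channel input, 8 output classes, and the Keras defaults for `Conv2D` ('valid' padding, stride 1) and `MaxPooling2D` (stride equal to the pool size). The helper names are hypothetical.

```python
def conv_output_side(side, kernel):
    """Side length after a 'valid' convolution with stride 1."""
    return side - kernel + 1


def pool_output_side(side, pool):
    """Side length after max pooling with stride equal to the pool size."""
    return side // pool


side = 28                            # assumed input size
side = conv_output_side(side, 5)     # after the 5x5 convolution
side = pool_output_side(side, 2)     # after the first 2x2 pooling
side = conv_output_side(side, 3)     # after the 3x3 convolution
side = pool_output_side(side, 2)     # after the second 2x2 pooling
flattened_size = side * side * 128   # 128 feature maps are flattened

# Trainable parameters: (weights per unit + 1 bias) * number of units
conv1_params = (5 * 5 * 1 + 1) * 32
conv2_params = (3 * 3 * 32 + 1) * 128
dense1_params = (flattened_size + 1) * 128
dense2_params = (128 + 1) * 50
output_params = (50 + 1) * 8         # assumes 8 classes
total_params = (conv1_params + conv2_params + dense1_params
                + dense2_params + output_params)
```

The pooling, dropout, and flatten layers add no trainable parameters, so most of the weights sit in the first fully connected layer. These numbers can be compared against the output of `convnet.summary()`.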
Results after training with a batch size of 128 for 10 to 15 epochs, using both the training and test sets of images:
Convolutional Neural Network: 95.83 %
Convolutional Neural Network: 87.83 %
Convolutional Neural Network: 88.45 %
The model's relatively low accuracy (especially with 8 classes) can be explained by three main factors:
- The quality bias: The quality of the drawings in the dataset is heterogeneous, and some drawings are hard to recognize as the animal they are labeled with. Drawing animals is indeed a complex task that requires a sense of the right proportions to make them distinguishable.
- The species bias: Most classes of animals look alike, which makes them hard to tell apart: a body shape topped with a head, and 2 to 4 legs for most of them.
- The cultural bias: People from different cultures don't draw animals with the same lines or shapes. For example, cows don't necessarily have patches in all countries, but giraffes do, which can lead to errors of identification.
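One way to check these explanations, in particular the species bias, would be to look at which classes get mistaken for each other. The sketch below builds a confusion matrix with NumPy. The labels are made up for illustration, and the helper names are hypothetical. In practice the predicted labels would come from something like `convnet.predict(x_test).argmax(axis=1)`, where `x_test` is an assumed array of held-out images.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, nb_classes):
    """Count matrix with rows = true class and columns = predicted class."""
    matrix = np.zeros((nb_classes, nb_classes), dtype=int)
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix


def most_confused_pair(matrix):
    """Return (true_class, predicted_class) for the largest off-diagonal count."""
    off_diagonal = matrix.copy()
    np.fill_diagonal(off_diagonal, 0)
    true_class, predicted_class = np.unravel_index(
        np.argmax(off_diagonal), off_diagonal.shape
    )
    return int(true_class), int(predicted_class)


# Made-up labels for three classes, for illustration only
true_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
predicted_labels = np.array([0, 0, 1, 1, 1, 1, 2, 0, 1, 1])

matrix = confusion_matrix(true_labels, predicted_labels, nb_classes=3)
accuracy = np.trace(matrix) / matrix.sum()  # correct predictions / all predictions
pair = most_confused_pair(matrix)
```

Large off-diagonal counts between two animals with similar silhouettes would support the species-bias explanation, while errors spread evenly across classes would point more toward drawing quality.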