While studying machine learning concepts and architectures, I built a simple project to test and deepen my knowledge of the field. The program detects faces in real time: it collects images from the user, trains a Convolutional Neural Network (CNN) on them, and then classifies webcam frames in real time.
I built the model with PyTorch, which let me manage the model manually and learn in practice how to build one from scratch.
Visual representation of the CNN architecture.
- Input: 120x120x3 (RGB Image)
- Conv Layer 1:
  - Convolution: 16 filters, 3x3 kernel, stride=1, padding=1
  - Batch Normalization + ReLU
  - Max Pooling: 2x2, stride=2
  - Output: 60x60x16
- Conv Layer 2:
  - Convolution: 32 filters, 3x3 kernel, stride=1, padding=1
  - Batch Normalization + ReLU
  - Max Pooling: 2x2, stride=2
  - Output: 30x30x32
- Conv Layer 3:
  - Convolution: 64 filters, 3x3 kernel, stride=1, padding=1
  - Batch Normalization + ReLU
  - Max Pooling: 2x2, stride=2
  - Output: 15x15x64
- Conv Layer 4:
  - Convolution: 128 filters, 3x3 kernel, stride=1, padding=1
  - Batch Normalization + ReLU
  - Max Pooling: 2x2, stride=2
  - Output: 7x7x128
- Flatten: The 7x7x128 output is flattened into a 1D vector of size 6272.
- Fully Connected Layer 1:
  - Input: 6272
  - Output: 512 neurons
- Fully Connected Layer 2 (Output Layer):
  - Input: 512 neurons
  - Output: 5 neurons: 1 value for the classifier (0, 1, ..., depending on how many subjects there are) and 4 values for the bounding box.
- The network progressively reduces the spatial size through max pooling while increasing the depth with more filters.
- After the convolutional layers, the features are flattened and passed through fully connected layers for classification.
- Finally, after training, the model is loaded and frames captured from the webcam in real time are passed through it, producing a class prediction and a bounding box.
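The layer list above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the notebook's actual code: the names `conv_block` and `FaceCNN` are hypothetical, and the ReLU between the two fully connected layers is an assumption, since the description does not say which activation (if any) is used there.

```python
import torch
import torch.nn as nn


def conv_block(in_channels: int, out_channels: int) -> nn.Sequential:
    """Conv 3x3 (stride 1, padding 1) -> BatchNorm -> ReLU -> MaxPool 2x2."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),  # halves height and width (rounding down)
    )


class FaceCNN(nn.Module):
    """Hypothetical name; mirrors the 4 conv blocks + 2 FC layers listed above."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 16),    # 120x120 -> 60x60
            conv_block(16, 32),   # 60x60   -> 30x30
            conv_block(32, 64),   # 30x30   -> 15x15
            conv_block(64, 128),  # 15x15   -> 7x7
        )
        self.head = nn.Sequential(
            nn.Flatten(),                  # 128 * 7 * 7 = 6272
            nn.Linear(128 * 7 * 7, 512),
            nn.ReLU(inplace=True),         # assumption: not stated in the description
            nn.Linear(512, 5),             # 1 classifier value + 4 box values
        )

    def forward(self, x: torch.Tensor):
        out = self.head(self.features(x))
        # Split the 5 outputs into the classifier value and the bounding box.
        return out[:, :1], out[:, 1:]


model = FaceCNN().eval()
dummy = torch.zeros(1, 3, 120, 120)  # PyTorch expects channels first: N x C x H x W
class_out, box_out = model(dummy)
print(class_out.shape, box_out.shape)
```

Note that PyTorch uses a channels-first layout, so the 120x120x3 input is passed as a tensor of shape `(N, 3, 120, 120)`.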
Result of the model working properly: it identifies me and draws the bounding box.
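To draw the box on a webcam frame, the four box outputs have to be mapped to pixel coordinates. The sketch below assumes the model predicts normalized corner coordinates `(x1, y1, x2, y2)` in the range 0 to 1; the description does not state the box format, so adjust this if your labels use a different convention. The helper name `to_pixel_box` is hypothetical.

```python
def to_pixel_box(box, frame_width, frame_height):
    """Scale a normalized (x1, y1, x2, y2) box to integer pixel coordinates.

    Assumption: values are fractions of the frame size; anything outside
    the range [0, 1] is clamped so the box stays inside the frame.
    """
    x1, y1, x2, y2 = (min(max(float(v), 0.0), 1.0) for v in box)
    return (
        round(x1 * frame_width),
        round(y1 * frame_height),
        round(x2 * frame_width),
        round(y2 * frame_height),
    )


# Example with a 640x480 frame and a box that slightly overshoots the bottom edge.
print(to_pixel_box((0.25, 0.5, 0.75, 1.2), 640, 480))
```

The resulting corners can then be passed to OpenCV, for example `cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)`.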
To run the notebook correctly, you will need to install the following dependencies:
PyTorch (follow the official installation instructions): https://pytorch.org/get-started/locally/
pip install opencv-python
LabelMe is important because it makes annotating your images and generating their labels much easier (it can be installed with `pip install labelme`). To run the notebook itself, install JupyterLab:
pip install jupyterlab
Or you can use any IDE of your choice.
This project is licensed under the Apache License 2.0.