Very fast and accurate classification model for facial expressions, using Mediapipe. Also includes models for eye closeness and eye gaze recognition.
This project solves the problem of robust, real-time classification of facial expressions in single-face images, with fast predictions for low-power devices.
Using Google's state-of-the-art Mediapipe facial landmark detection model, the classifier is trained on face meshes (undirected graphs) instead of raw images. This reduces the feature space and makes the classification step far less sensitive to changes in lighting, noise, rotation/translation/scale, and other factors.
The dataset used is a combination of the FER-2013 dataset and other public materials. It can be found at https://kaggle.com/datasets/669f5ba44ea30e40a7a42fb066bfa0cb89ca843deee526d633b4803014a49912
- Clone the repo
git clone
- Install Python packages
pip install -r requirements.txt
- Download the dataset from Kaggle and extract its folders to the repository root folder, so it looks like this:
root
├── face_angry
├── face_disgusted
├── face_fear
├── face_happy
├── face_neutral
├── face_sad
└── face_surprised
- If desired, modify the global variables of augment.py and run the script to augment a folder of the dataset
python augment.py
Each folder of the dataset can be augmented individually with the augment.py script, to improve class balance and increase the total number of samples. It uses the Albumentations library to randomly apply the following spatial-level transformations:
- GridDistortion
- OpticalDistortion
- HorizontalFlip
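Before choosing which folders to augment, it can help to count the samples per class. The sketch below is not part of the repository: `count_samples` and `augmentation_targets` are hypothetical helpers, and the accepted image extensions are an assumption. It uses only the standard library.

```python
from collections import Counter
from pathlib import Path

# Folder names taken from the dataset layout described in this README.
CLASS_FOLDERS = [
    "face_angry", "face_disgusted", "face_fear", "face_happy",
    "face_neutral", "face_sad", "face_surprised",
]
# Assumption: images use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_samples(root):
    """Return a Counter mapping each class folder to its number of image files."""
    counts = Counter()
    for name in CLASS_FOLDERS:
        folder = Path(root) / name
        if folder.is_dir():
            counts[name] = sum(
                1 for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            counts[name] = 0  # missing folders count as empty
    return counts


def augmentation_targets(counts):
    """Return how many extra samples each class needs to match the largest class."""
    largest = max(counts.values(), default=0)
    return {name: largest - n for name, n in counts.items()}
```

Calling `augmentation_targets(count_samples("."))` from the repository root would list, per folder, how many augmented samples are needed to reach the size of the largest class.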
The distances from each point in each face mesh to a fixed point are obtained, normalized and saved to intermediate files. Then, Principal Component Analysis is used to reduce the dimensionality of the data, and the resulting data is used to train the final model.
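The distance features can be sketched as follows. This is a minimal illustration, not the notebook's code: the choice of fixed point (`anchor_index=0`) and the normalization (dividing by the largest distance) are assumptions, and `distance_features` is a hypothetical name.

```python
import numpy as np


def distance_features(landmarks, anchor_index=0):
    """Distances from every landmark to one anchor landmark, scaled to [0, 1].

    landmarks: array of shape (n_points, 2) or (n_points, 3).
    anchor_index: hypothetical choice of fixed point; the repository may use another.
    """
    points = np.asarray(landmarks, dtype=float)
    distances = np.linalg.norm(points - points[anchor_index], axis=1)
    largest = distances.max()
    # Dividing by the largest distance removes the dependence on face scale.
    return distances / largest if largest > 0 else distances
```

Because the features are distances relative to a point on the mesh itself, they do not change when the whole mesh is translated, and the scaling step makes them independent of face size.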
According to the results, the best model tested is a Support Vector Machine with an RBF kernel, using 50 components in the PCA step, StandardScaler for normalization, and C=5 for the SVM.
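A configuration like this one could be expressed as a scikit-learn pipeline. The sketch below runs on synthetic data rather than the saved landmark files. The order of the steps (scaling before PCA), the 80/20 split and the feature count are assumptions, not details taken from the notebooks.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the saved distance features:
# 280 samples, 468 features, 7 expression classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(280, 468))
y = np.repeat(np.arange(7), 40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed ordering: scale, project onto 50 components, then classify.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),
    SVC(kernel="rbf", C=5),
)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Wrapping the three steps in one pipeline means the scaler and PCA are fitted on the training split only, which avoids leaking test data into the preprocessing.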
- cd into SVM folder
- Run landmarks.ipynb to generate the landmarks of the faces
- Run pca.ipynb to compute the PCA of the landmarks
- Run model.ipynb to train the model and test it
The normalized X,Y coordinates of each point in each face mesh are obtained and saved to intermediate files, along with an adjacency matrix for each face mesh. Then, a Graph Neural Network built using Keras and Spektral is used to train the final model.
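The two inputs of the graph model, node features and an adjacency matrix, can be illustrated on a toy mesh. This is a sketch under assumptions: `build_adjacency` and `normalize_xy` are hypothetical helpers, and the min-max scaling shown is only one possible normalization. A real face mesh would take its edge list from the landmark model (Mediapipe exposes edge sets such as `FACEMESH_TESSELATION`; whether the notebooks use it is an assumption).

```python
import numpy as np


def build_adjacency(edges, n_nodes):
    """Symmetric 0/1 adjacency matrix for an undirected mesh given as (i, j) pairs."""
    adjacency = np.zeros((n_nodes, n_nodes), dtype=np.float32)
    for i, j in edges:
        adjacency[i, j] = 1.0
        adjacency[j, i] = 1.0  # undirected graph: mirror every edge
    return adjacency


def normalize_xy(points):
    """Shift and scale X,Y coordinates into the unit square (one possible normalization)."""
    points = np.asarray(points, dtype=np.float32)
    minimum = points.min(axis=0)
    span = points.max(axis=0) - minimum
    span[span == 0] = 1.0  # avoid division by zero for degenerate meshes
    return (points - minimum) / span


# Toy mesh with 4 nodes arranged in a square.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
adjacency = build_adjacency(edges, n_nodes=4)
features = normalize_xy([[10, 20], [30, 20], [30, 60], [10, 60]])
```

Here `features` has shape (4, 2) and `adjacency` has shape (4, 4), which is the per-graph node-feature and adjacency layout that graph neural network libraries generally expect.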
- cd into GNN folder
- Run landmarks_graph.ipynb to generate the meshes of the faces
- Run model_test_gnn.ipynb to train the model and test it
The "face_landmarker_v2_with_blendshapes" model is used to extract the blendshapes of the face meshes, including high-level features such as eye openness, mouth openness, and eyebrow position. These features are then used to train a Support Vector Machine model.
- cd into SVM_Blendshapes folder
- Run Blendshapes.ipynb to generate the blendshapes features of the faces
- Run Model.ipynb to train the model and test it
For an additional layer of interaction, three separate smaller models were also trained:
Eye gaze:
- Useful for gaze tracking and attention detection.
- Regression model that predicts the x and y coordinates of the gaze point.
- Trained on the entire dataset, with a 90/10 train/test split.
- Final model is a BayesianRidge model.
- Cannot be used for each eye separately.
Eye closeness:
- Useful for detecting blinks and drowsiness.
- Classification model that predicts one of 4 labels (eye_open, eye_closed, eye_narrowed, eye_wide).
- Trained on a custom subset of the dataset.
- Final model is an SVC model with a polynomial kernel of degree 3.
- Can be used for each eye separately.
Mouth closeness:
- Useful for detecting speech.
- Classification model that predicts one of 2 labels (mouth_open, mouth_closed).
- Trained on a custom subset of the dataset.
- Final model is an SVM model.
For eye gaze:
- Run ground_truth.ipynb to generate the ground truth of the eye gaze
- Run landmarks.ipynb to generate the landmarks of the faces
- Run model.ipynb to train the model and test it
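As a rough illustration of the gaze regression step, the sketch below fits a BayesianRidge model on synthetic data with a 90/10 split. scikit-learn's `BayesianRidge` fits a single target, so wrapping it in `MultiOutputRegressor` to predict x and y together is an assumption about how the two coordinates could be handled, not a description of model.ipynb.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Synthetic stand-in: 200 samples of 10 landmark-derived features
# and (x, y) gaze targets generated from a noisy linear map.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_weights = rng.normal(size=(10, 2))
Y = X @ true_weights + 0.01 * rng.normal(size=(200, 2))

# 90/10 train/test split.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.1, random_state=0
)

# One BayesianRidge regressor is fitted per gaze coordinate.
gaze_model = MultiOutputRegressor(BayesianRidge())
gaze_model.fit(X_train, Y_train)
gaze_points = gaze_model.predict(X_test)  # one (x, y) row per test sample
score = gaze_model.score(X_test, Y_test)  # average R^2 over both coordinates
```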
For eye closeness:
- Populate the folders:
Eye Closeness
├── eye_closed
├── eye_narrowed
├── eye_open
└── eye_wide
- Run the data.ipynb notebook to generate a csv file with numerical features
- Run the classifier.ipynb notebook to train the model and test it
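A classifier of this kind could look like the sketch below, which trains an SVC with a degree-3 polynomial kernel on synthetic features and then classifies each eye with a separate call. The two numeric features, the scaling step and the sample values are hypothetical. The real features come from the csv file produced by data.ipynb.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

LABELS = ["eye_closed", "eye_narrowed", "eye_open", "eye_wide"]

# Synthetic stand-in for the csv: two hypothetical openness-like features,
# with 50 samples drawn around a different center for each label.
rng = np.random.default_rng(0)
centers = [0.05, 0.20, 0.35, 0.50]
X = np.concatenate(
    [rng.normal(loc=center, scale=0.02, size=(50, 2)) for center in centers]
)
y = np.repeat(LABELS, 50)

# Assumption: features are standardized before the polynomial-kernel SVC.
classifier = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
classifier.fit(X, y)

# Each eye is classified independently from its own feature row.
left_eye = np.array([[0.05, 0.05]])
right_eye = np.array([[0.50, 0.50]])
left_label = classifier.predict(left_eye)[0]
right_label = classifier.predict(right_eye)[0]
```

Since the model sees one eye's features at a time, the same fitted classifier can label the left and right eye separately, for example to detect a wink.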
For mouth closeness:
- Populate the folders:
Mouth Closeness
├── mouth_closed
└── mouth_open
- Run the data.ipynb notebook to generate a csv file with numerical features
- Run the classifier.ipynb notebook to train the model and test it
NOTE: The logreg.ipynb notebook is a deprecated version of the pipeline. It is kept for reference purposes only.