Facial-Emotion-Recognition

Fast and accurate classification models for facial expressions, built on Mediapipe. Also includes models for eye closeness, mouth closeness, and eye gaze estimation.

Project Description

This project solves the problem of robust, real-time classification of facial expressions in single-face images, with fast predictions for low-power devices.

Using the state-of-the-art Google Mediapipe facial landmark detection model, the classifiers are trained on face meshes (undirected graphs of facial landmarks) instead of raw images. This reduces the feature space and also makes the classification step largely insensitive to changes in lighting, noise, rotation/translation/scale, and other factors.
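A minimal sketch of turning one image into a landmark array. It assumes the legacy `mp.solutions.face_mesh` API and OpenCV for image loading; the repository's notebooks may use a different Mediapipe interface. Mediapipe and OpenCV are imported lazily so the conversion helper can be used and tested on its own.

```python
import numpy as np


def landmarks_to_array(landmarks, image_width=None, image_height=None):
    """Convert landmark objects with .x and .y attributes (as Mediapipe's
    NormalizedLandmark has) into an (N, 2) float array. If width and height
    are given, the normalized [0, 1] coordinates are scaled to pixels."""
    pts = np.array([[lm.x, lm.y] for lm in landmarks], dtype=np.float64)
    if image_width is not None and image_height is not None:
        pts *= np.array([image_width, image_height], dtype=np.float64)
    return pts


def extract_face_mesh(image_path):
    """Return the (N, 2) landmark array of the single face in an image,
    or None if no face is detected."""
    # Lazy imports: only needed when actually running detection.
    import cv2
    import mediapipe as mp

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Mediapipe expects RGB
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
        results = face_mesh.process(rgb)
    if not results.multi_face_landmarks:
        return None
    return landmarks_to_array(results.multi_face_landmarks[0].landmark)
```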

The dataset used is a combination of the FER-2013 dataset and other public materials. It can be found at https://kaggle.com/datasets/669f5ba44ea30e40a7a42fb066bfa0cb89ca843deee526d633b4803014a49912

Installation

  1. Clone the repo

    git clone
  2. Install Python packages

    pip install -r requirements.txt
  3. Download the dataset from Kaggle and extract its folders into the repository root, so that the layout looks like this:

    root
    ├── face_angry
    ├── face_disgusted
    ├── face_fear
    ├── face_happy
    ├── face_neutral
    ├── face_sad
    └── face_surprised
  4. (Optional) Adjust the global variables in augment.py, then run the script to augment one folder of the dataset:

    python augment.py
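Before augmenting, it can help to confirm the layout above and see how unbalanced the classes are. A small sketch, assuming the images use common extensions (`.jpg`, `.jpeg`, `.png`); `count_images` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Class folders expected in the repository root (from the tree above).
EXPECTED_CLASSES = [
    "face_angry", "face_disgusted", "face_fear", "face_happy",
    "face_neutral", "face_sad", "face_surprised",
]

# Assumed image extensions; adjust if the dataset uses other formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(root="."):
    """Return {class_folder: image_count}; raise if any folder is missing."""
    root = Path(root)
    missing = [name for name in EXPECTED_CLASSES if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing dataset folders: {missing}")
    return {
        name: sum(
            1 for p in (root / name).iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        for name in EXPECTED_CLASSES
    }
```

Calling `count_images()` from the repository root returns one count per class, which shows which folders are the best candidates for augmentation.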

Augmentation

Each dataset folder can be augmented individually with the augment.py script, to improve class balance and increase the total number of samples. The script uses the Albumentations library to randomly apply the following spatial-level transformations:

  • GridDistortion
  • OpticalDistortion
  • HorizontalFlip
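The transformations listed above can be combined into a single Albumentations pipeline. This is a sketch, not the contents of augment.py: the probability `p=0.5` per transform is an assumed value, and `copies_per_image` is a hypothetical helper for deciding how many augmented copies a folder needs to reach a target size.

```python
import math


def build_transform(p=0.5):
    """Albumentations pipeline with the three spatial transforms listed above.
    p (probability of applying each transform) is an assumed value."""
    import albumentations as A  # lazy import: only needed when augmenting

    return A.Compose([
        A.GridDistortion(p=p),
        A.OpticalDistortion(p=p),
        A.HorizontalFlip(p=p),
    ])


def copies_per_image(current_count, target_count):
    """How many augmented copies of each image bring a folder from
    current_count up to at least target_count samples."""
    if current_count <= 0:
        raise ValueError("folder is empty")
    if current_count >= target_count:
        return 0
    return math.ceil((target_count - current_count) / current_count)
```

For an image loaded as a NumPy array, `build_transform()(image=image)["image"]` returns one randomly augmented copy. With 100 images and a target of 250, `copies_per_image` returns 2, giving 300 samples.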

Approach 1 - PCA and SVM

For each face mesh, the distance from every point to a fixed reference point is computed, normalized, and saved to intermediate files. Principal Component Analysis (PCA) then reduces the dimensionality of the data, and the result is used to train the final model.
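A sketch of the distance features described above. Two details are assumptions: the reference landmark (index 1, commonly used as the nose tip in Mediapipe's mesh, is a hypothetical choice) and the normalization (dividing by the largest distance). With these choices, the features are unchanged by translating, rotating, or uniformly scaling the face.

```python
import numpy as np

# Hypothetical reference landmark; the repository's fixed point may differ.
REFERENCE_INDEX = 1


def distance_features(points, ref_index=REFERENCE_INDEX):
    """points: (N, 2) or (N, 3) landmark array for one face.
    Returns the N distances to the reference landmark, divided by the
    largest distance so the features do not depend on face size."""
    points = np.asarray(points, dtype=np.float64)
    dists = np.linalg.norm(points - points[ref_index], axis=1)
    max_dist = dists.max()
    if max_dist == 0:
        raise ValueError("degenerate face mesh: all points coincide")
    return dists / max_dist
```

Distances are preserved by rotation and translation, and a uniform scale factor cancels in the division, which is why this representation is robust to head pose in the image plane and to face size.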

Among the configurations tested, the best model is a Support Vector Machine (SVM) with an RBF kernel, using 50 components in the PCA step, StandardScaler for normalization, and C=5.
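The reported configuration maps directly onto a scikit-learn pipeline. This is a sketch under one assumption: scaling is applied before PCA (the usual order), whereas the repository splits PCA and model training across separate notebooks.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(n_components=50, C=5.0):
    """StandardScaler -> PCA(50) -> RBF SVM with C=5, per the results above.
    The scaler-before-PCA ordering is an assumption."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        SVC(kernel="rbf", C=C),
    )
```

Calling `build_model().fit(X_train, y_train)` on the distance features (one row per face, one label per expression folder) yields a single object that applies the same scaling and projection at prediction time.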

Usage for approach 1

  1. cd into the SVM folder
  2. Run landmarks.ipynb to generate the facial landmarks
  3. Run pca.ipynb to compute the PCA of the landmarks
  4. Run model.ipynb to train and test the model
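The notebooks can also be executed headlessly in the order listed above. A sketch that calls Jupyter's `nbconvert` with `--execute --inplace`; it assumes Jupyter is installed, and the folder name follows the steps above.

```python
import subprocess
from pathlib import Path

# Notebook order for approach 1, as listed above.
NOTEBOOKS = ("landmarks.ipynb", "pca.ipynb", "model.ipynb")


def nbconvert_command(notebook):
    """Command that executes a notebook and saves its outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_pipeline(folder="SVM", notebooks=NOTEBOOKS):
    for nb in notebooks:
        # check=True stops at the first notebook that fails.
        subprocess.run(nbconvert_command(nb), cwd=Path(folder), check=True)
```

The same pattern applies to the other approaches by changing the folder and notebook list.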

Approach 2 - GNN

The normalized (x, y) coordinates of each point in each face mesh are obtained and saved to intermediate files, along with an adjacency matrix for each face mesh. A Graph Neural Network built with Keras and Spektral is then trained on these graphs as the final model.
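A sketch of the two arrays described above: node features (normalized coordinates) and a symmetric adjacency matrix built from the mesh edges. The normalization shown (shift to the origin, divide by the larger bounding-box side to keep the aspect ratio) is an assumption. The edge list would come from the mesh topology; for example, the legacy Mediapipe API exposes `mp.solutions.face_mesh.FACEMESH_TESSELATION` as a set of index pairs.

```python
import numpy as np


def normalize_coords(points):
    """Shift (x, y) landmarks of one face to start at 0 and scale them by the
    larger bounding-box side, so coordinates lie in [0, 1] (assumed scheme)."""
    points = np.asarray(points, dtype=np.float64)[:, :2]
    mins = points.min(axis=0)
    scale = (points.max(axis=0) - mins).max()
    if scale == 0:
        raise ValueError("degenerate face mesh: all points coincide")
    return (points - mins) / scale


def adjacency_matrix(num_nodes, edges):
    """Symmetric 0/1 adjacency matrix of an undirected mesh from (i, j) pairs."""
    a = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    return a
```

The first function gives the (N, 2) node-feature matrix and the second the (N, N) adjacency matrix for one face, the pair of inputs a Spektral graph layer works with.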

Usage for approach 2

  1. cd into the GNN folder
  2. Run landmarks_graph.ipynb to generate the meshes of the faces
  3. Run model_test_gnn.ipynb to train the model and test it

Approach 3 - SVM using Blendshapes

The Mediapipe "face_landmarker_v2_with_blendshapes" model is used to extract blendshape scores for each face: high-level features such as eye openness, mouth openness, and eyebrow position. These features are then used to train a Support Vector Machine.
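A sketch of extracting blendshape scores with the Mediapipe Tasks `FaceLandmarker` and turning them into a fixed-order feature vector. It assumes the `.task` model file has been downloaded next to the script; sorting by blendshape name is an illustrative way to keep the feature order stable, not necessarily what Blendshapes.ipynb does.

```python
import numpy as np


def blendshapes_to_features(categories):
    """Turn blendshape categories (objects with .category_name and .score)
    into name-sorted feature names and a matching float vector."""
    pairs = sorted((c.category_name, float(c.score)) for c in categories)
    names = [name for name, _ in pairs]
    values = np.array([score for _, score in pairs], dtype=np.float64)
    return names, values


def extract_blendshapes(image_path, model_path="face_landmarker_v2_with_blendshapes.task"):
    """Return (names, values) for the first detected face, or None."""
    # Lazy imports: only needed when actually running detection.
    import mediapipe as mp
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import vision

    options = vision.FaceLandmarkerOptions(
        base_options=mp_python.BaseOptions(model_asset_path=model_path),
        output_face_blendshapes=True,
        num_faces=1,
    )
    with vision.FaceLandmarker.create_from_options(options) as landmarker:
        result = landmarker.detect(mp.Image.create_from_file(image_path))
    if not result.face_blendshapes:
        return None
    return blendshapes_to_features(result.face_blendshapes[0])
```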

Usage for approach 3

  1. cd into the SVM_Blendshapes folder
  2. Run Blendshapes.ipynb to generate the blendshapes features of the faces
  3. Run Model.ipynb to train the model and test it

Eye Gaze, Eye Closeness and Mouth Closeness

For an additional layer of interaction, three separate smaller models were also trained:

Gaze

  • Useful for gaze tracking and attention detection.
  • Regression model that predicts the x and y coordinates of the gaze point.
  • Trained on the entire dataset, with a 90/10 train/test split.
  • Final model is a BayesianRidge model.
  • Cannot be used for each eye separately.
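A sketch of the gaze regressor described above. Because scikit-learn's `BayesianRidge` predicts a single target, this version wraps it in `MultiOutputRegressor` to fit one model per gaze coordinate; the repository may instead train two separate models. The feature matrix `X` (for example, eye landmark coordinates) is an assumption.

```python
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor


def train_gaze_model(X, gaze_xy, random_state=0):
    """Fit one BayesianRidge per gaze coordinate on a 90/10 train/test split.
    Returns the fitted model and its R^2 score on the held-out 10%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, gaze_xy, test_size=0.1, random_state=random_state
    )
    model = MultiOutputRegressor(BayesianRidge())
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

`gaze_xy` is an (n_samples, 2) array of ground-truth gaze points, and `model.predict` returns predictions of the same shape.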

Eye Closeness

  • Useful for detecting blinks and drowsiness.
  • Classification model that predicts one of 4 labels (eye_open, eye_closed, eye_narrowed, eye_wide).
  • Trained on a custom subset of the dataset.
  • Final model is an SVC with a polynomial kernel of degree 3.
  • Can be used for each eye separately.
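A sketch of the eye closeness classifier with the kernel described above. The `StandardScaler` step is an assumed addition (polynomial kernels are sensitive to feature scale), and the input features, computed for one eye at a time, are whatever the data notebook produces.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EYE_LABELS = ["eye_open", "eye_closed", "eye_narrowed", "eye_wide"]


def build_eye_classifier():
    """Degree-3 polynomial-kernel SVC; the scaler in front is an assumption."""
    return make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))
```

Since each sample describes a single eye, the same fitted classifier can be applied to the left and right eye features independently.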

Mouth Closeness

  • Useful for detecting speech.
  • Classification model that predicts one of 2 labels (mouth_open, mouth_closed).
  • Trained on a custom subset of the dataset.
  • Final model is an SVM.
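One simple numerical feature for mouth openness is a mouth aspect ratio: the gap between the inner lips divided by the mouth width. This is an illustrative feature, not necessarily what the repository computes, and the Mediapipe landmark indices below are hypothetical choices to verify against the mesh topology.

```python
import numpy as np

# Hypothetical face mesh indices: inner upper lip, inner lower lip,
# and the two mouth corners.
UPPER_LIP, LOWER_LIP, LEFT_CORNER, RIGHT_CORNER = 13, 14, 78, 308


def mouth_aspect_ratio(points):
    """Vertical lip gap divided by mouth width; larger means more open."""
    points = np.asarray(points, dtype=np.float64)
    gap = np.linalg.norm(points[UPPER_LIP] - points[LOWER_LIP])
    width = np.linalg.norm(points[LEFT_CORNER] - points[RIGHT_CORNER])
    if width == 0:
        raise ValueError("mouth corners coincide")
    return gap / width
```

Being a ratio, the value does not depend on face size, so it (or a vector of similar distances) can be fed straight to the SVM.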

Usage

For eye gaze:

  1. Run ground_truth.ipynb to generate the ground truth of the eye gaze
  2. Run landmarks.ipynb to generate the landmarks of the faces
  3. Run model.ipynb to train the model and test it

For eye closeness:

  1. Populate the folders:
    Eye Closeness
    ├── eye_closed
    ├── eye_narrowed
    ├── eye_open
    └── eye_wide
  2. Run the data.ipynb notebook to generate a CSV file with numerical features
  3. Run the classifier.ipynb notebook to train and test the model
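The data step above turns labeled folders into a CSV. A sketch of that pattern, where the folder name becomes the label; `extract_features` is a hypothetical callable standing in for the notebook's actual feature code, and the CSV is written without a header row. The same function works for the Mouth Closeness folders.

```python
import csv
from pathlib import Path


def build_feature_csv(root, extract_features, out_csv, extensions=(".jpg", ".jpeg", ".png")):
    """Walk root/<label>/ folders and write one CSV row per image:
    the label followed by its features. extract_features(path) must return
    a sequence of numbers, or None to skip the image (e.g. no face found).
    Returns the number of rows written."""
    root = Path(root)
    rows_written = 0
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            for image_path in sorted(label_dir.iterdir()):
                if image_path.suffix.lower() not in extensions:
                    continue  # ignore non-image files
                features = extract_features(image_path)
                if features is None:
                    continue
                writer.writerow([label_dir.name, *features])
                rows_written += 1
    return rows_written
```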

For mouth closeness:

  1. Populate the folders:
    Mouth Closeness
    ├── mouth_closed
    └── mouth_open
  2. Run the data.ipynb notebook to generate a CSV file with numerical features
  3. Run the classifier.ipynb notebook to train and test the model

NOTE: The logreg.ipynb notebook is a deprecated version of the pipeline. It is kept for reference purposes only.