# Model Card for Emotion Detection
This model classifies facial expressions into one of seven categories: angry, disgust, fear, happy, neutral, sad, and surprise.
## Model Details

### Dataset

| Split | Happy  | Angry | Disgust | Sad   | Neutral | Fear  | Surprise |
|-------|--------|-------|---------|-------|---------|-------|----------|
| Train | 14,379 | 7,988 | 872     | 9,768 | 9,947   | 8,200 | 6,376    |
| Test  | 3,599  | 1,918 | 222     | 2,386 | 2,449   | 2,042 | 1,628    |
| Val   | 2,880  | 1,600 | 172     | 1,954 | 1,990   | 1,640 | 1,628    |
### Model

- Transfer learning using MobileNetV2 with two additional Dense layers and a softmax output layer.
- Class weights were applied during training to adjust for class imbalance (see the sketch after this list).
- Total params: 3,675,823
- Trainable params: 136,839
- Accuracy: 0.823 | Precision: 0.825 | Recall: 0.823 | F1: 0.821
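
A minimal Keras sketch of this setup, assuming a frozen MobileNetV2 base and hypothetical head sizes (128 and 64 units; the exact sizes are not listed here, so parameter counts will differ). The class weights are derived from the training counts in the table above:

```python
import numpy as np
import tensorflow as tf

# Training counts from the dataset table above, in alphabetical label order.
train_counts = {'angry': 7988, 'disgust': 872, 'fear': 8200, 'happy': 14379,
                'neutral': 9947, 'sad': 9768, 'surprise': 6376}
counts = np.array(list(train_counts.values()), dtype=np.float32)

# Inverse-frequency class weights: total / (n_classes * count).
class_weight = {i: counts.sum() / (len(counts) * c)
                for i, c in enumerate(counts)}

# Frozen MobileNetV2 base with a small classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling='avg')
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(128, activation='relu'),  # hypothetical size
    tf.keras.layers.Dense(64, activation='relu'),   # hypothetical size
    tf.keras.layers.Dense(7, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# model.fit(train_ds, validation_data=val_ds, class_weight=class_weight)
```

Weighting by inverse class frequency makes the rare disgust class contribute roughly as much to the loss as the common happy class during training.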
### Room for Improvement

This model was created with extremely limited hardware-acceleration (GPU) resources. It is therefore highly likely that evaluation metrics surpassing the 95% mark could be achieved in the following ways:
- MobileNetV2 was used for its fast inference and low latency, but with more resources a more suitable base model could likely be found.
- Data augmentation to further correct for class imbalance.
- Learning rate decay to train for longer (at a lower learning rate) after nearing a local minimum (approximately 60 epochs); see the sketch after this list.
- Error analysis to identify systematic misclassifications.
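
As one possible realization of the learning-rate point, a sketch using Keras's `ReduceLROnPlateau` callback; the schedule values here are illustrative, not the ones used in training:

```python
import tensorflow as tf

# Illustrative values only: halve the learning rate when validation loss
# plateaus, so training can continue productively past ~60 epochs.
lr_decay = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True)

# model.fit(train_ds, validation_data=val_ds, epochs=120,
#           callbacks=[lr_decay, early_stop])
```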
## Uses
Cannot be used for commercial purposes in the EU.
### Direct Use
Combine with the OpenCV Haar cascade classifier for face detection.
## How to Get Started with the Model
Use the script below to run the model locally against your device's camera:
```python
import cv2
import numpy as np
import tensorflow as tf

# Load the Haar cascade once rather than on every frame.
FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)

CLASS_LABELS = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']


def display_emotion(frame, model):
    font = cv2.FONT_HERSHEY_SIMPLEX
    text_color = (0, 0, 255)

    # Detect faces on the grayscale frame.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = FACE_CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

    if len(faces) == 0:
        print("Face not detected...")

    for (x, y, w, h) in faces:
        # Crop the detected face and resize it to the model's input shape.
        face_roi = frame[y:y+h, x:x+w]
        resized_image = cv2.resize(face_roi, (224, 224))
        final_image = np.expand_dims(resized_image, axis=0)

        # NOTE: replicate any pixel preprocessing used at training time here.
        predictions = model.predict(final_image)
        predicted_label = CLASS_LABELS[np.argmax(predictions)]

        # Draw the bounding box, a filled label background, and the label.
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)  # Green square
        cv2.rectangle(frame, (x, y), (x+w, y-25), (0, 0, 0), -1)  # Black background
        cv2.putText(frame, predicted_label, (x, y-10), font, 0.7, text_color, 2)

    return frame


def main():
    model = tf.keras.models.load_model('emotion_detection.keras')

    # Try an external camera first, then fall back to the default one.
    cap = cv2.VideoCapture(1)
    if not cap.isOpened():
        cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        raise IOError("Cannot open webcam")

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame = display_emotion(frame, model)
        cv2.imshow('Facial Expression Recognition', frame)
        # Press 'q' to quit.
        if cv2.waitKey(2) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
```
## Preprocessing
MobileNetV2 receives image inputs of size (224, 224).
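
A minimal sketch of preparing a single still image to match that input shape (the file name is hypothetical, and any pixel preprocessing applied at training time should be replicated):

```python
import cv2
import numpy as np

img = cv2.imread('face.jpg')        # hypothetical input image
img = cv2.resize(img, (224, 224))   # MobileNetV2 input size
batch = np.expand_dims(img, axis=0) # shape (1, 224, 224, 3)
# predictions = model.predict(batch)
```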
## Speeds, Sizes, Times
Latency (local demo, no GPU): 39 ms/step
## Model Card Authors
Ronny Nehme