# Video and Image Emotion Annotation

This script detects faces in videos and images and annotates each face with a recognized emotion. It uses two pretrained deep learning models: RetinaFace for face detection and HSEmotionRecognizer for emotion recognition. The goal is to label facial expressions with emotional states automatically.

Components:

## Face Detection using RetinaFace:
The detect_faces function uses the RetinaFace model to find faces in a video frame or image. It returns one bounding box per face as (x, y, width, height), converted from the facial_area corners (x1, y1, x2, y2) that RetinaFace reports.
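
The conversion from corner coordinates to a width/height box can be sketched as follows. This is a minimal illustration rather than the script's exact code: the helper name to_xywh and the clamping to the frame bounds are assumptions added here, while the script itself subtracts the corners without clamping.

```python
def to_xywh(facial_area, frame_width, frame_height):
    """Convert [x1, y1, x2, y2] corners to an (x, y, w, h) box.

    Hypothetical helper: clamping to the frame bounds is an added
    safeguard, not something the original detect_faces does.
    """
    x1, y1, x2, y2 = (int(v) for v in facial_area)
    # Clamp corners so later slicing like frame[y:y+h, x:x+w] stays in range
    x1 = min(max(x1, 0), frame_width)
    y1 = min(max(y1, 0), frame_height)
    x2 = min(max(x2, 0), frame_width)
    y2 = min(max(y2, 0), frame_height)
    return (x1, y1, max(x2 - x1, 0), max(y2 - y1, 0))


# Example: a detection partly outside a 640x480 frame
box = to_xywh([-10, 20, 110, 140], frame_width=640, frame_height=480)
print(box)  # (0, 20, 110, 120)
```

A box with zero width or height signals a detection that lies entirely outside the frame and can be skipped before cropping.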

## Emotion Recognition with HSEmotionRecognizer:
The HSEmotionRecognizer model, initialized as recognizer with the enet_b0_8_best_vgaf weights on the CPU, predicts an emotion label for each cropped face region.
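
How a label is chosen from per-class scores can be illustrated with a small sketch. It assumes the recognizer produces one score per class and that the eight class names below match the 8-class model. The label list, the pick_emotion helper, and its min_confidence threshold are assumptions for illustration and are not part of the hsemotion API.

```python
import math

# Assumed label order for an 8-class emotion model (illustrative only)
EMOTIONS = ['Anger', 'Contempt', 'Disgust', 'Fear',
            'Happiness', 'Neutral', 'Sadness', 'Surprise']


def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_emotion(scores, labels=EMOTIONS, min_confidence=0.0):
    """Return (label, probability) for the top class, or 'Unknown' if weak."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return 'Unknown', probs[best]
    return labels[best], probs[best]


label, prob = pick_emotion([0.1, -1.2, 0.0, 0.3, 4.0, 1.5, 0.2, -0.5])
print(label)  # Happiness
```

A threshold such as min_confidence=0.5 would make the 'Unknown' fallback meaningful for ambiguous faces. Whether that suits a given video is a judgment call.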

## Annotation and Visualization:
The annotate_frame function draws a bounding box around each detected face and writes the predicted emotion above it. It modifies the frame in place.
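
One detail worth noting is label placement: text drawn 10 pixels above the box falls outside the frame when a face touches the top edge. The sketch below shows one way to pick a text origin that stays visible. label_origin and its margin and text_height parameters are hypothetical names, not functions from the script or from OpenCV.

```python
def label_origin(x, y, h, frame_height, margin=10, text_height=20):
    """Pick the bottom-left corner for a text label near a face box.

    Prefers the space above the box; falls back to just below the box,
    then to inside the box's top edge. All parameters are in pixels.
    """
    if y - margin >= text_height:
        # Enough room above the box
        return (x, y - margin)
    if y + h + margin + text_height <= frame_height:
        # No room above: place the text just below the box
        return (x, y + h + margin + text_height)
    # Box spans the whole height: draw inside its top edge
    return (x, y + text_height)


print(label_origin(50, 100, 80, frame_height=480))  # (50, 90)
print(label_origin(50, 5, 80, frame_height=480))    # (50, 115)
```

The returned point could be passed as the org argument of cv2.putText, which expects the bottom-left corner of the text.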

## Processing Pipeline:
Video Processing:
- process_video_frames: Iterates through the frames of a video, applies face detection and emotion annotation to every n-th frame (frame_skip, default 5), and saves all frames to a temporary video file.
- add_audio_to_video: Copies the audio track from the original video onto the processed video, producing the final annotated output.
- process_video: Runs frame processing and then audio addition.

Image Processing:
- process_image: Reads a single image, detects faces, annotates emotions, and saves and displays the original and annotated images side by side.
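
With frame skipping, only every n-th frame is analyzed. One possible refinement, sketched below, is to reuse the most recent detections on skipped frames so that boxes do not flicker. This illustrates the idea and is not the script's behavior: annotate_with_cache is a hypothetical name, and detect and draw are stand-ins for detect_faces and the drawing step.

```python
def annotate_with_cache(frames, detect, draw, frame_skip=5):
    """Yield annotated frames, running `detect` only every `frame_skip` frames.

    `detect(frame)` returns a list of detections; `draw(frame, detections)`
    returns the annotated frame. Both are caller-supplied stand-ins.
    """
    if frame_skip < 1:
        raise ValueError("frame_skip must be at least 1")
    cached = []  # detections from the most recently analyzed frame
    for index, frame in enumerate(frames):
        if index % frame_skip == 0:
            cached = detect(frame)  # refresh on every frame_skip-th frame
        yield draw(frame, cached)   # skipped frames reuse the cache


# Toy run: "frames" are integers and a detection is just the frame number
calls = []

def fake_detect(frame):
    calls.append(frame)
    return [frame]

result = list(annotate_with_cache(range(7), fake_detect,
                                  lambda f, d: (f, d), frame_skip=3))
print(calls)       # [0, 3, 6]
print(result[4])   # (4, [3])
```

The trade-off is that cached boxes lag behind fast-moving faces, so a smaller frame_skip may look better on lively footage.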

# Usage:

- Video Processing: Provide the path to a video file (.mp4, .avi, .mov, or .mkv) to annotate facial expressions throughout the video.
- Image Processing: For a static image (.jpg, .jpeg, or .png), the script detects faces, predicts emotions, and saves and displays the original and annotated images side by side.
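
The choice between the video and image paths depends only on the file extension. A minimal sketch of that dispatch is shown below; media_kind and default_output_path are hypothetical helpers, and the example paths are made up.

```python
import os

VIDEO_EXTENSIONS = {'.mp4', '.avi', '.mov', '.mkv'}
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}


def media_kind(path):
    """Return 'video', 'image', or None based on the file extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension in VIDEO_EXTENSIONS:
        return 'video'
    if extension in IMAGE_EXTENSIONS:
        return 'image'
    return None


def default_output_path(path, suffix='-out'):
    """Build an output path next to the input, e.g. a.mp4 -> a-out.mp4."""
    root, extension = os.path.splitext(path)
    return f"{root}{suffix}{extension}"


print(media_kind('/content/clip.MP4'))             # video
print(media_kind('/content/photo.jpeg'))           # image
print(media_kind('/content/notes.txt'))            # None
print(default_output_path('/content/photo.jpeg'))  # /content/photo-out.jpeg
```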


## Setup
Install the required libraries:

In [1]:
! pip install retina-face hsemotion moviepy


In [2]:
from moviepy.editor import VideoFileClip  # moviepy.editor is the MoviePy 1.x import path; MoviePy 2.x removed it
from retinaface import RetinaFace
from hsemotion.facial_emotions import HSEmotionRecognizer
import cv2
import numpy as np
import os
from google.colab.patches import cv2_imshow  # Import cv2_imshow for Colab

In [15]:
## Initialize recognizer

recognizer = HSEmotionRecognizer(model_name='enet_b0_8_best_vgaf', device='cpu')

## Face Detection Function

def detect_faces(frame):
    """ Detect faces in the frame using RetinaFace """
    faces = RetinaFace.detect_faces(frame)
    if isinstance(faces, dict):
        face_list = []
        for key in faces.keys():
            face = faces[key]
            facial_area = face['facial_area']
            face_dict = {
                'box': (facial_area[0], facial_area[1], facial_area[2] - facial_area[0], facial_area[3] - facial_area[1])
            }
            face_list.append(face_dict)
        return face_list
    return []

## Annotation Function

def annotate_frame(frame, faces):
    """ Annotate the frame with recognized emotions using global recognizer """
    for face in faces:
        x, y, w, h = face['box']
        face_image = frame[y:y+h, x:x+w]  # Extract face region from frame
        if face_image.size == 0:  # Skip empty crops (e.g. a box outside the frame)
            continue
        emotion = classify_emotions(face_image)
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
        cv2.putText(frame, emotion, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)

## Emotion Classification Function

def classify_emotions(face_image):
    """ Classify emotions for the given face image using global recognizer """
    results = recognizer.predict_emotions(face_image)
    if results:
        emotion = results[0]  # predict_emotions returns (label, scores); keep the label
    else:
        emotion = 'Unknown'
    return emotion

## Process Video Frames

def process_video_frames(video_path, temp_output_path, frame_skip=5):
    # Load the video
    video_clip = VideoFileClip(video_path)
    fps = video_clip.fps

    # Initialize output video writer
    out = cv2.VideoWriter(temp_output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (int(video_clip.size[0]), int(video_clip.size[1])))

    # Iterate through frames, detect faces, and annotate emotions
    frame_count = 0
    for frame in video_clip.iter_frames():
        if frame_count % frame_skip == 0:  # Annotate only every frame_skip-th frame; the rest are written unannotated
            faces = detect_faces(frame)
            annotate_frame(frame, faces)
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)  # Convert RGB to BGR for OpenCV
        out.write(frame)
        frame_count += 1

    # Release resources and cleanup
    out.release()
    cv2.destroyAllWindows()
    video_clip.close()

## Add Audio to Processed Video

def add_audio_to_video(original_video_path, processed_video_path, output_path):
    # Start as None so the finally block is safe if opening a clip fails
    original_clip = None
    processed_clip = None
    try:
        original_clip = VideoFileClip(original_video_path)
        processed_clip = VideoFileClip(processed_video_path)
        final_clip = processed_clip.set_audio(original_clip.audio)
        final_clip.write_videofile(output_path, codec='libx264', audio_codec='aac')
    except Exception as e:
        print(f"Error while combining with audio: {e}")
    finally:
        # Close only the clips that were actually opened
        if original_clip is not None:
            original_clip.close()
        if processed_clip is not None:
            processed_clip.close()

## Process Video

def process_video(video_path, output_path):
    temp_output_path = 'temp_output_video.mp4'

    # Process video frames and save to a temporary file
    process_video_frames(video_path, temp_output_path, frame_skip=5)  # Adjust frame_skip as needed

    # Add audio to the processed video
    add_audio_to_video(video_path, temp_output_path, output_path)

## Process Image

def process_image(input_path, output_path):
    # Step 1: Read input image
    image = cv2.imread(input_path)
    if image is None:
        print(f"Error: Unable to read image at '{input_path}'")
        return

    # Step 2: Keep an unannotated copy, then detect faces and annotate emotions in place
    input_image = image.copy()
    faces = detect_faces(image)
    annotate_frame(image, faces)

    # Step 3: Combine input and annotated images horizontally
    combined_image = cv2.hconcat([input_image, image])

    # Step 4: Save and display the combined image
    cv2.imwrite(output_path, combined_image)
    cv2_imshow(combined_image)  # Display combined image in Colab



/root/.hsemotion/enet_b0_8_best_vgaf.pt Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=True)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)


# Time to process the video or image
**NOTE: You can use your own data by changing the paths below.**

In [16]:
if __name__ == "__main__":
    input_path = '/content/رياكشن عبلة كامل تبكي.mp4'  # Update with your video or image path
    output_path = '/content/رياكشن عبلة كامل تبكي out.mp4'  # Update with the desired output path

    if input_path.lower().endswith(('.mp4', '.avi', '.mov', '.mkv')):
        process_video(input_path, output_path)
    elif input_path.lower().endswith(('.jpg', '.jpeg', '.png')):
        process_image(input_path, output_path)
    else:
        print("Unsupported file format. Please provide a video or image file.")

Moviepy - Building video /content/رياكشن عبلة كامل تبكي out.mp4.
MoviePy - Writing audio in رياكشن عبلة كامل تبكي outTEMP_MPY_wvf_snd.mp4
MoviePy - Done.
Moviepy - Writing video /content/رياكشن عبلة كامل تبكي out.mp4
Moviepy - Done !
Moviepy - video ready /content/رياكشن عبلة كامل تبكي out.mp4


In [None]:
if __name__ == "__main__":
    input_path = '/content/mn (2).jpeg'  # Update with your video or image path
    output_path = '/content/mn (2)-out.jpeg'  # Update with the desired output path

    if input_path.lower().endswith(('.mp4', '.avi', '.mov', '.mkv')):
        process_video(input_path, output_path)
    elif input_path.lower().endswith(('.jpg', '.jpeg', '.png')):
        process_image(input_path, output_path)
    else:
        print("Unsupported file format. Please provide a video or image file.")