Pradheep1647 committed on Sep 23, 2024

Commit cb017a7 (1 parent: 3e6c751)

updated readme file

Files changed (1)

README.md +37 -13

@@ -1,13 +1,37 @@
----
-title: Multi Modal Emotion Recognition
-emoji: 📈
-colorFrom: gray
-colorTo: blue
-sdk: gradio
-sdk_version: 4.44.0
-app_file: app.py
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Multi Modal Emotion Recognition 📈
+This application analyzes emotions in videos using pretrained models for both the audio and the visual content. Upload a video (maximum length: 2 minutes) to extract emotions from both speech and facial expressions in real time.
+## Features:
+- **Audio Emotion Detection:** Uses OpenAI's Whisper model for transcription and Cardiff NLP's RoBERTa model for emotion recognition in text.
+- **Visual Emotion Analysis:** Uses Salesforce's BLIP model for image captioning and J-Hartmann's DistilRoBERTa model for emotion recognition in the generated captions.
+## Instructions:
+1. Upload a video file (maximum length: **2 minutes**).
+2. The app analyzes both the audio and the visual components of the video, then extracts and displays the emotions in real time.
+## Models Used:
+The models were handpicked after numerous trials. Each entry below links to the model and, where one is listed, its research paper:
+1. **Cardiff NLP RoBERTa for Emotion Recognition from Text:**
+   - [Model: cardiffnlp/twitter-roberta-base-emotion](https://huggingface.co/cardiffnlp/twitter-roberta-base-emotion)
+   - [Paper: TweetEval - Unified Benchmark and Comparative Evaluation for Tweet Classification](https://arxiv.org/pdf/2010.12421)
+2. **Salesforce BLIP for Image Captioning and Visual Emotion Analysis:**
+   - [Model: Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
+   - [Paper: BLIP - Bootstrapping Language-Image Pre-training](https://arxiv.org/abs/2201.12086)
+3. **J-Hartmann DistilRoBERTa for Emotion Recognition from Image Captions:**
+   - [Model: j-hartmann/emotion-english-distilroberta-base](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base)
+4. **OpenAI Whisper for Speech-to-Text Transcription:**
+   - [Model: openai/whisper-base](https://huggingface.co/openai/whisper-base)
+   - [Paper: Whisper - Speech Recognition](https://arxiv.org/abs/2212.04356)
+These models were selected based on extensive trials to ensure the best performance for this multimodal emotion recognition task.
+## Access the App:
+You can try the app [here](https://huggingface.co/spaces/Pradheep1647/multi-modal-emotion-recognition).
+## License:
+This project is licensed under the MIT License.
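
The two paths described under Features (speech to transcript to emotion, and image to caption to emotion) can be sketched roughly as follows. This is not the Space's `app.py`, which is not part of this commit; it is a minimal illustration under stated assumptions. The names `analyze`, `EmotionReport`, and `load_hf_models` are hypothetical, the majority vote over captions is an assumed way of combining per-frame labels, and extracting the audio track and frames from the video is left out. The model wiring in `load_hf_models` assumes the `transformers` `pipeline` API and that the `image-to-text` task is available in the installed version.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

MAX_DURATION_S = 120  # the README's 2-minute upload limit


@dataclass
class EmotionReport:
    transcript: str
    audio_emotion: str
    captions: List[str]
    visual_emotion: str


def analyze(
    audio: Any,
    frames: Sequence[Any],
    duration_s: float,
    transcribe: Callable[[Any], str],
    classify_speech: Callable[[str], str],
    caption: Callable[[Any], str],
    classify_caption: Callable[[str], str],
) -> EmotionReport:
    """Run both paths; the four callables stand in for the four models."""
    if duration_s > MAX_DURATION_S:
        raise ValueError(f"video is {duration_s:.0f}s; limit is {MAX_DURATION_S}s")

    # Audio path: speech -> transcript -> emotion label.
    transcript = transcribe(audio)
    audio_emotion = classify_speech(transcript)

    # Visual path: frame -> caption -> emotion label, then a majority vote
    # (the vote is an assumption of this sketch, not taken from the README).
    captions = [caption(frame) for frame in frames]
    votes = Counter(classify_caption(text) for text in captions)
    visual_emotion = votes.most_common(1)[0][0] if votes else "unknown"

    return EmotionReport(transcript, audio_emotion, captions, visual_emotion)


def load_hf_models() -> dict:
    """Hypothetical wiring of the four models listed in the README."""
    from transformers import pipeline  # imported lazily; downloads models on first use

    # chunk_length_s lets Whisper handle clips longer than 30 seconds.
    asr = pipeline("automatic-speech-recognition",
                   model="openai/whisper-base", chunk_length_s=30)
    speech_clf = pipeline("text-classification",
                          model="cardiffnlp/twitter-roberta-base-emotion")
    captioner = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-base")
    caption_clf = pipeline("text-classification",
                           model="j-hartmann/emotion-english-distilroberta-base")
    return {
        "transcribe": lambda audio: asr(audio)["text"],
        "classify_speech": lambda text: speech_clf(text)[0]["label"],
        "caption": lambda image: captioner(image)[0]["generated_text"],
        "classify_caption": lambda text: caption_clf(text)[0]["label"],
    }
```

With the models downloaded, a call would look like `analyze(audio_path, frames, duration_s, **load_hf_models())`. Passing the models in as callables keeps the control flow testable without any downloads.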