Pradheep1647 committed on
Commit cb017a7 · 1 Parent(s): 3e6c751

updated readme file

Files changed (1)
  1. README.md +37 -13
README.md CHANGED
@@ -1,13 +1,37 @@
- ---
- title: Multi Modal Emotion Recognition
- emoji: 📈
- colorFrom: gray
- colorTo: blue
- sdk: gradio
- sdk_version: 4.44.0
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Multi Modal Emotion Recognition 📈
+
+ This application analyzes emotions in videos using pretrained models for both the audio and the visual content. You can upload a video (up to 2 minutes long) to extract emotions from both its speech and its visual content, such as facial expressions, in real time.
+
+ ## Features:
+ - **Audio Emotion Detection:** Uses OpenAI's Whisper model to transcribe the speech and Cardiff NLP's RoBERTa model to recognize emotion in the transcribed text.
+ - **Visual Emotion Analysis:** Uses Salesforce's BLIP model to caption video frames and J-Hartmann's DistilRoBERTa model to recognize emotion in the generated captions.
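The two branches in the feature list can be sketched in plain Python with each model call passed in as a function. This is a minimal sketch under stated assumptions, not the app's `app.py`: `analyze_video`, `VideoEmotions` and the parameter names are hypothetical, the four callables stand in for Whisper, BLIP and the two RoBERTa-family classifiers, and averaging the caption scores over frames is an assumed aggregation strategy. Passing the models in as functions keeps the control flow testable without downloading any weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]  # emotion label -> probability


@dataclass
class VideoEmotions:
    transcript: str
    audio_emotion: str
    captions: List[str]
    visual_emotion: str


def top_label(scores: Scores) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


def analyze_video(
    audio: object,
    frames: Sequence[object],
    transcribe: Callable[[object], str],           # hypothetical Whisper wrapper
    caption: Callable[[object], str],              # hypothetical BLIP wrapper
    classify_transcript: Callable[[str], Scores],  # hypothetical Cardiff RoBERTa wrapper
    classify_caption: Callable[[str], Scores],     # hypothetical J-Hartmann wrapper
) -> VideoEmotions:
    if not frames:
        raise ValueError("at least one frame is required")

    # Audio branch: speech -> transcript -> emotion of the transcript text.
    transcript = transcribe(audio)
    audio_emotion = top_label(classify_transcript(transcript))

    # Visual branch: frame -> caption -> emotion scores, averaged over frames
    # (the averaging step is an assumption, not taken from the app).
    captions = [caption(frame) for frame in frames]
    totals: Scores = {}
    for text in captions:
        for label, score in classify_caption(text).items():
            totals[label] = totals.get(label, 0.0) + score
    averaged = {label: total / len(captions) for label, total in totals.items()}

    return VideoEmotions(transcript, audio_emotion, captions, top_label(averaged))


def fake_caption_scores(text: str) -> Scores:
    """Stub classifier used only to exercise the control flow."""
    if "smiling" in text:
        return {"joy": 0.8, "sadness": 0.2}
    return {"joy": 0.3, "sadness": 0.7}


if __name__ == "__main__":
    result = analyze_video(
        audio="<audio>",
        frames=["frame0", "frame1"],
        transcribe=lambda a: "I am so happy today",
        caption=lambda f: "a man smiling" if f == "frame0" else "a man frowning",
        classify_transcript=lambda t: {"joy": 0.9, "anger": 0.1},
        classify_caption=fake_caption_scores,
    )
    print(result.audio_emotion, result.visual_emotion)  # joy joy
```

In a real deployment the stubs would be replaced by wrappers around the actual models; the orchestration itself does not change.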
+
+ ## Instructions:
+ 1. Upload a video file (maximum length: **2 minutes**).
+ 2. The app analyzes both the audio and visual components of the video and displays the detected emotions in real time.
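The 2-minute limit can be checked from a video's frame count and frame rate before any model runs. The sketch below is an assumption about how such a check might look, not the app's code: `video_duration`, `sample_frame_indices` and the one-frame-per-second sampling interval are hypothetical. With OpenCV, for example, both inputs can be read from a `cv2.VideoCapture` via `cv2.CAP_PROP_FRAME_COUNT` and `cv2.CAP_PROP_FPS`.

```python
from typing import List

MAX_SECONDS = 120.0  # the README's 2-minute upload limit


def video_duration(frame_count: int, fps: float) -> float:
    """Length of a video in seconds, from its frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps


def sample_frame_indices(
    frame_count: int,
    fps: float,
    every_seconds: float = 1.0,  # hypothetical sampling interval
    max_seconds: float = MAX_SECONDS,
) -> List[int]:
    """Reject over-long videos, then pick one frame index per interval."""
    duration = video_duration(frame_count, fps)
    if duration > max_seconds:
        raise ValueError(
            f"video is {duration:.1f} s long; the limit is {max_seconds:.0f} s"
        )
    step = max(1, round(fps * every_seconds))
    return list(range(0, frame_count, step))


if __name__ == "__main__":
    # A 10-second clip at 30 fps yields one frame index per second.
    print(sample_frame_indices(300, 30.0))
    # [0, 30, 60, 90, 120, 150, 180, 210, 240, 270]
```

Sampling a fixed number of frames per second keeps the number of captioning calls bounded (at most 120 for a maximum-length video at the assumed interval).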
+
+ ## Models Used:
+ The models were handpicked for this task after numerous trials. Each is listed below with its model card and, where available, the corresponding research paper:
+
+ 1. **Cardiff NLP RoBERTa for Emotion Recognition from Text:**
+    - [Model: cardiffnlp/twitter-roberta-base-emotion](https://huggingface.co/cardiffnlp/twitter-roberta-base-emotion)
+    - [Paper: TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification](https://arxiv.org/pdf/2010.12421)
+
+ 2. **Salesforce BLIP for Image Captioning:**
+    - [Model: Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
+    - [Paper: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086)
+
+ 3. **J-Hartmann DistilRoBERTa for Emotion Recognition from Image Captions:**
+    - [Model: j-hartmann/emotion-english-distilroberta-base](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base)
+
+ 4. **OpenAI Whisper for Speech-to-Text Transcription:**
+    - [Model: openai/whisper-base](https://huggingface.co/openai/whisper-base)
+    - [Paper: Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)
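The two text classifiers in this list do not share a label set, which matters when the audio and visual results are shown side by side. The sketch below turns raw logits into labelled probabilities and lists the labels both models can output. It is illustrative only: the label lists follow the model cards (TweetEval's four emotion classes for the Cardiff model; Ekman's six basic emotions plus neutral for the J-Hartmann model), but their index order is an assumption to check against each model's `config.id2label`, and `label_scores` and `shared_labels` are hypothetical helpers, not part of the app or of `transformers`.

```python
import math
from typing import Dict, List, Sequence

# Label sets as described on the two model cards; the index order is an
# assumption and should be checked against each model's config.id2label.
CARDIFF_LABELS = ["anger", "joy", "optimism", "sadness"]
HARTMANN_LABELS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(logits: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Pair each class probability with its label."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return dict(zip(labels, softmax(logits)))


def shared_labels() -> List[str]:
    """Labels that both classifiers can output, in Cardiff order."""
    return [label for label in CARDIFF_LABELS if label in HARTMANN_LABELS]


if __name__ == "__main__":
    print(shared_labels())  # ['anger', 'joy', 'sadness']
    print(label_scores([0.1, 2.0, 0.3, -1.0], CARDIFF_LABELS))
```

Only anger, joy and sadness can be compared directly across the two branches; "optimism" exists only on the audio side, and disgust, fear, neutral and surprise only on the visual side.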
+
+ ## Access the App:
+ You can try the app on [Hugging Face Spaces](https://huggingface.co/spaces/Pradheep1647/multi-modal-emotion-recognition).
+
+ ## License:
+ This project is licensed under the MIT License.