MusaedSadeqMusaedAl-Fareh225739 committed
Commit b525620 · Parent: f995563

revert to the previous readme file

Files changed (1): README.md (+11, −391)
README.md CHANGED
@@ -1,395 +1,15 @@
- # MrrrMe - Privacy-First Smart Mirror for Multi-Modal Emotion Detection
-
- **18-Week Specialization Project | Breda University of Applied Sciences**
-
- A privacy-first smart mirror system that performs real-time multi-modal emotion recognition, combining facial expressions, voice tonality, and text sentiment analysis with conversational AI capabilities.
-
- ---
-
- ## Project Overview
-
- **Program**: AI & Data Science - Applied Data Science
- **Institution**: Breda University of Applied Sciences, Netherlands
- **Duration**: 18 weeks (February - June 2026)
- **Current Status**: Week 7 of 18 (11 weeks remaining)
-
- ### Problem Statement
-
- Traditional emotion recognition systems suffer from single-modality limitations, high latency, privacy concerns, and an inability to detect masked emotions. MrrrMe addresses these challenges with a comprehensive multi-modal approach.
-
- ### Solution
-
- A privacy-first, multi-modal emotion detection system that:
- - Fuses facial expressions (40%), voice tonality (30%), and linguistic content (30%)
- - Runs all emotion detection locally; only response generation calls the Groq Cloud API
- - Achieves sub-2-second response times
- - Generates empathetic, context-aware conversational responses
- - Integrates with customizable 3D avatars for natural interaction
-
- ---
-
- ## Key Features
-
- ### Multi-Modal Emotion Fusion
- - Weighted fusion algorithm combining three modalities
- - 4-class emotion model: Neutral, Happy, Sad, Angry
- - Confidence-based conflict resolution
- - Event-driven processing for a 600x efficiency improvement
- - Quality-aware dynamic weight adjustment
-
- ### Facial Expression Analysis
- - Face detection using OpenCV Haar Cascade
- - Emotion recognition using ViT-Face-Expression (trained on FER2013)
- - 70-75% baseline accuracy on facial expressions alone
- - Real-time processing with quality scoring
- - Efficient frame sampling (5% of frames processed)
-
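The 5% frame sampling mentioned above can be sketched as a simple stride counter; the class name and rate handling here are illustrative assumptions, not code from the repository:

```python
class FrameSampler:
    """Select roughly `sample_rate` of incoming frames for emotion inference."""

    def __init__(self, sample_rate: float = 0.05):
        # A 5% rate means processing every 20th frame.
        self.stride = max(1, round(1 / sample_rate))
        self.count = 0

    def should_process(self) -> bool:
        process = self.count % self.stride == 0
        self.count += 1
        return process


sampler = FrameSampler(sample_rate=0.05)
selected = sum(sampler.should_process() for _ in range(1000))
print(selected)  # 50 of 1000 frames selected
```

Skipping 19 of every 20 frames is what makes sub-real-time models like ViT-Face-Expression viable in a live video loop.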
- ### Voice Emotion Recognition
- - HuBERT-Large model for emotional prosody detection
- - Voice Activity Detection with 72.4% processing efficiency
- - Sub-50ms inference per audio chunk
- - 76.8% accuracy on voice-only emotion detection
- - Smart silence detection to reduce unnecessary processing
-
- ### Natural Language Understanding
- - Whisper (distil-large-v3) for accurate speech-to-text transcription
- - DistilRoBERTa for contextual sentiment analysis
- - Rule-based overrides for common phrases
- - Conversation memory across sessions
- - Multi-turn dialogue support
-
- ### Conversational AI Integration
- - Groq Cloud API with the Llama 3.1 8B Instant model
- - Dual personality modes: Empathetic Therapist and Action-Focused Coach
- - Emotion-aware response generation
- - 1-2 second LLM response times
- - Configurable response styles: brief, balanced, detailed
-
- ### Avatar System
- - Customizable 3D avatars using the Avaturn SDK
- - Realistic lip-sync with the Coqui XTTS v2 TTS engine
- - 16 supported languages, including English and Dutch
- - Emotion-driven facial expressions
- - Male and female voice options (Damien Black, Ana Florence)
-
- ### Web-Based Interface
- - Modern React/Next.js 16 frontend with TypeScript
- - Real-time WebSocket communication
- - Apple-inspired design system with light/dark mode
- - Responsive layout for desktop and mobile
- - Session-based authentication with SQLite backend
-
- ---
-
- ## Technology Stack
-
- ### Computer Vision & Face Analysis
-
- | Component | Technology | Size | Inference Time | Purpose |
- |-----------|-----------|------|----------------|---------|
- | Face Detection | OpenCV Haar Cascade | <1 MB | <10ms | Detect and localize faces |
- | Emotion Recognition | ViT-Face-Expression | ~90 MB | ~100ms | 7-class emotion classification |
- | Emotion Mapping | FER2013 to 4-class | N/A | <1ms | Simplify to actionable emotions |
-
- **Facial Emotion Classes**: Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral
- **Mapped to**: Neutral, Happy, Sad, Angry
-
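The 7-to-4-class reduction could look roughly like the sketch below. How Disgust, Fear, and Surprise are folded into the four target classes is an assumption here; the repository's actual mapping may differ:

```python
# Illustrative mapping from FER2013's 7 classes to the 4-class model.
# The targets for "disgust", "fear", and "surprise" are guesses.
FER_TO_4CLASS = {
    "angry": "Angry",
    "disgust": "Angry",
    "fear": "Sad",
    "sad": "Sad",
    "happy": "Happy",
    "surprise": "Happy",
    "neutral": "Neutral",
}


def map_emotion(probs: dict[str, float]) -> str:
    """Collapse 7-class probabilities into the dominant 4-class label."""
    totals: dict[str, float] = {}
    for label, p in probs.items():
        target = FER_TO_4CLASS[label]
        totals[target] = totals.get(target, 0.0) + p
    return max(totals, key=totals.get)


print(map_emotion({"happy": 0.5, "surprise": 0.2, "sad": 0.3}))  # Happy
```

Summing probabilities before taking the argmax (rather than remapping only the top class) is what keeps the mapping step under 1 ms while still using the full distribution.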
- ### Audio Processing & Voice Analysis
-
- | Component | Technology | Size | Inference Time | Purpose |
- |-----------|-----------|------|----------------|---------|
- | Speech Transcription | Whisper (distil-large-v3) | ~140 MB | 0.37-1.04s | Audio to text conversion |
- | Voice Emotion | HuBERT-Large | ~300 MB | ~50ms | Emotional prosody detection |
- | Voice Activity Detection | Silero VAD | ~1 MB | <5ms | Speech segmentation |
- | Audio I/O | SoundDevice | N/A | N/A | Real-time audio capture |
-
- ### Natural Language Processing
-
- | Component | Technology | Size | Inference Time | Purpose |
- |-----------|-----------|------|----------------|---------|
- | Sentiment Analysis | DistilRoBERTa | ~260 MB | ~100ms | Text emotion extraction |
- | Conversational AI | Groq Cloud API (Llama 3.1 8B) | Cloud | 1-2s | Response generation |
- | Text-to-Speech | Coqui XTTS v2 | ~2 GB | 2-4s | Avatar voice synthesis |
-
- ### Frontend & Infrastructure
-
- | Component | Technology | Purpose |
- |-----------|-----------|---------|
- | Frontend Framework | Next.js 16 (React 19) | Modern web interface |
- | 3D Rendering | React Three Fiber + Three.js | Avatar visualization |
- | Avatar SDK | Avaturn SDK | Custom avatar creation |
- | Styling | Tailwind CSS v4 | Apple-inspired design system |
- | API Framework | FastAPI | WebSocket + REST endpoints |
- | Database | SQLite | User auth and session management |
- | Deployment | Docker + Nginx | Production containerization |
-
- ---
-
- ## System Architecture
-
- ```
- ┌─────────────────────────────────────────────────────────────────┐
- │ CLIENT (Web Browser) │
- │ ┌────────────────────────────────────────────────────────────┐ │
- │ │ Next.js 16 Frontend (React 19 + TypeScript) │ │
- │ │ - Avatar visualization (Three.js) │ │
- │ │ - Real-time emotion display │ │
- │ │ - Conversation history UI │ │
- │ └─────────────────┬──────────────────────────────────────────┘ │
- │ │ WebSocket │
- └────────────────────┼─────────────────────────────────────────────┘
-
- ┌────────────────────┼─────────────────────────────────────────────┐
- │ ┌─────▼──────┐ │
- │ │Nginx Proxy │ (Port 7860) │
- │ └──────┬─────┘ │
- │ │ │
- │ ┌───────────┼───────────────┐ │
- │ ▼ ▼ ▼ │
- │ ┌───────────┐ ┌─────────┐ ┌──────────────┐ │
- │ │ Next.js │ │FastAPI │ │ Avatar TTS │ │
- │ │ :3001 │ │ :8000 │ │ :8765 │ │
- │ └───────────┘ └────┬────┘ └──────────────┘ │
- │ │ │
- │ ┌──────┴──────┐ │
- │ ┌──────▼─────┐ ┌────▼──────┐ │
- │ │ Emotion │ │ Session │ │
- │ │ Pipeline │ │ Manager │ │
- │ └──────┬─────┘ └───────────┘ │
- │ │ │
- │ ┌─────────┼─────────┐ │
- │ ▼ ▼ ▼ │
- │ ┌──────┐ ┌──────┐ ┌───────┐ │
- │ │ Face │ │Voice │ │ Text │ │
- │ │ ViT │ │HuBERT│ │RoBERTa│ │
- │ └──────┘ └──────┘ └───────┘ │
- │ │ │
- │ ▼ │
- │ ┌────────────────┐ │
- │ │ Fusion Engine │ │
- │ └────────┬───────┘ │
- │ ▼ │
- │ ┌────────────────┐ │
- │ │ Groq Cloud │ │
- │ │ (Llama 3.1 8B) │ │
- │ └────────────────┘ │
- └─────────────────────────────────────────────────────────────────┘
- ```
-
- ---
-
- ## Performance Metrics
-
- ### Processing Latency
-
- | Component | Latency | Notes |
- |-----------|---------|-------|
- | Face Detection | 8-15ms | OpenCV Haar Cascade |
- | Facial Emotion | 80-120ms | ViT-Face-Expression |
- | Voice Emotion | 40-60ms | HuBERT per 3s chunk |
- | Whisper Transcription | 370ms - 1.04s | Length-dependent |
- | Text Sentiment | 90-110ms | DistilRoBERTa |
- | Fusion Calculation | <5ms | Weighted average |
- | LLM Generation | 1-2s | Groq Cloud API |
- | XTTS Synthesis | 2-4s | Coqui XTTS v2 |
- | **Total Response Time** | **1.5-2.5s** | Meets the ~2s target in typical runs |
-
- ### Accuracy Metrics
-
- | Modality | Accuracy | Dataset/Notes |
- |----------|----------|---------------|
- | Face Only | 70-75% | ViT on FER2013 |
- | Voice Only | 76.8% | HuBERT on IEMOCAP |
- | Text Only | 81.2% | DistilRoBERTa + rules |
- | **Multi-Modal Fusion** | **85-88%** | Estimated combined accuracy |
-
- ---
-
- ## Installation
-
- ### Prerequisites
-
- - Python 3.11+
- - Node.js 20+
- - NVIDIA GPU with 4GB+ VRAM (recommended)
- - CUDA 11.8+ (for GPU acceleration)
- - Git LFS
-
- ### Local Development
-
- ```bash
- # Clone repository
- git clone https://github.com/YourUsername/MrrrMe.git
- cd MrrrMe
- git lfs install
- git lfs pull
-
- # Backend setup
- python -m venv venv
- source venv/bin/activate  # or venv\Scripts\activate on Windows
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- pip install -r requirements_docker.txt
-
- # Create .env file
- echo "GROQ_API_KEY=your_api_key_here" > .env
-
- # Frontend setup
- cd avatar-frontend
- npm install
- npm run build
- cd ..
-
- # Start services (3 terminals needed)
- # Terminal 1:
- cd avatar && python speak_server.py
-
- # Terminal 2:
- python mrrrme/backend_new.py
-
- # Terminal 3:
- cd avatar-frontend && npm run dev
- ```
-
- Access at `http://localhost:3000`
-
- ### Docker Deployment
-
- ```bash
- # Build image
- docker build -t mrrrme:latest .
-
- # Run with GPU
- docker run --gpus all -p 7860:7860 mrrrme:latest
-
- # Run CPU only
- docker run -p 7860:7860 mrrrme:latest
- ```
-
- ---
-
- ## Project Structure
-
- ```
- MrrrMe/
- ├── avatar-frontend/          # Next.js web application
- │   ├── app/                  # Next.js app router
- │   ├── public/               # Static assets
- │   └── package.json
- ├── mrrrme/                   # Python backend
- │   ├── backend/              # Modular FastAPI backend
- │   │   ├── auth/             # Authentication
- │   │   ├── models/           # AI model loading
- │   │   ├── processing/       # Core processing
- │   │   └── session/          # Session management
- │   ├── audio/                # Audio processing
- │   ├── nlp/                  # NLP modules
- │   ├── vision/               # Computer vision
- │   └── config.py             # Global configuration
- ├── avatar/                   # Avatar TTS backend
- ├── model/                    # Neural network architectures
- ├── weights/                  # Model weights (LFS)
- ├── Dockerfile                # Container definition
- └── requirements_docker.txt   # Python dependencies
- ```
-
- See individual folder READMEs for detailed documentation of each component.
-
- ---
-
- ## Configuration
-
- ### Emotion Fusion Weights
-
- ```python
- # mrrrme/config.py or mrrrme/backend/config.py
- FUSION_WEIGHTS = {
-     'face': 0.40,   # Facial expressions
-     'voice': 0.30,  # Vocal prosody
-     'text': 0.30    # Linguistic sentiment
- }
- ```
-
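A rough sketch of the weighted fusion these values drive is below. The `fuse` function and its renormalization of weights when a modality is absent are illustrative assumptions, not the repository's actual implementation:

```python
FUSION_WEIGHTS = {"face": 0.40, "voice": 0.30, "text": 0.30}
EMOTIONS = ("Neutral", "Happy", "Sad", "Angry")


def fuse(predictions: dict[str, dict[str, float]]) -> str:
    """Return the top emotion from a weighted average of modality distributions."""
    scores = {e: 0.0 for e in EMOTIONS}
    # Renormalize so absent modalities (e.g. no speech yet) don't dilute scores.
    total_weight = sum(FUSION_WEIGHTS[m] for m in predictions)
    for modality, probs in predictions.items():
        weight = FUSION_WEIGHTS[modality] / total_weight
        for emotion, p in probs.items():
            scores[emotion] += weight * p
    return max(scores, key=scores.get)


result = fuse({
    "face": {"Happy": 0.7, "Neutral": 0.3},
    "voice": {"Neutral": 0.6, "Happy": 0.4},
    "text": {"Happy": 0.8, "Neutral": 0.2},
})
print(result)  # Happy
```

Averaging full distributions rather than majority-voting top labels is what allows confidence-based conflict resolution: a weakly confident modality contributes little even when it disagrees.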
- ### LLM Settings
-
- ```python
- LLM_RESPONSE_STYLE = "balanced"  # Options: brief, balanced, detailed
- PERSONALITY = "therapist"        # Options: therapist, coach
- ```
-
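These two settings could feed into the system prompt roughly as follows; the hint strings and the `build_system_prompt` helper are assumptions for illustration, not the repository's actual prompt construction:

```python
# Illustrative only: the hint text below is invented for this sketch.
STYLE_HINTS = {
    "brief": "Answer in one or two sentences.",
    "balanced": "Answer in a short paragraph.",
    "detailed": "Answer thoroughly, with concrete suggestions.",
}
PERSONA_HINTS = {
    "therapist": "You are an empathetic therapist.",
    "coach": "You are an action-focused coach.",
}


def build_system_prompt(personality: str, style: str, emotion: str) -> str:
    """Assemble an emotion-aware system prompt from the configured settings."""
    return (
        f"{PERSONA_HINTS[personality]} "
        f"The user currently seems {emotion.lower()}. "
        f"{STYLE_HINTS[style]}"
    )


print(build_system_prompt("therapist", "brief", "Sad"))
```

Injecting the fused emotion into the system prompt, rather than the user message, keeps the user's transcript unmodified while still steering the response tone.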
- ### Supported Languages
-
- Primary: English (en), Dutch (nl)
- TTS Supported (16 total): en, nl, fr, de, it, es, ja, zh, pt, pl, tr, ru, cs, ar, hu, ko
-
  ---
-
- ## Development Timeline
-
- ### Weeks 1-7 (Completed)
- - Multi-modal emotion detection pipeline
- - Web frontend with 3D avatar system
- - Real-time WebSocket communication
- - User authentication and session management
- - Groq API and XTTS v2 integration
-
- ### Weeks 8-18 (Planned)
- - **8-9**: Testing, optimization, bug fixes
- - **10-12**: Avatar enhancement and animation refinement
- - **13-15**: UI/UX improvements and feature expansion
- - **16**: Extended memory and context management
- - **17**: User testing and feedback integration
- - **18**: Demo preparation and final documentation
-
  ---

- ## API Reference
-
- ### WebSocket Events
-
- **Client to Server**:
- - `auth`: Session authentication
- - `video_frame`: Base64 encoded video frame
- - `audio_chunk`: Base64 encoded audio data
- - `speech_end`: Transcribed speech text
- - `preferences`: Voice, language, personality settings
-
- **Server to Client**:
- - `face_emotion`: Detected facial emotion with probabilities
- - `voice_emotion`: Detected voice emotion
- - `llm_response`: AI-generated response with audio and visemes
-
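Server-side, events like these are typically routed by a `type` field. The handler name and payload shape below are assumptions for illustration, not the repository's actual FastAPI code:

```python
import base64
import json


def handle_video_frame(payload: dict) -> dict:
    """Decode a base64 frame and return a (stubbed) face_emotion reply."""
    frame_bytes = base64.b64decode(payload["data"])
    return {"type": "face_emotion", "emotion": "Neutral", "bytes": len(frame_bytes)}


# Registry mapping incoming event types to their handlers.
HANDLERS = {"video_frame": handle_video_frame}


def dispatch(raw_message: str) -> dict:
    """Route a JSON WebSocket message to its handler by its `type` field."""
    message = json.loads(raw_message)
    handler = HANDLERS.get(message["type"])
    if handler is None:
        return {"type": "error", "detail": f"unknown event {message['type']!r}"}
    return handler(message)


msg = json.dumps({"type": "video_frame", "data": base64.b64encode(b"fake").decode()})
print(dispatch(msg))
```

A dict-based registry keeps each modality's handler independent, which matches the event-driven processing described earlier: unknown or idle event types cost nothing.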
- ---
-
- ## Team
-
- **Musaed Al-Fareh** - AI & Data Science Student
- Email: 225739@buas.nl
- LinkedIn: [linkedin.com/in/musaed-alfareh-a365521b9](https://www.linkedin.com/in/musaed-alfareh-a365521b9/)
-
- **Michon Goddijn** - AI & Data Science Student
- Email: 231849@buas.nl
-
- **Lorena Kraljić** - Tourism Student
- Email: 226142@buas.nl
-
- ---
-
- ## License
-
- MIT License
-
- Component licenses: ViT-Face-Expression (MIT), Whisper (MIT), HuBERT (MIT), Llama 3.1 (Llama 3.1 Community License), Coqui XTTS v2 (MPL 2.0)
-
- ---
-
- ## Contact
-
- **Repository**: [GitHub - MrrrMe](https://github.com/YourUsername/MrrrMe)
- **Live Demo**: [Hugging Face Spaces](https://huggingface.co/spaces/michon/mrrrme-emotion-ai)
- **Email**: 225739@buas.nl
-
- ---
 
- **Last Updated**: December 9, 2024
- **Version**: 2.0.0
- **Status**: Active Development (Week 7/18)
  ---
+ title: Mrrrme Emotion Ai
+ emoji: 🌍
+ colorFrom: indigo
+ colorTo: purple
+ sdk: docker
+ pinned: false
+ license: mit
+ short_description: MrrrMe
  ---

+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ "# Test by [friend name]"
+ "# Test by [Michon]"