MusaedSadeqMusaedAl-Fareh225739 committed
Commit b525620 · Parent: f995563

revert to the previous readme file

Files changed (1): README.md (+11, −391)
README.md CHANGED
@@ -1,395 +1,15 @@
- # MrrrMe - Privacy-First Smart Mirror for Multi-Modal Emotion Detection
-
- **18-Week Specialization Project | Breda University of Applied Sciences**
-
- A privacy-first smart mirror system that performs real-time multi-modal emotion recognition, combining facial expressions, voice tonality, and text sentiment analysis with conversational AI capabilities.
-
- ---
-
- ## Project Overview
-
- **Program**: AI & Data Science - Applied Data Science
- **Institution**: Breda University of Applied Sciences, Netherlands
- **Duration**: 18 weeks (February - June 2026)
- **Current Status**: Week 7 of 18 (11 weeks remaining)
-
- ### Problem Statement
-
- Traditional emotion recognition systems suffer from single-modality limitations, high latency, privacy concerns, and an inability to detect masked emotions. MrrrMe addresses these challenges with a comprehensive multi-modal approach.
-
- ### Solution
-
- A privacy-first, multi-modal emotion detection system that:
- - Fuses facial expressions (40%), voice tonality (30%), and linguistic content (30%)
- - Runs all emotion detection locally; only response generation calls the Groq Cloud API
- - Achieves sub-2-second response times
- - Generates empathetic, context-aware conversational responses
- - Integrates with customizable 3D avatars for natural interaction
-
- ---
-
- ## Key Features
-
- ### Multi-Modal Emotion Fusion
- - Weighted fusion algorithm combining three modalities
- - 4-class emotion model: Neutral, Happy, Sad, Angry
- - Confidence-based conflict resolution
- - Event-driven processing for a 600x efficiency improvement
- - Quality-aware dynamic weight adjustment
-
- ### Facial Expression Analysis
- - Face detection using OpenCV Haar Cascade
- - Emotion recognition using ViT-Face-Expression (trained on FER2013)
- - 70-75% baseline accuracy on facial expressions alone
- - Real-time processing with quality scoring
- - Efficient frame sampling (5% of frames processed)
-
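The 5% frame sampling mentioned above can be sketched as a simple stride counter; the class name and rate handling here are illustrative assumptions, not code from the repository:

```python
class FrameSampler:
    """Select roughly `sample_rate` of incoming frames for emotion inference."""

    def __init__(self, sample_rate: float = 0.05):
        # A 5% rate means processing every 20th frame.
        self.stride = max(1, round(1 / sample_rate))
        self.count = 0

    def should_process(self) -> bool:
        process = self.count % self.stride == 0
        self.count += 1
        return process


sampler = FrameSampler(sample_rate=0.05)
selected = sum(sampler.should_process() for _ in range(1000))
print(selected)  # 50 of 1000 frames selected
```

Skipping 19 of every 20 frames is what makes sub-real-time models like ViT-Face-Expression viable in a live video loop.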
- ### Voice Emotion Recognition
- - HuBERT-Large model for emotional prosody detection
- - Voice Activity Detection with 72.4% processing efficiency
- - Sub-50ms inference per audio chunk
- - 76.8% accuracy on voice-only emotion detection
- - Smart silence detection to reduce unnecessary processing
-
- ### Natural Language Understanding
- - Whisper (distil-large-v3) for accurate speech-to-text transcription
- - DistilRoBERTa for contextual sentiment analysis
- - Rule-based overrides for common phrases
- - Conversation memory across sessions
- - Multi-turn dialogue support
-
- ### Conversational AI Integration
- - Groq Cloud API with the Llama 3.1 8B Instant model
- - Dual personality modes: Empathetic Therapist and Action-Focused Coach
- - Emotion-aware response generation
- - 1-2 second LLM response times
- - Configurable response styles: brief, balanced, detailed
-
- ### Avatar System
- - Customizable 3D avatars using the Avaturn SDK
- - Realistic lip-sync with the Coqui XTTS v2 TTS engine
- - 16 supported languages, including English and Dutch
- - Emotion-driven facial expressions
- - Male and female voice options (Damien Black, Ana Florence)
-
- ### Web-Based Interface
- - Modern React/Next.js 16 frontend with TypeScript
- - Real-time WebSocket communication
- - Apple-inspired design system with light/dark mode
- - Responsive layout for desktop and mobile
- - Session-based authentication with SQLite backend
-
- ---
-
- ## Technology Stack
-
- ### Computer Vision & Face Analysis
-
- | Component | Technology | Size | Inference Time | Purpose |
- |-----------|-----------|------|----------------|---------|
- | Face Detection | OpenCV Haar Cascade | <1 MB | <10ms | Detect and localize faces |
- | Emotion Recognition | ViT-Face-Expression | ~90 MB | ~100ms | 7-class emotion classification |
- | Emotion Mapping | FER2013 to 4-class | N/A | <1ms | Simplify to actionable emotions |
-
- **Facial Emotion Classes**: Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral
- **Mapped to**: Neutral, Happy, Sad, Angry
-
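The 7-to-4-class reduction could look roughly like the sketch below. How Disgust, Fear, and Surprise are folded into the four target classes is an assumption here; the repository's actual mapping may differ:

```python
# Illustrative mapping from FER2013's 7 classes to the 4-class model.
# The targets for "disgust", "fear", and "surprise" are guesses.
FER_TO_4CLASS = {
    "angry": "Angry",
    "disgust": "Angry",
    "fear": "Sad",
    "sad": "Sad",
    "happy": "Happy",
    "surprise": "Happy",
    "neutral": "Neutral",
}


def map_emotion(probs: dict[str, float]) -> str:
    """Collapse 7-class probabilities into the dominant 4-class label."""
    totals: dict[str, float] = {}
    for label, p in probs.items():
        target = FER_TO_4CLASS[label]
        totals[target] = totals.get(target, 0.0) + p
    return max(totals, key=totals.get)


print(map_emotion({"happy": 0.5, "surprise": 0.2, "sad": 0.3}))  # Happy
```

Summing probabilities before taking the argmax (rather than remapping only the top class) is what keeps the mapping step under 1 ms while still using the full distribution.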
- ### Audio Processing & Voice Analysis
-
- | Component | Technology | Size | Inference Time | Purpose |
- |-----------|-----------|------|----------------|---------|
- | Speech Transcription | Whisper (distil-large-v3) | ~140 MB | 0.37-1.04s | Audio to text conversion |
- | Voice Emotion | HuBERT-Large | ~300 MB | ~50ms | Emotional prosody detection |
- | Voice Activity Detection | Silero VAD | ~1 MB | <5ms | Speech segmentation |
- | Audio I/O | SoundDevice | N/A | N/A | Real-time audio capture |
-
- ### Natural Language Processing
-
- | Component | Technology | Size | Inference Time | Purpose |
- |-----------|-----------|------|----------------|---------|
- | Sentiment Analysis | DistilRoBERTa | ~260 MB | ~100ms | Text emotion extraction |
- | Conversational AI | Groq Cloud API (Llama 3.1 8B) | Cloud | 1-2s | Response generation |
- | Text-to-Speech | Coqui XTTS v2 | ~2 GB | 2-4s | Avatar voice synthesis |
-
- ### Frontend & Infrastructure
-
- | Component | Technology | Purpose |
- |-----------|-----------|---------|
- | Frontend Framework | Next.js 16 (React 19) | Modern web interface |
- | 3D Rendering | React Three Fiber + Three.js | Avatar visualization |
- | Avatar SDK | Avaturn SDK | Custom avatar creation |
- | Styling | Tailwind CSS v4 | Apple-inspired design system |
- | API Framework | FastAPI | WebSocket + REST endpoints |
- | Database | SQLite | User auth and session management |
- | Deployment | Docker + Nginx | Production containerization |
-
- ---
-
- ## System Architecture
-
- ```
- ┌─────────────────────────────────────────────────────────────────┐
- │ CLIENT (Web Browser) │
- │ ┌────────────────────────────────────────────────────────────┐ │
- │ │ Next.js 16 Frontend (React 19 + TypeScript) │ │
- │ │ - Avatar visualization (Three.js) │ │
- │ │ - Real-time emotion display │ │
- │ │ - Conversation history UI │ │
- │ └─────────────────┬──────────────────────────────────────────┘ │
- │ │ WebSocket │
- └────────────────────┼─────────────────────────────────────────────┘
-
- ┌────────────────────┼─────────────────────────────────────────────┐
- │ ┌─────▼──────┐ │
- │ │Nginx Proxy │ (Port 7860) │
- │ └──────┬─────┘ │
- │ │ │
- │ ┌───────────┼───────────────┐ │
- │ ▼ ▼ ▼ │
- │ ┌───────────┐ ┌─────────┐ ┌──────────────┐ │
- │ │ Next.js │ │FastAPI │ │ Avatar TTS │ │
- │ │ :3001 │ │ :8000 │ │ :8765 │ │
- │ └───────────┘ └────┬────┘ └──────────────┘ │
- │ │ │
- │ ┌──────┴──────┐ │
- │ ┌──────▼─────┐ ┌────▼──────┐ │
- │ │ Emotion │ │ Session │ │
- │ │ Pipeline │ │ Manager │ │
- │ └──────┬─────┘ └───────────┘ │
- │ │ │
- │ ┌─────────┼─────────┐ │
- │ ▼ ▼ ▼ │
- │ ┌──────┐ ┌──────┐ ┌───────┐ │
- │ │ Face │ │Voice │ │ Text │ │
- │ │ ViT │ │HuBERT│ │RoBERTa│ │
- │ └──────┘ └──────┘ └───────┘ │
- │ │ │
- │ ▼ │
- │ ┌────────────────┐ │
- │ │ Fusion Engine │ │
- │ └────────┬───────┘ │
- │ ▼ │
- │ ┌────────────────┐ │
- │ │ Groq Cloud │ │
- │ │ (Llama 3.1 8B) │ │
- │ └────────────────┘ │
- └─────────────────────────────────────────────────────────────────┘
- ```
-
- ---
-
- ## Performance Metrics
-
- ### Processing Latency
-
- | Component | Latency | Notes |
- |-----------|---------|-------|
- | Face Detection | 8-15ms | OpenCV Haar Cascade |
- | Facial Emotion | 80-120ms | ViT-Face-Expression |
- | Voice Emotion | 40-60ms | HuBERT per 3s chunk |
- | Whisper Transcription | 370ms - 1.04s | Length-dependent |
- | Text Sentiment | 90-110ms | DistilRoBERTa |
- | Fusion Calculation | <5ms | Weighted average |
- | LLM Generation | 1-2s | Groq Cloud API |
- | XTTS Synthesis | 2-4s | Coqui XTTS v2 |
- | **Total Response Time** | **1.5-2.5s** | Meets the ~2s target in typical runs |
-
- ### Accuracy Metrics
-
- | Modality | Accuracy | Dataset/Notes |
- |----------|----------|---------------|
- | Face Only | 70-75% | ViT on FER2013 |
- | Voice Only | 76.8% | HuBERT on IEMOCAP |
- | Text Only | 81.2% | DistilRoBERTa + rules |
- | **Multi-Modal Fusion** | **85-88%** | Estimated combined accuracy |
-
- ---
-
- ## Installation
-
- ### Prerequisites
-
- - Python 3.11+
- - Node.js 20+
- - NVIDIA GPU with 4GB+ VRAM (recommended)
- - CUDA 11.8+ (for GPU acceleration)
- - Git LFS
-
- ### Local Development
-
- ```bash
- # Clone repository
- git clone https://github.com/YourUsername/MrrrMe.git
- cd MrrrMe
- git lfs install
- git lfs pull
-
- # Backend setup
- python -m venv venv
- source venv/bin/activate  # or venv\Scripts\activate on Windows
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- pip install -r requirements_docker.txt
-
- # Create .env file
- echo "GROQ_API_KEY=your_api_key_here" > .env
-
- # Frontend setup
- cd avatar-frontend
- npm install
- npm run build
- cd ..
-
- # Start services (3 terminals needed)
- # Terminal 1:
- cd avatar && python speak_server.py
-
- # Terminal 2:
- python mrrrme/backend_new.py
-
- # Terminal 3:
- cd avatar-frontend && npm run dev
- ```
-
- Access at `http://localhost:3000`
-
- ### Docker Deployment
-
- ```bash
- # Build image
- docker build -t mrrrme:latest .
-
- # Run with GPU
- docker run --gpus all -p 7860:7860 mrrrme:latest
-
- # Run CPU only
- docker run -p 7860:7860 mrrrme:latest
- ```
-
- ---
-
- ## Project Structure
-
- ```
- MrrrMe/
- ├── avatar-frontend/          # Next.js web application
- │   ├── app/                  # Next.js app router
- │   ├── public/               # Static assets
- │   └── package.json
- ├── mrrrme/                   # Python backend
- │   ├── backend/              # Modular FastAPI backend
- │   │   ├── auth/             # Authentication
- │   │   ├── models/           # AI model loading
- │   │   ├── processing/       # Core processing
- │   │   └── session/          # Session management
- │   ├── audio/                # Audio processing
- │   ├── nlp/                  # NLP modules
- │   ├── vision/               # Computer vision
- │   └── config.py             # Global configuration
- ├── avatar/                   # Avatar TTS backend
- ├── model/                    # Neural network architectures
- ├── weights/                  # Model weights (LFS)
- ├── Dockerfile                # Container definition
- └── requirements_docker.txt   # Python dependencies
- ```
-
- See individual folder READMEs for detailed documentation of each component.
-
- ---
-
- ## Configuration
-
- ### Emotion Fusion Weights
-
- ```python
- # mrrrme/config.py or mrrrme/backend/config.py
- FUSION_WEIGHTS = {
-     'face': 0.40,   # Facial expressions
-     'voice': 0.30,  # Vocal prosody
-     'text': 0.30    # Linguistic sentiment
- }
- ```
-
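A rough sketch of the weighted fusion these values drive is below. The `fuse` function and its renormalization of weights when a modality is absent are illustrative assumptions, not the repository's actual implementation:

```python
FUSION_WEIGHTS = {"face": 0.40, "voice": 0.30, "text": 0.30}
EMOTIONS = ("Neutral", "Happy", "Sad", "Angry")


def fuse(predictions: dict[str, dict[str, float]]) -> str:
    """Return the top emotion from a weighted average of modality distributions."""
    scores = {e: 0.0 for e in EMOTIONS}
    # Renormalize so absent modalities (e.g. no speech yet) don't dilute scores.
    total_weight = sum(FUSION_WEIGHTS[m] for m in predictions)
    for modality, probs in predictions.items():
        weight = FUSION_WEIGHTS[modality] / total_weight
        for emotion, p in probs.items():
            scores[emotion] += weight * p
    return max(scores, key=scores.get)


result = fuse({
    "face": {"Happy": 0.7, "Neutral": 0.3},
    "voice": {"Neutral": 0.6, "Happy": 0.4},
    "text": {"Happy": 0.8, "Neutral": 0.2},
})
print(result)  # Happy
```

Averaging full distributions rather than majority-voting top labels is what allows confidence-based conflict resolution: a weakly confident modality contributes little even when it disagrees.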
- ### LLM Settings
-
- ```python
- LLM_RESPONSE_STYLE = "balanced"  # Options: brief, balanced, detailed
- PERSONALITY = "therapist"        # Options: therapist, coach
- ```
-
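These two settings could feed into the system prompt roughly as follows; the hint strings and the `build_system_prompt` helper are assumptions for illustration, not the repository's actual prompt construction:

```python
# Illustrative only: the hint text below is invented for this sketch.
STYLE_HINTS = {
    "brief": "Answer in one or two sentences.",
    "balanced": "Answer in a short paragraph.",
    "detailed": "Answer thoroughly, with concrete suggestions.",
}
PERSONA_HINTS = {
    "therapist": "You are an empathetic therapist.",
    "coach": "You are an action-focused coach.",
}


def build_system_prompt(personality: str, style: str, emotion: str) -> str:
    """Assemble an emotion-aware system prompt from the configured settings."""
    return (
        f"{PERSONA_HINTS[personality]} "
        f"The user currently seems {emotion.lower()}. "
        f"{STYLE_HINTS[style]}"
    )


print(build_system_prompt("therapist", "brief", "Sad"))
```

Injecting the fused emotion into the system prompt, rather than the user message, keeps the user's transcript unmodified while still steering the response tone.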
- ### Supported Languages
-
- Primary: English (en), Dutch (nl)
- TTS Supported (16 total): en, nl, fr, de, it, es, ja, zh, pt, pl, tr, ru, cs, ar, hu, ko
-
  ---
-
- ## Development Timeline
-
- ### Weeks 1-7 (Completed)
- - Multi-modal emotion detection pipeline
- - Web frontend with 3D avatar system
- - Real-time WebSocket communication
- - User authentication and session management
- - Groq API and XTTS v2 integration
-
- ### Weeks 8-18 (Planned)
- - **8-9**: Testing, optimization, bug fixes
- - **10-12**: Avatar enhancement and animation refinement
- - **13-15**: UI/UX improvements and feature expansion
- - **16**: Extended memory and context management
- - **17**: User testing and feedback integration
- - **18**: Demo preparation and final documentation
-
  ---

- ## API Reference
-
- ### WebSocket Events
-
- **Client to Server**:
- - `auth`: Session authentication
- - `video_frame`: Base64 encoded video frame
- - `audio_chunk`: Base64 encoded audio data
- - `speech_end`: Transcribed speech text
- - `preferences`: Voice, language, personality settings
-
- **Server to Client**:
- - `face_emotion`: Detected facial emotion with probabilities
- - `voice_emotion`: Detected voice emotion
- - `llm_response`: AI-generated response with audio and visemes
-
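Server-side, events like these are typically routed by a `type` field. The handler name and payload shape below are assumptions for illustration, not the repository's actual FastAPI code:

```python
import base64
import json


def handle_video_frame(payload: dict) -> dict:
    """Decode a base64 frame and return a (stubbed) face_emotion reply."""
    frame_bytes = base64.b64decode(payload["data"])
    return {"type": "face_emotion", "emotion": "Neutral", "bytes": len(frame_bytes)}


# Registry mapping incoming event types to their handlers.
HANDLERS = {"video_frame": handle_video_frame}


def dispatch(raw_message: str) -> dict:
    """Route a JSON WebSocket message to its handler by its `type` field."""
    message = json.loads(raw_message)
    handler = HANDLERS.get(message["type"])
    if handler is None:
        return {"type": "error", "detail": f"unknown event {message['type']!r}"}
    return handler(message)


msg = json.dumps({"type": "video_frame", "data": base64.b64encode(b"fake").decode()})
print(dispatch(msg))
```

A dict-based registry keeps each modality's handler independent, which matches the event-driven processing described earlier: unknown or idle event types cost nothing.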
- ---
-
- ## Team
-
- **Musaed Al-Fareh** - AI & Data Science Student
- Email: 225739@buas.nl
- LinkedIn: [linkedin.com/in/musaed-alfareh-a365521b9](https://www.linkedin.com/in/musaed-alfareh-a365521b9/)
-
- **Michon Goddijn** - AI & Data Science Student
- Email: 231849@buas.nl
-
- **Lorena Kraljić** - Tourism Student
- Email: 226142@buas.nl
-
- ---
-
- ## License
-
- MIT License
-
- Component licenses: ViT-Face-Expression (MIT), Whisper (MIT), HuBERT (MIT), Llama 3.1 (Llama 3.1 Community License), Coqui XTTS v2 (MPL 2.0)
-
- ---
-
- ## Contact
-
- **Repository**: [GitHub - MrrrMe](https://github.com/YourUsername/MrrrMe)
- **Live Demo**: [Hugging Face Spaces](https://huggingface.co/spaces/michon/mrrrme-emotion-ai)
- **Email**: 225739@buas.nl
-
- ---
 
- **Last Updated**: December 9, 2024
- **Version**: 2.0.0
- **Status**: Active Development (Week 7/18)
  ---
+ title: Mrrrme Emotion Ai
+ emoji: 🌍
+ colorFrom: indigo
+ colorTo: purple
+ sdk: docker
+ pinned: false
+ license: mit
+ short_description: MrrrMe
  ---

+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ "# Test by [friend name]"
+ "# Test by [Michon]"