---
title: Vocal Emotion Recognition
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---

# Vocal Emotion Recognition System

## 🎯 Project Overview

A deep learning-based system for real-time emotion recognition from vocal input, built on state-of-the-art audio processing and transformer models.

### Key Features

- Real-time vocal emotion analysis
- Advanced audio feature extraction
- Pre-trained transformer model integration
- User-friendly web interface
- Comprehensive evaluation metrics

## 🛠️ Technical Architecture

### Components

#### Audio Processing Pipeline

- Sample rate standardization (16 kHz)
- Noise reduction and normalization
- Feature extraction (MFCC, chroma, mel spectrograms)
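
A minimal sketch of this stage with `librosa` is shown below. The `preprocess_audio` name matches the API reference further down; reducing the noise-reduction step to simple peak normalization is an assumption made for brevity.

```python
import librosa
import numpy as np

TARGET_SR = 16_000  # project-wide sample rate (see Configuration)

def preprocess_audio(audio_file):
    """Load audio, resample to 16 kHz mono, and peak-normalize."""
    # librosa resamples to the requested rate on load and returns float32 mono
    audio, sr = librosa.load(audio_file, sr=TARGET_SR, mono=True)
    # Peak normalization; the project's actual noise reduction is not shown here
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak
    return audio, sr
```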

#### Machine Learning Pipeline

- DistilBERT-based emotion classification
- Transfer learning capabilities
- Comprehensive evaluation metrics
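
Since the base model listed under Configuration (`bhadresh-savani/distilbert-base-uncased-emotion`) is a text emotion classifier, the sketch below uses the `transformers` pipeline API and assumes a transcript of the vocal input is available; the helper name `classify_emotion` is illustrative, not part of the project API.

```python
from transformers import pipeline

# Text-emotion classifier named under Configuration below
classifier = pipeline(
    "text-classification",
    model="bhadresh-savani/distilbert-base-uncased-emotion",
    top_k=None,  # return scores for every emotion label
)

def classify_emotion(transcript):
    """Return (label, score) pairs sorted by descending confidence."""
    scores = classifier(transcript)[0]
    return sorted(((s["label"], s["score"]) for s in scores),
                  key=lambda pair: pair[1], reverse=True)
```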

#### Web Interface

- Gradio-based interactive UI
- Real-time processing
- Intuitive result visualization
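
A minimal sketch of the Gradio wiring, using the 3.x API that matches the `sdk_version` above; the stub body of `analyze_emotion` stands in for the real pipeline.

```python
import gradio as gr

def analyze_emotion(audio_input):
    # Stub: preprocess, extract features, classify (see API Reference below)
    return {"joy": 0.7, "sadness": 0.3}  # label -> probability for gr.Label

demo = gr.Interface(
    fn=analyze_emotion,
    inputs=gr.Audio(source="microphone", type="filepath"),  # Gradio 3.x argument
    outputs=gr.Label(num_top_classes=3),
    title="Vocal Emotion Recognition",
)

demo.launch()  # serves at http://localhost:7860 by default
```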

## 📦 Installation

1. **Clone the repository**

   ```bash
   git clone [repository-url]
   cd vocal-emotion-recognition
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Environment setup**
   - Python 3.8+ required
   - CUDA-compatible GPU recommended for training
   - Microphone access required for real-time analysis

## 🚀 Usage

### Starting the Application

```bash
python app.py
```

1. Open the web interface at `http://localhost:7860`
2. Use the microphone input for real-time analysis
3. View emotion classification results instantly

### Training Custom Models

```bash
python model_training.py --data_path [path] --epochs [num]
```

## 📊 Model Performance

The system reports the following evaluation metrics:

- Accuracy, precision, recall, and F1 score
- ROC-AUC score
- Confusion matrix
- MAE and RMSE
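
A sketch of computing these with `scikit-learn`, assuming integer-encoded labels `y_true`/`y_pred` and a per-class probability matrix `y_prob` (all names illustrative):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error,
                             precision_recall_fscore_support, roc_auc_score)

def evaluate(y_true, y_pred, y_prob):
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # One-vs-rest AUC over the multi-class emotion labels
        "roc_auc": roc_auc_score(y_true, y_prob, multi_class="ovr"),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        # MAE/RMSE treat the integer label indices as ordinal values
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }
```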

## 🔧 Configuration

### Model Settings

- Base model: `bhadresh-savani/distilbert-base-uncased-emotion`
- Audio sample rate: 16 kHz
- Batch size: 8 (configurable)
- Learning rate: 5e-5
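
If training goes through the 🤗 `Trainer`, these settings would map onto `TrainingArguments` roughly as follows; this is an assumption, and `model_training.py` may configure things differently.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",          # illustrative output location
    per_device_train_batch_size=8,   # batch size: 8 (configurable)
    learning_rate=5e-5,              # learning rate from this section
    num_train_epochs=3,              # placeholder; set via --epochs
)
```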

### Feature Extraction

- MFCC: 13 coefficients
- Chroma features
- Mel spectrograms
- Spectral contrast
- Tonnetz features
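
One common way to turn this feature set into a fixed-length vector with `librosa` is to average each feature over time, sketched below; the `extract_features` name matches the API reference, while the time-averaging convention is an assumption.

```python
import librosa
import numpy as np

def extract_features(audio_data, sr=16_000):
    """Stack time-averaged MFCC, chroma, mel, contrast, and tonnetz features."""
    mfcc = librosa.feature.mfcc(y=audio_data, sr=sr, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=audio_data, sr=sr)
    mel = librosa.feature.melspectrogram(y=audio_data, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=audio_data, sr=sr)
    # Tonnetz is computed on the harmonic component of the signal
    tonnetz = librosa.feature.tonnetz(
        y=librosa.effects.harmonic(audio_data), sr=sr)
    # Mean over the time axis gives one fixed-length vector per clip
    return np.concatenate(
        [f.mean(axis=1) for f in (mfcc, chroma, mel, contrast, tonnetz)])
```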

## 📚 API Reference

### Audio Processing

```python
preprocess_audio(audio_file)
extract_features(audio_data)
```

### Model Interface

```python
analyze_emotion(audio_input)
train_model(data_path, epochs)
```
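
A hypothetical end-to-end call of the functions above (file name is illustrative, and exact return types depend on the implementation):

```python
audio, sr = preprocess_audio("clip.wav")  # load and resample to 16 kHz
features = extract_features(audio)        # fixed-length feature vector
emotions = analyze_emotion("clip.wav")    # emotion labels with scores
```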

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a pull request

## 📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

## 🙏 Acknowledgments

- Hugging Face Transformers
- librosa audio processing
- Gradio interface library

## 📫 Contact

For questions and support, please open an issue in the repository.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference