---
title: Vocal Emotion Recognition
emoji: 🎀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
pinned: false
---

# Vocal Emotion Recognition System

## 🎯 Project Overview

A deep learning-based system for real-time emotion recognition from vocal input using state-of-the-art audio processing and transformer models.

### Key Features

- Real-time vocal emotion analysis
- Advanced audio feature extraction
- Pre-trained transformer model integration
- User-friendly web interface
- Comprehensive evaluation metrics

πŸ› οΈ Technical Architecture

Components

  1. Audio Processing Pipeline

    • Sample rate standardization (16kHz)
    • Noise reduction and normalization
    • Feature extraction (MFCC, Chroma, Mel spectrograms)
  2. Machine Learning Pipeline

    • DistilBERT-based emotion classification
    • Transfer learning capabilities
    • Comprehensive evaluation metrics
  3. Web Interface

    • Gradio-based interactive UI
    • Real-time processing
    • Intuitive result visualization
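The first stage of the pipeline above (resampling to 16 kHz plus normalization) can be sketched as follows. This is a minimal illustration assuming SciPy for resampling; it operates on an already-loaded mono array, whereas the project's own `preprocess_audio` takes an audio file, and its noise-reduction step is not shown here.

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sample rate standardized across the pipeline

def preprocess_audio_array(y: np.ndarray, sr: int) -> np.ndarray:
    """Resample a mono signal to 16 kHz and peak-normalize it.

    Illustrative sketch only; the real app.py may differ (e.g. in
    how it performs noise reduction).
    """
    if sr != TARGET_SR:
        # Polyphase resampling to the standard rate
        y = resample_poly(y, TARGET_SR, sr)
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak  # scale into [-1, 1]
    return y.astype(np.float32)
```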

## 📦 Installation

1. **Clone the repository**

   ```bash
   git clone [repository-url]
   cd vocal-emotion-recognition
   ```

2. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Environment setup**
   - Python 3.8+ required
   - CUDA-compatible GPU recommended for training
   - Microphone access required for real-time analysis

## 🚀 Usage

### Starting the Application

```bash
python app.py
```

- Access the web interface at http://localhost:7860
- Use microphone input for real-time analysis
- View emotion classification results instantly

### Training Custom Models

```bash
python model_training.py --data_path [path] --epochs [num]
```
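A minimal argument parser for the command above might look like this. The flag names are taken from the README; defaults and help text are assumptions, and `model_training.py` may accept additional options.

```python
import argparse

def parse_args(argv=None):
    """Parse the training CLI flags shown above (sketch, not the actual script)."""
    parser = argparse.ArgumentParser(description="Train a custom emotion model")
    parser.add_argument("--data_path", required=True,
                        help="Directory containing labeled audio data")
    parser.add_argument("--epochs", type=int, default=3,
                        help="Number of training epochs")
    return parser.parse_args(argv)
```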

## 📊 Model Performance

The system reports the following evaluation metrics:

- Accuracy, precision, recall, F1 score
- ROC-AUC score
- Confusion matrix
- MAE and RMSE
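The metrics above can be gathered with scikit-learn; a sketch of one way to bundle them (the helper name `evaluation_report` is illustrative, not part of the project's API):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    precision_recall_fscore_support,
    roc_auc_score,
)

def evaluation_report(y_true, y_pred, y_score=None):
    """Collect the README's evaluation metrics into one dict."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    report = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        # RMSE via sqrt() keeps this compatible across sklearn versions
        "rmse": np.sqrt(mean_squared_error(y_true, y_pred)),
    }
    if y_score is not None:
        # ROC-AUC needs scores/probabilities, not hard labels
        report["roc_auc"] = roc_auc_score(y_true, y_score)
    return report
```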

## 🔧 Configuration

### Model Settings

- Base model: `bhadresh-savani/distilbert-base-uncased-emotion`
- Audio sample rate: 16 kHz
- Batch size: 8 (configurable)
- Learning rate: 5e-5

### Feature Extraction

- MFCC: 13 coefficients
- Chroma features
- Mel spectrograms
- Spectral contrast
- Tonnetz features

πŸ“ API Reference

Audio Processing

preprocess_audio(audio_file)
extract_features(audio_data)

Model Interface

analyze_emotion(audio_input)
train_model(data_path, epochs)

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • HuggingFace Transformers
  • Librosa Audio Processing
  • Gradio Interface Library

## 📞 Contact

For questions and support, please open an issue in the repository.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference