---
license: apache-2.0
language:
- en
base_model:
- ntu-spml/distilhubert
pipeline_tag: audio-classification
library_name: flair
---

# Emotion Detection From Speech

This model is a fine-tuned version of **DistilHuBERT** that classifies emotions from audio input.

## Approach

1. **Dataset:** The **RAVDESS** dataset, comprising 1,440 audio files with 8 emotion labels: calm, happy, sad, angry, fearful, surprise, neutral, and disgust.
2. **Model Fine-Tuning:** The DistilHuBERT model was fine-tuned for 7 epochs with a learning rate of 5e-5, achieving an accuracy of 98% on the test dataset.

## Data Preprocessing

- **Sampling Rate:** Audio files were resampled to 16 kHz to match the model's requirements.
- **Padding:** Audio clips shorter than 30 seconds were zero-padded.
- **Train-Test Split:** 80% of the samples were used for training and 20% for testing.

## Model Architecture

- **DistilHuBERT:** A lightweight variant of HuBERT, fine-tuned for emotion classification.
- **Fine-Tuning Setup:**
  - Optimizer: AdamW
  - Loss Function: Cross-Entropy
  - Learning Rate: 5e-5
  - Warm-up Ratio: 0.1
  - Epochs: 7

## Results

- **Accuracy:** 0.98 on the test dataset
- **Loss:** 0.10 on the test dataset

## Usage

    from transformers import pipeline

    # Load the fine-tuned classifier from the Hugging Face Hub.
    pipe = pipeline(
        "audio-classification",
        model="BilalHasan/distilhubert-finetuned-ravdess",
    )

    # Placeholder: replace with the path to your own audio file.
    path_to_your_audio = "path/to/your/audio.wav"

    # Returns a list of dicts with "label" and "score" keys.
    emotion = pipe(path_to_your_audio)

## Demo

A live demo of the app is available on [Hugging Face Spaces](https://huggingface.co/spaces/BilalHasan/Mood-Based-Yoga-Session-Recommendation).
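
## Preprocessing Sketch

The padding and train-test split listed under Data Preprocessing can be illustrated with a minimal, dependency-free sketch. This is not the original training code. The helper names (`pad_or_truncate`, `train_test_split`), the fixed seed, the dummy clip, and the file names are all hypothetical. Truncating clips longer than 30 seconds is also an assumption, since the card only mentions padding. Resampling to 16 kHz is not shown and is assumed to have happened already.

```python
import random

SAMPLE_RATE = 16_000                      # Hz, as stated in the card
MAX_SECONDS = 30                          # padding target, as stated in the card
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS   # 480,000 samples


def pad_or_truncate(waveform, max_samples=MAX_SAMPLES):
    """Zero-pad a mono waveform (list of floats) up to max_samples.

    Truncating longer clips is an assumption; the card only describes padding.
    """
    if len(waveform) >= max_samples:
        return waveform[:max_samples]
    return waveform + [0.0] * (max_samples - len(waveform))


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle and split items into (train, test); the seed is arbitrary."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# A 3-second dummy clip at 16 kHz stands in for a real recording.
clip = [0.1] * (3 * SAMPLE_RATE)
padded = pad_or_truncate(clip)

# Hypothetical file names standing in for the 1,440 RAVDESS files.
files = [f"clip_{i:04d}.wav" for i in range(1440)]
train_files, test_files = train_test_split(files)

print(len(padded), len(train_files), len(test_files))  # 480000 1152 288
```

With an 80/20 split, the 1,440 files divide into 1,152 training and 288 test samples. In practice, a feature extractor from the `transformers` library would typically handle padding, so the list-based version here is only meant to make the arithmetic explicit.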