---
license: apache-2.0
language:
- en
---

# Emotion Detection From Speech

This model is a fine-tuned version of **DistilHuBERT** that classifies emotions from speech audio.

## Approach

1. **Dataset:** The **RAVDESS** dataset, comprising 1,440 audio files labeled with 8 emotions: calm, happy, sad, angry, fearful, surprise, neutral, and disgust.
2. **Model Fine-Tuning:** DistilHuBERT was fine-tuned for 7 epochs with a learning rate of 5e-5, reaching 98% accuracy on the test split.
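
RAVDESS encodes each clip's emotion as the third hyphen-separated field of its file name (for example, `03-01-06-01-02-01-12.wav` is a fearful clip). The card does not say how labels were extracted, so the following is only a minimal sketch, assuming labels are read from the file names. The helper `emotion_from_filename` and the `label2id`/`id2label` ordering are hypothetical and may not match the label ids of the published model.

```python
import os

# Emotion codes from the third field of RAVDESS file names.
# Label names follow this model card ("surprise" rather than RAVDESS's "surprised").
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprise",
}

# Hypothetical label <-> id mappings for a classification head (order is an assumption).
label2id = {label: i for i, label in enumerate(RAVDESS_EMOTIONS.values())}
id2label = {i: label for label, i in label2id.items()}


def emotion_from_filename(path: str) -> str:
    """Return the emotion label encoded in a RAVDESS file name.

    Names have 7 fields: modality-channel-emotion-intensity-statement-repetition-actor.
    """
    stem, _ = os.path.splitext(os.path.basename(path))
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"Not a RAVDESS file name: {path!r}")
    return RAVDESS_EMOTIONS[fields[2]]


label = emotion_from_filename("Actor_12/03-01-06-01-02-01-12.wav")
print(label, label2id[label])  # fearful 5
```

Mappings like these are what a classification head needs to turn predicted class ids back into emotion names.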

## Data Preprocessing

- **Sampling Rate:** Audio files were resampled to 16 kHz, the input rate DistilHuBERT expects.
- **Padding:** Audio clips shorter than 30 seconds were zero-padded.
- **Train-Test Split:** 80% of the samples were used for training and 20% for testing.
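
The padding and split steps can be sketched with NumPy as below. This is a minimal illustration under assumptions the card does not state: padding is appended at the end of each clip up to 30 seconds, clips longer than 30 seconds are truncated, and the split is a seeded random shuffle. `pad_to_max` and `train_test_split_indices` are hypothetical helpers. Resampling to 16 kHz is not shown; it is usually delegated to an audio library, for example `librosa.load(path, sr=16000)` or casting a Hugging Face `datasets` audio column with `Audio(sampling_rate=16000)`.

```python
import numpy as np

SAMPLING_RATE = 16_000                        # Hz, as stated in the card
MAX_DURATION_S = 30                           # padding target, as stated in the card
MAX_SAMPLES = SAMPLING_RATE * MAX_DURATION_S  # 480,000 samples


def pad_to_max(waveform: np.ndarray, max_samples: int = MAX_SAMPLES) -> np.ndarray:
    """Zero-pad a 1-D waveform at the end to max_samples (assumption: longer clips are truncated)."""
    waveform = waveform[:max_samples]
    return np.pad(waveform, (0, max_samples - len(waveform)))


def train_test_split_indices(n: int, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle indices 0..n-1 with a fixed seed and split them into (train, test) arrays."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


clip = np.random.default_rng(1).standard_normal(3 * SAMPLING_RATE)  # a fake 3 s clip
padded = pad_to_max(clip)
print(padded.shape)  # (480000,)

train_idx, test_idx = train_test_split_indices(1440)
print(len(train_idx), len(test_idx))  # 1152 288
```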

## Model Architecture

- **DistilHuBERT:** A lightweight variant of HuBERT, fine-tuned for emotion classification.
- **Fine-Tuning Setup:**
  - Optimizer: AdamW
  - Loss Function: Cross-Entropy
  - Learning Rate: 5e-5
  - Warm-up Ratio: 0.1
  - Epochs: 7
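
A warm-up ratio of 0.1 means the learning rate ramps up over the first 10% of optimizer steps. The card does not state the decay schedule or batch size, so the sketch below assumes a linear warm-up followed by linear decay to zero (the default `linear` schedule of the Hugging Face `Trainer`, which also uses AdamW by default) and a hypothetical batch size of 8. `learning_rate_at` is a hypothetical helper, not a library function.

```python
import math

# Values from the card, except BATCH_SIZE, which is hypothetical (not stated in the card).
PEAK_LR = 5e-5
WARMUP_RATIO = 0.1
EPOCHS = 7
TRAIN_SAMPLES = 1152   # 80% of the 1,440 RAVDESS clips
BATCH_SIZE = 8         # hypothetical


def learning_rate_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step: linear warm-up, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the warm-up steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr to 0 over the remaining steps (assumed schedule).
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total_steps = math.ceil(TRAIN_SAMPLES / BATCH_SIZE) * EPOCHS  # 144 * 7 = 1008
for step in (0, 50, 101, 500, total_steps):
    print(step, f"{learning_rate_at(step, total_steps):.2e}")
```

Under these assumptions, the 1,008 total steps include 101 warm-up steps (`ceil(0.1 * 1008)`), so the peak rate of 5e-5 is reached at step 101.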

## Results

- **Accuracy:** 0.98 on the test dataset
- **Loss:** 0.10 on the test dataset
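
The card reports accuracy and loss without showing how they were computed. A minimal sketch, assuming accuracy is the fraction of clips whose highest-scoring class matches the true label and loss is the mean cross-entropy (the loss function listed above). `accuracy_and_loss` is a hypothetical helper, and the logits are toy values, not model outputs.

```python
import numpy as np


def accuracy_and_loss(logits: np.ndarray, labels: np.ndarray):
    """Return (accuracy, mean cross-entropy loss) for a batch of class logits.

    logits: shape (n_samples, n_classes); labels: shape (n_samples,) integer class ids.
    """
    # Numerically stable log-softmax: subtract each row's max before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy is the negative log-probability of the true class, averaged over samples.
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    accuracy = (logits.argmax(axis=1) == labels).mean()
    return float(accuracy), float(loss)


# Toy batch: 3 clips, 8 emotion classes (made-up numbers, not model outputs).
logits = np.zeros((3, 8))
logits[0, 2] = logits[1, 5] = logits[2, 1] = 4.0
labels = np.array([2, 5, 7])  # the third clip is misclassified
print(accuracy_and_loss(logits, labels))
```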
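
## Usage

```python
from transformers import pipeline

# Load the fine-tuned emotion classifier from the Hugging Face Hub
pipe = pipeline(
    "audio-classification",
    model="BilalHasan/distilhubert-finetuned-ravdess",
)

# Path to your own audio file (decoding from a file path requires ffmpeg)
path_to_your_audio = "path/to/your_audio.wav"

# Returns a list of {"label": ..., "score": ...} dicts for the top-scoring emotions
emotion = pipe(path_to_your_audio)
print(emotion)
```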
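
For a single file, the pipeline returns a list of dictionaries with `label` and `score` keys. A minimal sketch of picking the most likely emotion from that output; `top_emotion` is a hypothetical helper, and the example scores are invented for illustration.

```python
def top_emotion(predictions):
    """Return (label, score) of the highest-scoring prediction.

    `predictions` is assumed to be the pipeline output for one clip:
    a list of {"label": str, "score": float} dicts.
    """
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


# Made-up output shaped like the pipeline's return value (not real model scores).
example = [
    {"label": "calm", "score": 0.07},
    {"label": "happy", "score": 0.88},
    {"label": "neutral", "score": 0.05},
]
label, score = top_emotion(example)
print(f"{label} ({score:.0%})")  # happy (88%)
```

Using `max` rather than taking the first element avoids relying on the order of the returned list.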

## Demo

You can access the live demo of the app on [Hugging Face Spaces](https://huggingface.co/spaces/BilalHasan/Mood-Based-Yoga-Session-Recommendation).