---
license: apache-2.0
language:
- en
---
# Emotion Detection From Speech
This model is a fine-tuned version of **DistilHuBERT** that classifies emotions from audio inputs.
## Approach
1. **Dataset:** The **RAVDESS** dataset, comprising 1,440 audio files with 8 emotion labels: calm, happy, sad, angry, fearful, surprise, neutral, and disgust.
2. **Model Fine-Tuning:** The DistilHuBERT model was fine-tuned for 7 epochs with a learning rate of 5e-5, achieving an accuracy of 98% on the test dataset.
## Data Preprocessing
- **Sampling Rate**: Audio files were resampled to 16kHz to match the model's requirements.
- **Padding:** Audio clips shorter than 30 seconds were zero-padded to a fixed 30-second length.
- **Train-Test Split:** 80% of the samples were used for training, and 20% for testing.
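The padding step above can be sketched as follows. This is an illustrative reconstruction, not the card's actual preprocessing code; the function name and the handling of clips longer than 30 seconds are assumptions.

```python
import numpy as np

TARGET_SR = 16_000            # sampling rate expected by DistilHuBERT
MAX_SECONDS = 30              # fixed clip length used during training
MAX_SAMPLES = TARGET_SR * MAX_SECONDS

def pad_waveform(waveform: np.ndarray) -> np.ndarray:
    """Zero-pad a mono 16 kHz waveform to exactly 30 seconds.
    Truncating longer clips is an assumption; the card only mentions padding."""
    if len(waveform) >= MAX_SAMPLES:
        return waveform[:MAX_SAMPLES]
    padding = np.zeros(MAX_SAMPLES - len(waveform), dtype=waveform.dtype)
    return np.concatenate([waveform, padding])

# Example: a 2-second clip becomes a 30-second zero-padded array.
clip = np.random.randn(2 * TARGET_SR).astype(np.float32)
padded = pad_waveform(clip)
print(padded.shape)  # (480000,)
```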
## Model Architecture
- **DistilHuBERT:** A lightweight variant of HuBERT, fine-tuned for emotion classification.
- **Fine-Tuning Setup:**
- Optimizer: AdamW
- Loss Function: Cross-Entropy
- Learning Rate: 5e-5
- Warm-up Ratio: 0.1
- Epochs: 7
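The fine-tuning setup above maps onto a `transformers` training configuration roughly like this. The hyperparameters are taken from the list; `output_dir` and the batch size are illustrative assumptions not stated in the card. AdamW and cross-entropy are the `Trainer` defaults for a classification head, so they need no extra arguments.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilhubert-finetuned-ravdess",  # assumption: any local path works
    learning_rate=5e-5,
    warmup_ratio=0.1,
    num_train_epochs=7,
    per_device_train_batch_size=8,  # assumption: batch size not given in the card
)
```

This object would then be passed to a `Trainer` together with the model, the feature extractor, and the train/test splits.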
## Results
- **Accuracy:** 0.98 on the test dataset
- **Loss:** 0.10 on the test dataset
## Usage
```python
from transformers import pipeline

pipe = pipeline(
    "audio-classification",
    model="BilalHasan/distilhubert-finetuned-ravdess",
)
emotion = pipe(path_to_your_audio)  # path to an audio file, e.g. a .wav
```
## Demo
You can access the live demo of the app on [Hugging Face Spaces](https://huggingface.co/spaces/BilalHasan/Mood-Based-Yoga-Session-Recommendation).