BilalHasan/distilhubert-finetuned-ravdess


metadata

license: apache-2.0
language:
  - en
base_model:
  - ntu-spml/distilhubert
pipeline_tag: audio-classification
library_name: transformers

Emotion Detection From Speech

This model is a fine-tuned version of DistilHuBERT that classifies emotions from speech audio.

Approach

Dataset: RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song), using its 1,440 speech audio files labeled with 8 emotions: calm, happy, sad, angry, fearful, surprise, neutral, and disgust.
Model Fine-Tuning: DistilHuBERT was fine-tuned for 7 epochs with a learning rate of 5e-5 and reached 98% accuracy on the held-out test split.
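The card does not say how the eight labels were extracted. The 1,440 speech files are 60 recordings from each of 24 actors, and RAVDESS encodes each clip's metadata in its filename as seven hyphen-separated fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). A minimal sketch, assuming that standard naming scheme and this card's label names (the dataset's own documentation says "surprised" rather than "surprise"), might derive labels like this; `EMOTIONS` and `label_from_filename` are hypothetical names:

```python
from pathlib import Path

# RAVDESS emotion codes (third filename field); label names follow this card
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprise",
}


def label_from_filename(path: str) -> str:
    """Map a RAVDESS file such as 03-01-06-01-02-01-12.wav to its emotion label."""
    # Fields: modality-vocal_channel-emotion-intensity-statement-repetition-actor
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"unexpected RAVDESS filename: {path}")
    return EMOTIONS[fields[2]]
```

For example, `label_from_filename("03-01-06-01-02-01-12.wav")` returns `"fearful"`, because the third field is `06`.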

Data Preprocessing

Sampling Rate: Audio files were resampled to 16 kHz, the input rate DistilHuBERT expects.
Padding: Audio clips shorter than 30 seconds were zero-padded.
Train-Test Split: 80% of the samples were used for training, and 20% for testing.
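The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the author's pipeline: it assumes NumPy and SciPy are available, uses SciPy's polyphase resampler, truncates anything longer than 30 seconds (the card only mentions padding), and uses a plain random 80/20 split with a hypothetical seed. Whether the original split was stratified by emotion or by actor is not stated.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate DistilHuBERT expects
MAX_SECONDS = 30    # clip length stated in this card


def resample(wave: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Polyphase resampling of a mono waveform from orig_sr to target_sr."""
    if orig_sr == target_sr:
        return wave
    g = gcd(orig_sr, target_sr)
    return resample_poly(wave, up=target_sr // g, down=orig_sr // g)


def pad_or_truncate(wave: np.ndarray, sr: int = TARGET_SR, max_seconds: int = MAX_SECONDS) -> np.ndarray:
    """Zero-pad short clips (and, as an assumption, cut long ones) to max_seconds."""
    target_len = sr * max_seconds
    if len(wave) >= target_len:
        return wave[:target_len]
    return np.pad(wave, (0, target_len - len(wave)))  # pads with zeros at the end


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle items and split them into (train, test) lists."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    n_test = int(round(len(items) * test_fraction))
    test = [items[i] for i in order[:n_test]]
    train = [items[i] for i in order[n_test:]]
    return train, test
```

At 16 kHz a 30-second window is 16,000 × 30 = 480,000 samples, so `pad_or_truncate` always returns arrays of that length. RAVDESS is distributed at 48 kHz, which `resample` reduces to an exact 1:3 down-sampling ratio. With 1,440 clips, the 80/20 split gives 1,152 training and 288 test clips.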

Model Architecture

DistilHuBERT: A compact version of HuBERT obtained through knowledge distillation, fine-tuned here for emotion classification.
Fine-Tuning Setup:
- Optimizer: AdamW
- Loss Function: Cross-Entropy
- Learning Rate: 5e-5
- Warm-up Ratio: 0.1
- Epochs: 7
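The card lists a warm-up ratio but not the shape of the learning-rate schedule. The sketch below assumes the linear warm-up followed by linear decay that Hugging Face `Trainer` uses by default; `lr_at_step` and its `total_steps` argument are illustrative, since the batch size (and therefore the number of optimizer steps) is not given.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 5e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at an optimizer step: linear warm-up, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the first 10% of steps
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at the final step
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

If the model was trained with `transformers`, these settings would correspond to `TrainingArguments(learning_rate=5e-5, warmup_ratio=0.1, num_train_epochs=7)`. `Trainer` uses AdamW by default, and audio-classification models compute cross-entropy loss when labels are passed, which matches the setup listed above.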

Results

Accuracy: 0.98 on the test split
Loss (cross-entropy): 0.10 on the test split
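The card does not define its metrics. Assuming "accuracy" means top-1 accuracy and "loss" means mean cross-entropy over the test clips, both numbers could be computed from per-clip logits as below (standard library only; `log_softmax` and `evaluate` are hypothetical helpers, not part of this repository):

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def evaluate(batch_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (top-1 accuracy, mean cross-entropy) over a set of clips."""
    correct = 0
    total_loss = 0.0
    for logits, label in zip(batch_logits, labels):
        pred = max(range(len(logits)), key=logits.__getitem__)  # argmax class
        correct += int(pred == label)
        total_loss += -log_softmax(logits)[label]  # cross-entropy for this clip
    n = len(labels)
    return correct / n, total_loss / n
```

If the 80/20 split was exact, the test split holds 288 of the 1,440 clips, so an accuracy of 0.98 corresponds to roughly 282 correct predictions.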

Usage

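from transformers import pipeline

# Load the fine-tuned emotion classifier from the Hugging Face Hub
pipe = pipeline("audio-classification", model="BilalHasan/distilhubert-finetuned-ravdess")

# Placeholder path: replace with your own audio file (e.g. a .wav recording)
path_to_your_audio = "speech.wav"
emotion = pipe(path_to_your_audio)  # list of {"label": ..., "score": ...} dicts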
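The pipeline returns a list of `{"label": ..., "score": ...}` dictionaries. A small helper, hypothetical and not part of this repository, can reduce that list to the single most likely emotion and cache the pipeline so the model is loaded only once:

```python
from functools import lru_cache
from typing import Dict, List, Tuple

MODEL_ID = "BilalHasan/distilhubert-finetuned-ravdess"


def top_emotion(predictions: List[Dict]) -> Tuple[str, float]:
    """Return (label, score) for the highest-scoring prediction."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


@lru_cache(maxsize=1)
def get_classifier():
    # Imported lazily so top_emotion can be used without transformers installed
    from transformers import pipeline
    return pipeline("audio-classification", model=MODEL_ID)


def classify_file(path: str) -> Tuple[str, float]:
    """Classify one audio file and return its most likely emotion and score."""
    return top_emotion(get_classifier()(path))
```

For example, `top_emotion([{"label": "calm", "score": 0.05}, {"label": "happy", "score": 0.91}])` returns `("happy", 0.91)`. The exact label strings come from the model's `id2label` config and are assumed here.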

Demo

You can access the live demo of the app on Hugging Face Spaces.