---
license: apache-2.0
language:
- en
---
# Emotion Detection From Speech
This model is a fine-tuned version of **DistilHuBERT** that classifies emotions from audio inputs.
## Approach
1. **Dataset:** The **RAVDESS** dataset, comprising 1,440 audio files with 8 emotion labels: calm, happy, sad, angry, fearful, surprise, neutral, and disgust.
2. **Model Fine-Tuning:** The DistilHuBERT model was fine-tuned for 7 epochs with a learning rate of 5e-5, achieving an accuracy of 98% on the test dataset.
## Data Preprocessing
- **Sampling Rate**: Audio files were resampled to 16kHz to match the model's requirements.
- **Padding:** Audio clips shorter than 30 seconds were zero-padded to a fixed 30-second length.
- **Train-Test Split:** 80% of the samples were used for training, and 20% for testing.
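The padding step above can be sketched as follows. This is an illustrative reconstruction, not the card's actual preprocessing code; the function name and the handling of clips longer than 30 seconds are assumptions.

```python
import numpy as np

TARGET_SR = 16_000            # sampling rate expected by DistilHuBERT
MAX_SECONDS = 30              # fixed clip length used during training
MAX_SAMPLES = TARGET_SR * MAX_SECONDS

def pad_waveform(waveform: np.ndarray) -> np.ndarray:
    """Zero-pad a mono 16 kHz waveform to exactly 30 seconds.
    Truncating longer clips is an assumption; the card only mentions padding."""
    if len(waveform) >= MAX_SAMPLES:
        return waveform[:MAX_SAMPLES]
    padding = np.zeros(MAX_SAMPLES - len(waveform), dtype=waveform.dtype)
    return np.concatenate([waveform, padding])

# Example: a 2-second clip becomes a 30-second zero-padded array.
clip = np.random.randn(2 * TARGET_SR).astype(np.float32)
padded = pad_waveform(clip)
print(padded.shape)  # (480000,)
```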
## Model Architecture
- **DistilHuBERT:** A lightweight variant of HuBERT, fine-tuned for emotion classification.
- **Fine-Tuning Setup:**
- Optimizer: AdamW
- Loss Function: Cross-Entropy
- Learning Rate: 5e-5
- Warm-up Ratio: 0.1
- Epochs: 7
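The fine-tuning setup above maps onto a `transformers` training configuration roughly like this. The hyperparameters are taken from the list; `output_dir` and the batch size are illustrative assumptions not stated in the card. AdamW and cross-entropy are the `Trainer` defaults for a classification head, so they need no extra arguments.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilhubert-finetuned-ravdess",  # assumption: any local path works
    learning_rate=5e-5,
    warmup_ratio=0.1,
    num_train_epochs=7,
    per_device_train_batch_size=8,  # assumption: batch size not given in the card
)
```

This object would then be passed to a `Trainer` together with the model, the feature extractor, and the train/test splits.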
## Results
- **Accuracy:** 0.98 on the test dataset
- **Loss:** 0.10 on the test dataset
## Usage
```python
from transformers import pipeline

pipe = pipeline(
    "audio-classification",
    model="BilalHasan/distilhubert-finetuned-ravdess",
)
emotion = pipe(path_to_your_audio)  # path to an audio file, e.g. a .wav
```
## Demo
You can access the live demo of the app on [Hugging Face Spaces](https://huggingface.co/spaces/BilalHasan/Mood-Based-Yoga-Session-Recommendation).