Emotion Classification Model
This model is an 8-class SVM classifier trained on the RAVDESS dataset, using SpeechBrain ECAPA-TDNN embeddings as features.
Model Details
- Input: Audio file (converted to 16 kHz mono)
- Output: Predicted emotion, one of 8 classes: [angry, calm, disgust, fearful, happy, neutral, sad, surprised]
- Features:
  - SpeechBrain ECAPA-TDNN embedding [192 features]
- Performance:
  - RAVDESS 5-fold cross-validation: 84% accuracy
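For readers unfamiliar with the metric, the following is a minimal sketch of how a 5-fold cross-validation accuracy is typically computed. It is not the authors' evaluation code: the helper names `k_fold_indices` and `cross_val_accuracy` are hypothetical, the folds here are plain contiguous splits, and the model card does not say whether the original folds were shuffled, stratified, or separated by speaker. The classifier is passed in as a callable so that any model (for example an SVM over 192-dimensional embeddings) could be plugged in.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous (train, test) folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def cross_val_accuracy(
    fit_predict: Callable[[list, list, list], Sequence],
    features: Sequence,
    labels: Sequence,
    k: int = 5,
) -> float:
    """Return the mean per-fold accuracy.

    `fit_predict(train_x, train_y, test_x)` is a hypothetical callable that
    trains on the training fold and returns one predicted label per test item.
    """
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        predictions = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Each sample is used for testing exactly once, and the reported figure is the average of the five per-fold accuracies.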
Installation
You can install the package directly from GitHub:
pip install git+https://github.com/griko/voice-emotion-classification.git
Usage
from pipelines.emotion_classifier import EmotionClassificationPipeline
# Load the model
classifier = EmotionClassificationPipeline.from_pretrained("griko/emotion_8_cls_svm_ecapa_ravdess")
# Use it for prediction
result = classifier("path/to/audio.wav")
print(result) # ['angry'] or ['disgust'] or ['fearful'] or ['happy'] or ['neutral'] or ['calm'] or ['sad'] or ['surprised']
# Batch prediction
results = classifier(["audio1.wav", "audio2.wav"])
print(results) # e.g. ['angry', 'disgust']
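When classifying many files, it can help to pair each path with its predicted label and tally the results. The sketch below assumes only what the usage example shows: that the classifier accepts a list of paths and returns one label per path. The helper names `label_files` and `count_labels` are hypothetical and are not part of the package.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def label_files(
    classifier: Callable[[List[str]], Sequence[str]],
    paths: List[str],
) -> Dict[str, str]:
    """Run batch prediction and map each path to its predicted label."""
    labels = list(classifier(paths))
    if len(labels) != len(paths):
        # Assumption: one label is returned per input file.
        raise ValueError("expected one predicted label per input path")
    return dict(zip(paths, labels))


def count_labels(labelled: Dict[str, str]) -> Counter:
    """Count how many files were assigned to each emotion."""
    return Counter(labelled.values())
```

With the pipeline loaded as above, `label_files(classifier, ["audio1.wav", "audio2.wav"])` would return a dictionary keyed by file path.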
Input Requirements
- Audio files should be in WAV format
- Audio will be automatically resampled to 16kHz if needed
- Audio will be converted to mono if needed
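The two conversions above can be illustrated with a short, dependency-free sketch. This is not the package's preprocessing code, which presumably relies on an audio library. The function names are hypothetical, and the linear-interpolation resampler omits the anti-aliasing filter that a production resampler would apply.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # Hz, the sample rate stated in this model card


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame into a single mono sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(
    samples: Sequence[float],
    src_rate: int,
    dst_rate: int = TARGET_RATE,
) -> List[float]:
    """Resample by linear interpolation (illustrative only, no low-pass filter)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interpolation interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out
```

For example, a stereo recording at 44.1 kHz would first be passed through `to_mono` and then through `resample_linear(mono, 44_100)`.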
Limitations
- The model was trained on actors' voices from the RAVDESS dataset
- Performance may vary with audio quality and recording conditions
Citation
If you use this model in your research, please cite:
@misc{koushnir2025vanpyvoiceanalysisframework,
title={VANPY: Voice Analysis Framework},
author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
year={2025},
eprint={2502.17579},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2502.17579},
}