Emotion Classification Model

This model is an 8-class SVM classifier trained on the RAVDESS dataset, using SpeechBrain ECAPA-TDNN embeddings as features.

Model Details

Input: Audio file (automatically converted to 16 kHz mono)
Output: Predicted emotion, one of 8 classes [angry, calm, disgust, fearful, happy, neutral, sad, surprised]
Features:
- SpeechBrain ECAPA-TDNN embedding [192 features]
Performance:
- RAVDESS 5-fold cross-validation: 84% accuracy

Installation

You can install the package directly from GitHub:

pip install git+https://github.com/griko/voice-emotion-classification.git

Usage

from pipelines.emotion_classifier import EmotionClassificationPipeline

# Load the model
classifier = EmotionClassificationPipeline.from_pretrained("griko/emotion_8_cls_svm_ecapa_ravdess")

# Use it for prediction
result = classifier("path/to/audio.wav")
print(result)  # e.g. ['angry']; one of angry, calm, disgust, fearful, happy, neutral, sad, surprised

# Batch prediction
results = classifier(["audio1.wav", "audio2.wav"])
print(results) # ['angry', 'disgust']
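
When classifying many files, it can help to pair each path with its label and tally the results. The following is a minimal sketch using only the standard library. The helper name `summarize_predictions` is hypothetical and not part of this package, and the sketch assumes that batch predictions come back in the same order as the input paths. The labels below are made-up stand-ins for the output of `classifier(paths)`.

```python
from collections import Counter


def summarize_predictions(paths, labels):
    """Pair each input path with its predicted label and count the labels.

    Hypothetical helper: assumes `labels` is ordered like `paths`.
    """
    if len(paths) != len(labels):
        raise ValueError("expected exactly one label per input path")
    by_file = dict(zip(paths, labels))
    counts = Counter(labels)
    return by_file, counts


# Stand-in for `labels = classifier(paths)`; values are illustrative only
paths = ["audio1.wav", "audio2.wav", "audio3.wav"]
labels = ["angry", "disgust", "angry"]

by_file, counts = summarize_predictions(paths, labels)
print(by_file)
print(counts.most_common(1))
```

The length check is there so that a mismatch between inputs and outputs is reported rather than silently truncated by `zip`.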

Input Requirements

- Audio files should be in WAV format
- Audio is automatically resampled to 16 kHz if needed
- Audio is automatically converted to mono if needed
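
The pipeline handles resampling and mono conversion itself, so no preprocessing is required. If you want to know in advance which files will be converted, a check along these lines may be useful. It is a sketch built on Python's standard `wave` module, not on this package. The function name `describe_wav` is hypothetical, and `wave` reads only uncompressed PCM WAV files, so other encodings raise `wave.Error`.

```python
import wave

TARGET_RATE = 16000  # 16 kHz, the sample rate stated in this README


def describe_wav(path):
    """Return (sample_rate, channels, needs_conversion) for a PCM WAV file.

    Hypothetical helper: it only reads the file header and never
    modifies the audio.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    # Conversion is needed when the file is not already 16 kHz mono
    needs_conversion = rate != TARGET_RATE or channels != 1
    return rate, channels, needs_conversion


# Example call (the path is a placeholder):
# rate, channels, needs_conversion = describe_wav("path/to/audio.wav")
```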

Limitations

- The model was trained on actor voices from the RAVDESS dataset
- Performance may vary with audio quality and recording conditions

Citation

If you use this model in your research, please cite:

@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework}, 
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579}, 
}