Emotion Classification Model

This model is an 8-class SVM classifier trained on the RAVDESS dataset, using SpeechBrain ECAPA-TDNN embeddings as features.

Model Details

  • Input: Audio file (converted to 16 kHz mono if needed)
  • Output: Predicted emotion, one of 8 classes [angry, calm, disgust, fearful, happy, neutral, sad, surprised]
  • Features:
    • SpeechBrain ECAPA-TDNN embedding (192 dimensions)
  • Performance:
    • RAVDESS 5-fold cross-validation: 84% accuracy
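The 84% figure is a 5-fold cross-validation score. The sketch below is a rough, hedged illustration of what that protocol computes. It shuffles the samples into five folds, fits on four, scores the held-out fold, and averages the five accuracies. The real model uses an SVM on ECAPA-TDNN embeddings. Here a nearest-centroid classifier (`nearest_centroid`, a hypothetical stand-in, not an SVM) is used so the example runs with the standard library only. The names `k_fold_indices` and `cross_val_accuracy` are also illustrative. The card does not say whether the authors' folds were stratified or split by speaker.

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], str]
Trainer = Callable[[Sequence[Vector], Sequence[str]], Predictor]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def nearest_centroid(xs: Sequence[Vector], ys: Sequence[str]) -> Predictor:
    """Hypothetical stand-in classifier (NOT the model's SVM): closest class mean wins."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x: Vector) -> str:
        # Pick the label whose centroid has the smallest squared Euclidean distance.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict


def cross_val_accuracy(xs: Sequence[Vector], ys: Sequence[str],
                       train: Trainer, k: int = 5, seed: int = 0) -> float:
    """Mean accuracy over k folds: fit on k-1 folds, score on the held-out fold."""
    if len(xs) < k:
        raise ValueError("need at least k samples so every fold is non-empty")
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        predict = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic 192-dim "embeddings" for two well-separated classes (not real data).
    rng = random.Random(1)
    xs, ys = [], []
    for label, center in (("happy", 10.0), ("sad", -10.0)):
        for _ in range(20):
            xs.append([center + rng.uniform(-1, 1) for _ in range(192)])
            ys.append(label)
    print(cross_val_accuracy(xs, ys, nearest_centroid, k=5))  # prints 1.0
```

Because each held-out fold is scored by a model that never saw it, the averaged accuracy estimates performance on unseen clips from the same dataset. It says nothing about recordings from other sources.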

Installation

You can install the package directly from GitHub:

pip install git+https://github.com/griko/voice-emotion-classification.git

Usage

from pipelines.emotion_classifier import EmotionClassificationPipeline

# Load the model
classifier = EmotionClassificationPipeline.from_pretrained("griko/emotion_8_cls_svm_ecapa_ravdess")

# Use it for prediction
result = classifier("path/to/audio.wav")
print(result)  # e.g. ['happy']; one of: angry, calm, disgust, fearful, happy, neutral, sad, surprised

# Batch prediction
results = classifier(["audio1.wav", "audio2.wav"])
print(results)  # e.g. ['angry', 'disgust']: one label per input file
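For larger collections, the following is a hedged sketch. It assumes, as the batch example above suggests, that calling the pipeline on a list of paths returns one label per path in the same order. Under that assumption, a small wrapper can process files in fixed-size chunks and keep each path paired with its label. `label_files` and `batch_size` are hypothetical names, not part of the package. Any callable with that behavior works, so a stub can stand in for the real classifier.

```python
from typing import Callable, Dict, List, Sequence


def label_files(classifier: Callable[[List[str]], List[str]],
                paths: Sequence[str], batch_size: int = 16) -> Dict[str, str]:
    """Hypothetical helper: classify paths in chunks and map each path to its label."""
    labels: Dict[str, str] = {}
    for start in range(0, len(paths), batch_size):
        batch = list(paths[start:start + batch_size])
        preds = classifier(batch)  # assumed: one label per input, same order
        if len(preds) != len(batch):
            raise RuntimeError("expected one prediction per input file")
        labels.update(zip(batch, preds))
    return labels
```

With the `classifier` loaded above, `label_files(classifier, ["audio1.wav", "audio2.wav"])` would return a dict such as `{"audio1.wav": "angry", "audio2.wav": "disgust"}`. Chunking bounds how many files are passed to the pipeline in a single call.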

Input Requirements

  • Audio files should be in WAV format
  • Audio will be automatically resampled to 16kHz if needed
  • Audio will be converted to mono if needed
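The card does not say which library performs this conversion. The sketch below shows the two steps using only the standard library and assumes 16-bit PCM WAV input. `read_wav_mono` averages all channels into one track. `resample_linear` changes the sample rate by linear interpolation. `load_16k_mono` chains the two. All three names are hypothetical. A production resampler (for example a windowed-sinc or polyphase filter) would also low-pass filter before downsampling to avoid aliasing, which this sketch deliberately omits.

```python
import io
import struct
import wave
from typing import BinaryIO, List, Tuple, Union

TARGET_RATE = 16_000  # the card states audio is converted to 16 kHz mono


def read_wav_mono(source: Union[str, BinaryIO]) -> Tuple[List[float], int]:
    """Read a 16-bit PCM WAV and average all channels into a single mono track."""
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)  # interleaved little-endian int16
    mono = [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples), channels)]
    return mono, rate


def resample_linear(x: List[float], src_rate: int, dst_rate: int = TARGET_RATE) -> List[float]:
    """Naive linear-interpolation resampler; no anti-aliasing filter is applied."""
    if src_rate == dst_rate or not x:
        return list(x)
    n_out = round(len(x) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional index into the source signal
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1.0 - frac) + x[hi] * frac)
    return out


def load_16k_mono(source: Union[str, BinaryIO]) -> List[float]:
    """Hypothetical helper: WAV file -> mono samples at 16 kHz."""
    mono, rate = read_wav_mono(source)
    return resample_linear(mono, rate)


if __name__ == "__main__":
    # Build a 0.1 s stereo 8 kHz WAV in memory (synthetic data) and convert it.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(8000)
        wf.writeframes(struct.pack("<hh", 100, 300) * 800)
    buf.seek(0)
    print(len(load_16k_mono(buf)))  # prints 1600
```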

Limitations

  • The model was trained on acted speech from professional actors in the RAVDESS dataset
  • Performance may differ on audio with other recording quality or conditions, or on spontaneous (non-acted) speech

Citation

If you use this model in your research, please cite:

@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework}, 
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579}, 
}