classla/wav2vec2-large-slavic-voxpopuli-v2_hr_SER

This model for Croatian speech emotion recognition (SER) is based on facebook/wav2vec2-large-slavic-voxpopuli-v2 and was fine-tuned on the CrES 2.1 dataset (Croatian Emotional Speech corpus).

If you use this model, please cite the following paper describing the dataset:

@inproceedings{Dropuljić_Chmura_Kolak_Petrinović_2011,
  title={Emotional speech corpus of Croatian language},
  ISSN={1845-5921},
  booktitle={2011 7th International Symposium on Image and Signal Processing and Analysis (ISPA)},
  author={Dropuljić, Branimir and Chmura, Miłosz Thomasz and Kolak, Antonio and Petrinović, Davor},
  year={2011},
  month={Sep},
  pages={95–100}
}

Metrics

Evaluation is performed on the dev and test portions of the CrES 2.1 dataset. The corpus was split anew, stratified by emotion, with no speaker leakage (i.e. no speaker appears in more than one split).
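As an illustration of the "no leakage" condition, the following minimal sketch checks that no speaker appears in more than one split. The speaker IDs, emotion labels, and the `find_speaker_leakage` helper are hypothetical and are not taken from the CrES 2.1 metadata or from the authors' splitting code.

```python
from collections import defaultdict


def find_speaker_leakage(splits):
    """Return {speaker_id: [split names]} for speakers found in more than one split."""
    seen = defaultdict(set)
    for split_name, rows in splits.items():
        for speaker_id, _emotion in rows:
            seen[speaker_id].add(split_name)
    return {spk: sorted(names) for spk, names in seen.items() if len(names) > 1}


# Hypothetical toy metadata: each row is (speaker_id, emotion).
clean_splits = {
    "train": [("spk01", "anger"), ("spk01", "joy"), ("spk02", "anger")],
    "dev": [("spk03", "anger"), ("spk03", "joy")],
    "test": [("spk04", "anger"), ("spk04", "joy")],
}

# Same idea, but spk02 deliberately appears in both train and dev.
leaky_splits = {
    "train": [("spk01", "anger"), ("spk02", "joy")],
    "dev": [("spk02", "anger")],
    "test": [("spk03", "joy")],
}

print(find_speaker_leakage(clean_splits))
print(find_speaker_leakage(leaky_splits))
```

An empty result means the splits are speaker-disjoint; any entry names a speaker whose recordings would let the model be evaluated on a voice it has already seen.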

accuracy  macro F1  split
0.6796    0.6461    test
0.7277    0.7232    dev
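For readers unfamiliar with the two metrics, here is a minimal sketch of how accuracy and macro F1 can be derived from a confusion matrix (rows are true classes, columns are predicted classes). The matrix below is an invented toy example, not the model's actual confusion matrix, and `accuracy_and_macro_f1` is a hypothetical helper name.

```python
def accuracy_and_macro_f1(cm):
    """Compute (accuracy, macro F1) from a square confusion matrix.

    cm[i][j] is the number of items of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    f1_scores = []
    for i in range(n):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, but true class differs
        fn = sum(cm[i]) - tp                       # true i, but predicted otherwise
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    return correct / total, sum(f1_scores) / n


# Invented 3-class toy matrix, for illustration only.
toy_cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 1, 7],
]

acc, macro_f1 = accuracy_and_macro_f1(toy_cm)
print(f"accuracy={acc:.4f} macro_f1={macro_f1:.4f}")
```

Macro F1 averages the per-class F1 scores with equal weight, so a class that is recognised poorly lowers it even when overall accuracy stays high.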

Confusion matrix on test: (figure not reproduced here)

Training hyperparameters

The following arguments were used for fine-tuning:

arg                          value
per_device_train_batch_size  2
per_device_eval_batch_size   2
gradient_accumulation_steps  2
num_train_epochs             20
learning_rate                1e-4
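The argument names match fields commonly passed to a Hugging Face training configuration, where the number of examples contributing to each optimizer step is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below works through that arithmetic; the device count is not stated in this card, so `num_devices=1` is an assumption, and `effective_batch_size` is a hypothetical helper.

```python
# Values copied from the table above.
hyperparams = {
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 20,
    "learning_rate": 1e-4,
}


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Examples contributing to one optimizer step.

    num_devices=1 is an assumption; the card does not state the hardware used.
    """
    return per_device * grad_accum * num_devices


train_batch = effective_batch_size(
    hyperparams["per_device_train_batch_size"],
    hyperparams["gradient_accumulation_steps"],
)
print(f"effective train batch size (single device assumed): {train_batch}")
```

Under the single-device assumption, each optimizer step accumulates gradients over 2 × 2 = 4 examples.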