---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: wav2vec2-lg-xlsr-en-speech-emotion-recognition
---

# Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0
The model is a fine-tuned version of [jonatasgrosman/wav2vec2-large-xlsr-53-english](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) for a Speech Emotion Recognition (SER) task.
Several datasets were used to fine-tune the original model:
- [Surrey Audio-Visual Expressed Emotion (SAVEE)](http://kahlan.eps.surrey.ac.uk/savee/Database.html) - 480 audio files from 4 male actors
- [Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)](https://zenodo.org/record/1188976#.YO6yI-gzaUk) - 1440 audio files from 24 professional actors (12 female, 12 male)
- [Toronto emotional speech set (TESS)](https://tspace.library.utoronto.ca/handle/1807/24487) - 2800 audio files from 2 female actors
The model predicts 7 classification labels:

`emotions = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']`
It achieves the following results on the evaluation set:
- Loss: 0.5023
- Accuracy: 0.8223
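
The model can be tried for inference with a short script like the one below. This is a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub and is loadable through the standard `AutoModelForAudioClassification` interface; the repo id is a placeholder and `example.wav` stands in for any audio file.

```python
import torch
import librosa
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Placeholder repo id -- replace with the actual Hub namespace of this checkpoint.
model_id = "<username>/wav2vec2-lg-xlsr-en-speech-emotion-recognition"

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = AutoModelForAudioClassification.from_pretrained(model_id)

# Wav2Vec 2.0 expects 16 kHz mono audio.
speech, _ = librosa.load("example.wav", sr=16_000, mono=True)

inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = int(torch.argmax(logits, dim=-1))
print(model.config.id2label[predicted_id])  # one of the 7 emotion labels above
```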

## Model description
More information needed

## Intended uses & limitations
More information needed

## Training and evaluation data
More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
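
For reference, here is a hedged sketch of how these settings map onto `transformers.TrainingArguments`; the output directory, the logging/evaluation cadence (taken from the 500-step intervals in the results table below), and the commented `Trainer` wiring are assumptions rather than the exact training script used for this model.

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="wav2vec2-lg-xlsr-en-speech-emotion-recognition",  # assumed output dir
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # effective total train batch size of 8
    num_train_epochs=3,
    seed=42,
    lr_scheduler_type="linear",
    fp16=True,                       # native AMP mixed-precision training
    evaluation_strategy="steps",
    eval_steps=500,                  # matches the 500-step eval intervals in the table below
    logging_steps=500,
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-8 is the Trainer's default optimizer,
# so no extra optimizer arguments are needed.

# Wiring into the Trainer (model and datasets are assumed to be prepared elsewhere):
# trainer = Trainer(
#     model=model,                   # Wav2Vec 2.0 backbone with a 7-label classification head
#     args=training_args,
#     train_dataset=train_dataset,   # preprocessed SAVEE/RAVDESS/TESS splits
#     eval_dataset=eval_dataset,
#     compute_metrics=compute_metrics,
# )
# trainer.train()
```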

### Training results
| Step | Training Loss | Validation Loss | Accuracy |
|:----:|:-------------:|:---------------:|:--------:|
| 500  | 1.812400 | 1.365212 | 0.486258 |
| 1000 | 0.887200 | 0.773145 | 0.797040 |
| 1500 | 0.703500 | 0.574954 | 0.852008 |
| 2000 | 0.687900 | 1.286738 | 0.775899 |
| 2500 | 0.649800 | 0.697455 | 0.832981 |
| 3000 | 0.569600 | 0.337240 | 0.892178 |
| 3500 | 0.421800 | 0.307072 | 0.911205 |
| 4000 | 0.308800 | 0.374443 | 0.930233 |
| 4500 | 0.268800 | 0.260444 | 0.936575 |
| 5000 | 0.297300 | 0.302985 | 0.923890 |
| 5500 | 0.176500 | 0.165439 | 0.961945 |
| 6000 | 0.147500 | 0.170199 | 0.961945 |
| 6500 | 0.127400 | 0.155310 | 0.966173 |
| 7000 | 0.069900 | 0.103882 | 0.976744 |
| 7500 | 0.083000 | 0.104075 | 0.974630 |