---
datasets:
- narad/ravdess
language:
- en
metrics:
- f1
- accuracy
- recall
- precision
pipeline_tag: audio-classification
---

# Emotion Recognition in English Using RAVDESS and Wav2Vec 2.0

This model classifies the emotion expressed in an audio recording. It was trained on RAVDESS, a dataset of English audio recordings. The model recognises six emotions: anger, disgust, fear, happiness, sadness and surprise.

The model recreates the work of this [Greek emotion extractor](https://huggingface.co/m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition/blob/main/README.md), using a pre-trained [Wav2Vec2](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) model to process the data.

## Model Details

### Model Description

- **Adapted from:** [Emotion Recognition in Greek](https://huggingface.co/m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition/blob/main/README.md)
- **Model type:** NN with CTC
- **Language(s) (NLP):** English
- **Finetuned from model:** [wav2vec2](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english)

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

The RAVDESS dataset was split into training, validation and test sets containing 60%, 20% and 20% of the data, respectively.

### Training Procedure

The fine-tuning process was centred on four hyper-parameters:

- the batch size (4, 8),
- the number of gradient accumulation steps (GAS) (2, 4, 6, 8),
- the number of epochs (10, 20) and
- the learning rate (1e-3, 1e-4, 1e-5).

Each experiment was repeated 10 times.

## Evaluation

The best-performing hyper-parameter set was a batch size of 4, 4 GAS, 10 epochs and a learning rate of 1e-4.

## Testing

The model was retrained on the combined train and validation sets using the best hyper-parameter set.
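
As a rough illustration of the protocol described so far (the 60/20/20 split, the hyper-parameter grid with 10 repeats, and the final retraining on train plus validation data), the following sketch uses only the Python standard library. The clip list, the seed, the function names and the dictionary keys are hypothetical labels chosen for this example. No training is performed, and the card does not state whether the original split was random, stratified or speaker-independent.

```python
import itertools
import random

# Search space taken from the Training Procedure section.
# The key names are illustrative labels, not the original code's names.
GRID = {
    "batch_size": [4, 8],
    "gradient_accumulation_steps": [2, 4, 6, 8],
    "num_epochs": [10, 20],
    "learning_rate": [1e-3, 1e-4, 1e-5],
}
N_REPEATS = 10  # each experiment was repeated 10 times


def split_60_20_20(items, seed=0):
    """Shuffle and split into train/validation/test (60/20/20).

    Assumption: a plain random split with a fixed seed.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def all_configs(grid):
    """Yield one dict per combination of hyper-parameter values."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


# Hypothetical stand-in for the list of audio clips.
clips = [f"clip_{i:04d}.wav" for i in range(1000)]
train, val, test = split_60_20_20(clips)

configs = list(all_configs(GRID))
total_runs = len(configs) * N_REPEATS

# Best configuration reported in the Evaluation section.
best = {
    "batch_size": 4,
    "gradient_accumulation_steps": 4,
    "num_epochs": 10,
    "learning_rate": 1e-4,
}

# For the final model, train and validation data are merged;
# the test set stays held out.
final_train = train + val

print(f"{len(configs)} configurations x {N_REPEATS} repeats = {total_runs} runs")
print(f"train={len(train)} val={len(val)} test={len(test)} final_train={len(final_train)}")
```

Enumerating the grid this way makes the cost of the search explicit: every added value for one hyper-parameter multiplies the number of runs.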
On the test set, the model reached an average accuracy and F1 score of 84.84% (SD 2 and 2.08, respectively).

## Results

We retained the model with the highest performance over the 10 runs. All values are percentages.

| Emotion   | Accuracy | Precision | Recall | F1    |
|-----------|:--------:|----------:|-------:|------:|
| Anger     |          |     96.55 |  87.50 |       |
| Disgust   |          |     90.91 |  93.75 |       |
| Fear      |          |     96.30 |  81.25 |       |
| Happiness |          |     93.10 |  84.38 |       |
| Sadness   |          |     81.58 |  96.88 |       |
| Surprise  |          |     77.78 |  87.50 |       |
| Total     |  88.54   |     89.37 |  88.54 | 88.62 |
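
The per-emotion F1 cells in the table are empty. Because F1 is the harmonic mean of precision and recall, approximate values can be derived from the two columns that are filled in. The sketch below does this and also computes unweighted (macro) averages. These averages round to the figures in the Total row, which suggests, but does not confirm, that the row is a macro average. The per-emotion F1 values are derived here from rounded inputs and are not figures reported with the model.

```python
# Per-emotion (precision, recall) in %, copied from the Results table.
per_emotion = {
    "Anger": (96.55, 87.50),
    "Disgust": (90.91, 93.75),
    "Fear": (96.30, 81.25),
    "Happiness": (93.10, 84.38),
    "Sadness": (81.58, 96.88),
    "Surprise": (77.78, 87.50),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Derived values, not reported ones.
f1_scores = {emotion: f1(p, r) for emotion, (p, r) in per_emotion.items()}

# Unweighted (macro) averages over the six emotions.
n = len(per_emotion)
macro_precision = sum(p for p, _ in per_emotion.values()) / n
macro_recall = sum(r for _, r in per_emotion.values()) / n
macro_f1 = sum(f1_scores.values()) / n

for emotion, score in f1_scores.items():
    print(f"{emotion:<10} F1 = {score:.2f}")
print(f"macro precision = {macro_precision:.2f}")
print(f"macro recall    = {macro_recall:.2f}")
print(f"macro F1        = {macro_f1:.2f}")
```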