---
datasets:
- narad/ravdess
language:
- en
metrics:
- f1
- accuracy
- recall
- precision
pipeline_tag: audio-classification
---
# Emotion Recognition in English Using RAVDESS and Wav2Vec 2.0
<!-- Provide a quick summary of what the model is/does. -->
This model classifies the emotion expressed in an audio recording. It was trained on RAVDESS, a dataset of English speech recordings, and recognises six emotions: anger, disgust, fear, happiness, sadness and surprise.
The model recreates the work of this [Greek emotion extractor](https://huggingface.co/m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition/blob/main/README.md), using a pre-trained [Wav2Vec2](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) model to encode the audio.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Adapted from:** [Emotion Recognition in Greek](https://huggingface.co/m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition/blob/main/README.md)
- **Model type:** Wav2Vec 2.0 with a classification head
- **Language(s) (NLP):** English
- **Finetuned from model:** [wav2vec2](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english)
## How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
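A minimal inference sketch, assuming the checkpoint is compatible with the standard `transformers` audio-classification pipeline; the repository id and audio file name below are placeholders, and the label-mapping helper is illustrative:

```python
import math

# The six emotions this model predicts (order of the output logits is assumed).
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def softmax(scores):
    """Convert raw logits to probabilities."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scores_to_label(scores):
    """Map per-class logits to the most likely emotion."""
    probs = softmax(scores)
    return EMOTIONS[probs.index(max(probs))]

RUN_DEMO = False  # set True to download the checkpoint and classify a file
if RUN_DEMO:
    from transformers import pipeline  # requires `pip install transformers torch`
    classifier = pipeline(
        "audio-classification",
        model="<this-repository-id>",  # placeholder: replace with the actual Hub id
    )
    print(classifier("speech.wav"))    # placeholder: any 16 kHz mono audio file
```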
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
The RAVDESS dataset was split into training, validation and test sets in a 60/20/20 ratio.
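The 60/20/20 split can be sketched as follows; the file names and seed are illustrative, and the card does not say whether the split was stratified by emotion or by actor:

```python
import random

def split_dataset(paths, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the file list and cut it into train/val/test parts (60/20/20 by default)."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test

# Dummy file names standing in for the RAVDESS audio clips.
files = [f"ravdess/clip_{i:04d}.wav" for i in range(100)]
train, val, test = split_dataset(files)
```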
### Training Procedure
The fine-tuning process was centred on four hyper-parameters:
- the batch size (4, 8),
- gradient accumulation steps (GAS) (2, 4, 6, 8),
- number of epochs (10, 20) and
- the learning rate (1e-3, 1e-4, 1e-5).
Each experiment was repeated 10 times.
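The grid above yields 2 × 4 × 2 × 3 = 48 hyper-parameter combinations, so with 10 repetitions each the search amounts to 480 fine-tuning runs. Enumerating the grid can be sketched as:

```python
from itertools import product

batch_sizes = [4, 8]
gas_values = [2, 4, 6, 8]           # gradient accumulation steps
epoch_counts = [10, 20]
learning_rates = [1e-3, 1e-4, 1e-5]

# One dict per hyper-parameter combination.
grid = [
    {"batch_size": b, "gas": g, "epochs": e, "lr": lr}
    for b, g, e, lr in product(batch_sizes, gas_values, epoch_counts, learning_rates)
]
n_runs = len(grid) * 10  # 10 repetitions per combination
```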
## Evaluation
The best-performing set of hyper-parameters was: batch size 4, 4 gradient accumulation steps, 10 epochs and a learning rate of 1e-4.
## Testing
The model was retrained on the combined training and validation sets using the best hyper-parameter set. On the test set, it achieved average accuracy and F1 scores of 84.84% (SD 2 and 2.08, respectively).
## Results
We retained the model with the highest performance across the 10 runs.
| Emotion | Accuracy | Precision | Recall | F1 |
|-----------|:-------:|-----------:|---------:|---------:|
| Anger | | 96.55 | 87.50 | |
| Disgust | | 90.91 | 93.75 | |
| Fear | | 96.30 | 81.25 | |
| Happiness | | 93.10 | 84.38 | |
| Sadness   |         | 81.58 | 96.88 | |
| Surprise | | 77.78 | 87.50 | |
| Total | 88.54 | 89.37 | 88.54 | 88.62 |
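Per-emotion precision and recall, as reported in the table, can be derived from a confusion matrix over the test predictions. A plain-Python sketch (the label values in the usage comment are illustrative):

```python
from collections import Counter

def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall and F1 per class (in %), plus overall accuracy."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    out = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(t, c)] for t in labels if t != c)   # predicted c, actually t
        fn = sum(pairs[(c, p)] for p in labels if p != c)   # actually c, predicted p
        precision = 100 * tp / (tp + fp) if tp + fp else 0.0
        recall = 100 * tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = 100 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return out, accuracy

# Usage with toy labels, e.g.:
# metrics, acc = per_class_metrics(["anger", "fear"], ["anger", "fear"],
#                                  ["anger", "fear"])
```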