update description and move files
- README.md +51 -0
- test_results.csv → _test/test_results.csv +0 -0
- config.yaml → conf/config.yaml +0 -0
README.md
CHANGED
|
@@ -28,3 +28,54 @@ model-index:
|
|
| 28 |
name: Unweighted Average Recall
|
| 29 |
value: 0.6499883154795764
|
| 30 |
---
|
| 31 |
+
|
| 32 |
+
# Speech Emotion Recognition Model
|
| 33 |
+
|
| 34 |
+
`Wav2Vec2-Large-Robust` model fine-tuned on the [MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html)
|
| 35 |
+
(v1.11) dataset for classifying emotions into four categories: _Anger (A)_, _Happiness (H)_, _Neutral (N)_, and _Sadness (S)_.
|
| 36 |
+
|
| 37 |
+
## Installation
|
| 38 |
+
|
| 39 |
+
To use the model, install autrainer, e.g., via pip:
|
| 40 |
+
|
| 41 |
+
```bash
|
| 42 |
+
pip install autrainer
|
| 43 |
+
```
|
| 44 |
+
|
| 45 |
+
## Usage
|
| 46 |
+
|
| 47 |
+
The model can be applied to all audio files in a folder (`<data-root>`), with the predictions stored in another folder (`<output-root>`):
|
| 48 |
+
|
| 49 |
+
```bash
|
| 50 |
+
autrainer inference hf:autrainer/msp-podcast-emo-class-big4-w2v2-l-emo <data-root> <output-root>
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
## Training
|
| 54 |
+
|
| 55 |
+
### Pretraining
|
| 56 |
+
|
| 57 |
+
The model was originally trained by [audEERING](https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim) on the MSP-Podcast (v1.7) dataset to predict three emotional dimensions: _arousal_, _dominance_, and _valence_.
|
| 58 |
+
|
| 59 |
+
### Dataset
|
| 60 |
+
|
| 61 |
+
The model was further fine-tuned on the MSP-Podcast (v1.11) dataset, a large corpus of spontaneous emotional speech collected from various podcast recordings.
|
| 62 |
+
The dataset includes natural emotional expressions which cover a broad range of speakers, recording conditions, and conversation topics.
|
| 63 |
+
|
| 64 |
+
**Note:** The MSP-Podcast dataset is not yet included in the autrainer 0.5.0 release but can be found in [this Pull Request](https://github.com/autrainer/autrainer/pull/46).
|
| 65 |
+
|
| 66 |
+
### Training Process
|
| 67 |
+
|
| 68 |
+
The model was fine-tuned for 5 epochs.
|
| 69 |
+
At the end of each epoch, the model was evaluated on the validation set.
|
| 70 |
+
We release the state that achieved the best performance on this validation set.
|
| 71 |
+
All training hyperparameters can be found in the main configuration file (`conf/config.yaml`).
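
The checkpoint selection described above can be sketched in a few lines of Python. This is a minimal illustration, not autrainer's implementation: the function name `select_best_epoch` and the validation scores are hypothetical, and it assumes a metric where higher is better.

```python
def select_best_epoch(val_scores):
    """Return (epoch, score) for the highest validation score (epochs are 1-indexed)."""
    best_index = max(range(len(val_scores)), key=lambda i: val_scores[i])
    return best_index + 1, val_scores[best_index]


# Hypothetical validation scores for 5 epochs (illustrative values only,
# not results obtained with this model).
val_scores = [0.58, 0.61, 0.63, 0.62, 0.60]
best_epoch, best_score = select_best_epoch(val_scores)
print(f"best epoch: {best_epoch}, validation score: {best_score}")
```

If several epochs tie, this sketch keeps the earliest one; whether the actual training run breaks ties the same way is not stated here.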
|
| 72 |
+
|
| 73 |
+
### Evaluation
|
| 74 |
+
|
| 75 |
+
We evaluate the model on the `Test1` split of the MSP-Podcast dataset.
|
| 76 |
+
The model achieves a classification accuracy of 0.617 on the test set.
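
The model card metadata reports Unweighted Average Recall (UAR), while this section reports accuracy. The two can differ on imbalanced data, because UAR averages the per-class recall so that every class counts equally. The sketch below shows the standard definitions on toy labels; the helper names and the example labels are assumptions for illustration and are not taken from `_test/test_results.csv` or from autrainer's code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, with every class weighted equally."""
    support = Counter(y_true)  # number of reference samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    return sum(hits[c] / support[c] for c in support) / len(support)


# Toy labels using the four categories (A, H, N, S); illustrative only.
y_true = ["A", "A", "H", "N", "N", "N", "N", "S"]
y_pred = ["A", "N", "H", "N", "N", "N", "H", "S"]

print("accuracy:", accuracy(y_true, y_pred))
print("UAR:", unweighted_average_recall(y_true, y_pred))
```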
|
| 77 |
+
|
| 78 |
+
## Acknowledgements
|
| 79 |
+
|
| 80 |
+
Please acknowledge the work which produced the original model and the MSP-Podcast dataset.
|
| 81 |
+
We would also appreciate an acknowledgement of autrainer.
|
test_results.csv → _test/test_results.csv
RENAMED
|
File without changes
|
config.yaml → conf/config.yaml
RENAMED
|
File without changes
|