r-f committed on
Commit
20217d5
1 Parent(s): e3f6b70

Create new file

Files changed (1)
  1. README.md +67 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ license: apache-2.0
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model_index:
+   name: wav2vec2-lg-xlsr-en-speech-emotion-recognition
+ ---
+ # Speech Emotion Recognition By Fine-Tuning Wav2Vec 2.0
+ The model is a fine-tuned version of [jonatasgrosman/wav2vec2-large-xlsr-53-english](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english) for a Speech Emotion Recognition (SER) task.
+
+ Several datasets were used to fine-tune the original model:
+ [Surrey Audio-Visual Expressed Emotion (SAVEE)](http://kahlan.eps.surrey.ac.uk/savee/Database.html)
+ - 480 audio files from 4 male actors
+
+ [Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)](https://zenodo.org/record/1188976#.YO6yI-gzaUk)
+ - 1440 audio files from 24 professional actors (12 female, 12 male)
+
+ [Toronto emotional speech set (TESS)](https://tspace.library.utoronto.ca/handle/1807/24487)
+ - 2800 audio files from 2 female actors
+
+ 7 classification labels:
+ ```python
+ emotions = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']
+ ```
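As a minimal sketch of how the seven labels might be used, the snippet below maps a list of per-class scores to an emotion name. It assumes the model's output indices follow the order of the label list above, which this card does not state. The `predict_label` helper and the example scores are hypothetical and are not real model output.

```python
# Labels as listed in this card; treating list position as the class index
# is an assumption, not something the card confirms.
emotions = ['angry', 'disgust', 'fear', 'happy', 'neutral', 'sad', 'surprise']

# Index -> label and label -> index lookups.
id2label = dict(enumerate(emotions))
label2id = {label: idx for idx, label in id2label.items()}


def predict_label(scores):
    """Return the label with the highest score (hypothetical helper)."""
    if len(scores) != len(emotions):
        raise ValueError(f"expected {len(emotions)} scores, got {len(scores)}")
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return id2label[best_idx]


# Made-up scores for illustration only.
example_scores = [0.10, 0.05, 0.02, 0.60, 0.15, 0.05, 0.03]
print(predict_label(example_scores))  # happy
```

Keeping both lookups side by side makes it easy to check that the label order used at inference time matches the one used during fine-tuning.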
+ The model achieves the following results on the evaluation set:
+ - Loss: 0.5023
+ - Accuracy: 0.8223
+ ## Model description
+ More information needed
+ ## Intended uses & limitations
+ More information needed
+ ## Training and evaluation data
+ More information needed
+ ## Training procedure
+ ### Training hyperparameters
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 8
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 3
+ - mixed_precision_training: Native AMP
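The relationship between some of these values can be sketched in plain Python. The snippet below shows how `total_train_batch_size` follows from the per-step batch size and gradient accumulation (assuming a single device), and what a linear learning-rate schedule looks like in general. The `linear_lr` helper, the absence of warmup, and the `total_steps` value in the usage line are assumptions for illustration; the card does not state them.

```python
# Values copied from the hyperparameter list above.
learning_rate = 1e-4
train_batch_size = 4
gradient_accumulation_steps = 2

# Assumes a single device; with several devices the product would also
# include the device count.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 8


def linear_lr(step, total_steps, base_lr=learning_rate, warmup_steps=0):
    """Linear warmup followed by linear decay to zero (hypothetical helper)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay from base_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=1000 is a made-up value, not the length of the actual run.
halfway_lr = linear_lr(500, total_steps=1000)
```

With no warmup, the schedule starts at the configured learning rate and reaches zero at the final optimizer step.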
+
+ ### Training results
+
+ | Step | Training Loss | Validation Loss | Accuracy |
+ |------|---------------|-----------------|----------|
+ | 500 | 1.812400 | 1.365212 | 0.486258 |
+ | 1000 | 0.887200 | 0.773145 | 0.797040 |
+ | 1500 | 0.703500 | 0.574954 | 0.852008 |
+ | 2000 | 0.687900 | 1.286738 | 0.775899 |
+ | 2500 | 0.649800 | 0.697455 | 0.832981 |
+ | 3000 | 0.569600 | 0.337240 | 0.892178 |
+ | 3500 | 0.421800 | 0.307072 | 0.911205 |
+ | 4000 | 0.308800 | 0.374443 | 0.930233 |
+ | 4500 | 0.268800 | 0.260444 | 0.936575 |
+ | 5000 | 0.297300 | 0.302985 | 0.923890 |
+ | 5500 | 0.176500 | 0.165439 | 0.961945 |
+ | 6000 | 0.147500 | 0.170199 | 0.961945 |
+ | 6500 | 0.127400 | 0.155310 | 0.966173 |
+ | 7000 | 0.069900 | 0.103882 | 0.976744 |
+ | 7500 | 0.083000 | 0.104075 | 0.974630 |
+ 7500 0.083000 0.104075 0.974630