alefiury commited on
Commit
6906af3
1 Parent(s): cd9c831

Upload README.md

Browse files
Files changed (1) hide show
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language: pt
3
+ datasets:
4
+ - coraa_ser
5
+ - emovo
6
+ - ravdess
7
+ - baved
8
+ metrics:
9
+ - f1
10
+ tags:
11
+ - audio
12
+ - speech
13
+ - wav2vec2
14
+ - pt
15
+ - portuguese-speech-corpus
16
+ - italian-speech-corpus
17
+ - english-speech-corpus
18
+ - arabic-speech-corpus
19
+ - spontaneous
20
+ - speech
21
+ - PyTorch
22
+ license: apache-2.0
23
+ model_index:
24
+ name: wav2vec2-xls-r-300m-pt-br-spontaneous-speech-emotion-recognition
25
+ ---
26
+
27
+ # Wav2vec 2.0 XLS-R For Spontaneous Speech Recognition
28
+
29
+ This is the model that got first place in the SER track of the Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022) Workshop.
30
+
31
+ The following datasets were used in the training:
32
+
33
+ - [CORAA SER v1.0](https://github.com/rmarcacini/ser-coraa-pt-br/): a dataset composed of spontaneous portuguese speech and approximately 50 minutes of audio segments labeled in three classes: neutral, non-neutral female, and non-neutral male.
34
+
35
+ - [EMOVO Corpus](https://aclanthology.org/L14-1478/): a database of emotional speech for the Italian language, built from the voices of up to 6 actors who played 14 sentences simulating 6 emotional states (disgust, fear, anger, joy, surprise, sadness) plus the neutral state.
36
+
37
+ - [RAVDESS]((https://zenodo.org/record/1188976#.YO6yI-gzaUk)): a dataset that provides 1440 samples of recordings from actors performing on 8 different emotions in English, which are: angry, calm, disgust, fearful, happy, neutral, sad and surprised.
38
+
39
+ - [BAVED](https://github.com/40uf411/Basic-Arabic-Vocal-Emotions-Dataset): a collection of audio recordings of Arabic words spoken with varying degrees of emotion. The dataset contains seven words: like, unlike, this, file, good, neutral, and bad, which are spoken at three emotional levels: low emotion (tired or feeling down), neutral emotion (the way the speaker speaks daily), and high emotion (positive or negative emotions such as happiness, joy, sadness, anger).
40
+
41
+ The test set used is a part of the CORAA SER v1.0 that has been set aside for this purpose.
42
+
43
+ It achieves the following results on the test set:
44
+ - Accuracy: 0.9090
45
+ - Macro Precision: 0.8171
46
+ - Macro Recall: 0.8397
47
+ - Macro F1-Score: 0.8187
48
+
49
+ ## Datasets Details
50
+
51
+ The following image shows the overall distribution of the datasets:
52
+
53
+ ![distribution](https://docs.google.com/spreadsheets/d/e/2PACX-1vTUvuMLRnoFv3MBkStOcMQE5GuiqqyrvpyEtIiwoQEg8uA6dWvfZM-faHORLFNmPYJUzDbO6TZ2a9Zb/pubchart?oid=446282973&format=image)
54
+
55
+ The following image shows the number of instances by label:
56
+
57
+ ![numberInstances](https://docs.google.com/spreadsheets/d/e/2PACX-1vS7PUbW6J3Hnof1D2l492KW0sbF4BzWCeaiGQm53w-9EZck_Y14feE48HtcBvmjjZKsTJWP1RZpdh_v/pubchart?oid=1904097403&format=image)
58
+
59
+
60
+ ## Repository
61
+
62
+ The repository that implements the model to be trained and tested is avaible [here](https://github.com/alefiury/SE-R-2022-SER-Track).