---
license: mit
language:
- en
pipeline_tag: audio-classification
tags:
- wavlm
- msp-podcast
- emotion-recognition
- audio
- speech
- categorical
- lucas
---
The model was trained on [MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html) as the baseline for the Odyssey 2024 Emotion Recognition competition.<br>
This is the categorical variant, which predicts eight emotion classes: "Angry", "Sad", "Happy", "Surprise", "Fear", "Disgust", "Contempt", and "Neutral".

# Benchmarks
Micro- and macro-averaged F1, precision, and recall on the Test 3 and Development sets of the Odyssey competition.
<table style="width:500px">
<tr><th colspan=8 align="center">Categorical Setup</th></tr>
<tr><th colspan=4 align="center">Test 3</th><th colspan=4 align="center">Development</th></tr>
<tr> <td>F1-Mic.</td> <td>F1-Ma.</td> <td>Prec.</td> <td>Rec.</td> <td>F1-Mic.</td> <td>F1-Ma.</td> <td>Prec.</td> <td>Rec.</td> </tr>
<tr> <td>0.327</td> <td>0.311</td> <td>0.332</td> <td>0.325</td> <td>0.409</td> <td>0.307</td> <td>0.316</td> <td>0.345</td> </tr>
</table>
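
As a quick reference for the headers above, the sketch below illustrates how micro- and macro-averaged F1 differ. This is not part of the baseline code; `scikit-learn` and the toy labels are assumptions for illustration only.

```python
# Minimal sketch of micro vs. macro averaging, on made-up toy labels
# (not actual model predictions).
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 1, 2, 2, 2, 7, 7]  # gold class ids (e.g., 0=Angry, 7=Neutral)
y_pred = [0, 1, 2, 2, 2, 1, 7, 0]  # predicted class ids

# Micro-F1 pools all decisions before averaging; macro-F1 averages the
# per-class F1 scores, so rare classes count as much as frequent ones.
print("F1-Mic.", f1_score(y_true, y_pred, average="micro"))
print("F1-Ma. ", f1_score(y_true, y_pred, average="macro"))
print("Prec.  ", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Rec.   ", recall_score(y_true, y_pred, average="macro", zero_division=0))
```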

For more details: [demo](https://huggingface.co/spaces/3loi/WavLM-SER-Multi-Baseline-Odyssey2024), [paper/soon]() and [GitHub](https://github.com/MSP-UTD/MSP-Podcast_Challenge/tree/main).

```bibtex
@InProceedings{Goncalves_2024,
  author={L. Goncalves and A. N. Salman and A. {Reddy Naini} and L. Moro-Velazquez and T. Thebaud and L. {Paola Garcia} and N. Dehak and B. Sisman and C. Busso},
  title={Odyssey2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results},
  booktitle={Odyssey 2024: The Speaker and Language Recognition Workshop},
  volume={To appear},
  year={2024},
  month={June},
  address={Quebec, Canada},
}
```

# Usage
```python
from transformers import AutoModelForAudioClassification
import librosa
import torch

# load the model (custom architecture, so trust_remote_code is required)
model = AutoModelForAudioClassification.from_pretrained(
    "3loi/SER-Odyssey-Baseline-WavLM-Categorical-Attributes", trust_remote_code=True
)

# mean/std used to normalize the audio, stored in the model config
mean = model.config.mean
std = model.config.std

# load an audio file at the model's expected sampling rate
audio_path = "/path/to/audio.wav"
raw_wav, _ = librosa.load(audio_path, sr=model.config.sampling_rate)

# normalize the audio by mean/std
norm_wav = (raw_wav - mean) / (std + 0.000001)

# attention mask (all ones for a single, unpadded waveform)
mask = torch.ones(1, len(norm_wav))

# add a batch dimension
wavs = torch.tensor(norm_wav).unsqueeze(0)

# predict
with torch.no_grad():
    pred = model(wavs, mask)

print(model.config.id2label)
print(pred)
# {0: 'Angry', 1: 'Sad', 2: 'Happy', 3: 'Surprise', 4: 'Fear', 5: 'Disgust', 6: 'Contempt', 7: 'Neutral'}
# tensor([[0.0015, 0.3651, 0.0593, 0.0315, 0.0600, 0.0125, 0.0319, 0.4382]])

# convert the output scores to a probability distribution
probabilities = torch.nn.functional.softmax(pred, dim=1)
print(probabilities)
```
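
To turn the probabilities into a single predicted emotion, you can take the argmax and look it up in `id2label`. A minimal sketch, assuming the `model` and `probabilities` variables from the example above:

```python
# Map the highest-probability class id back to its emotion label.
# Assumes `model` and `probabilities` from the usage example above.
top_id = torch.argmax(probabilities, dim=1).item()
top_label = model.config.id2label[top_id]
print(f"Predicted emotion: {top_label} ({probabilities[0, top_id]:.3f})")
```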