---
license: mit
language:
- en
pipeline_tag: audio-classification
tags:
- wavlm
- msp-podcast
- emotion-recognition
- audio
- speech
- valence
- lucas
- speech-emotion-recognition
---
The model was trained on [MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html) as the baseline for the Odyssey 2024 Emotion Recognition competition.<br>
This particular model is the single-task specialized valence model, which predicts valence on an approximate 0 to 1 scale.

# Benchmarks
Concordance correlation coefficient (CCC) on the Test 3 and Development sets of the Odyssey competition.
<table style="width:500px">
<tr><th colspan=6 align="center">Single-Task Setup</th></tr>
<tr><th colspan=3 align="center">Test 3</th><th colspan=3 align="center">Development</th></tr>
<tr> <td colspan=3 align="center">Val</td> <td colspan=3 align="center">Val</td> </tr>
<tr> <td colspan=3 align="center">0.607</td> <td colspan=3 align="center">0.709</td> </tr>
</table>
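CCC rewards predictions that are both correlated with the labels and on the same scale. As a reference, a minimal sketch of the standard CCC formula in NumPy (illustrative only, not the competition's official scoring script):

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between two 1-D arrays."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# toy labels vs. predictions
print(round(ccc([0.1, 0.4, 0.7], [0.2, 0.5, 0.6]), 3))  # 0.889
```

Unlike plain Pearson correlation, the `(mean(x) - mean(y))**2` term in the denominator penalizes any systematic offset between predictions and labels.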


For more details: [demo](https://huggingface.co/spaces/3loi/WavLM-SER-Multi-Baseline-Odyssey2024), [paper/soon]() and [GitHub](https://github.com/MSP-UTD/MSP-Podcast_Challenge/tree/main).

```
@InProceedings{Goncalves_2024,
  author={L. Goncalves and A. N. Salman and A. {Reddy Naini} and L. Moro-Velazquez and T. Thebaud and L. {Paola Garcia} and N. Dehak and B. Sisman and C. Busso},
  title={Odyssey2024 - Speech Emotion Recognition Challenge: Dataset, Baseline Framework, and Results},
  booktitle={Odyssey 2024: The Speaker and Language Recognition Workshop},
  volume={To appear},
  year={2024},
  month={June},
  address={Quebec, Canada},
}
```

# Usage
```python
from transformers import AutoModelForAudioClassification
import librosa, torch

# load the model (uses custom model code from the Hub)
model = AutoModelForAudioClassification.from_pretrained(
    "3loi/SER-Odyssey-Baseline-WavLM-Valence", trust_remote_code=True
)

# mean/std used to normalize the audio during training
mean = model.config.mean
std = model.config.std

# load an audio file at the model's sampling rate
audio_path = "/path/to/audio.wav"
raw_wav, _ = librosa.load(audio_path, sr=model.config.sampling_rate)

# normalize the audio by mean/std (small epsilon avoids division by zero)
norm_wav = (raw_wav - mean) / (std + 0.000001)

# generate the attention mask (all ones: no padding for a single clip)
mask = torch.ones(1, len(norm_wav))

# batch it (add a batch dimension)
wavs = torch.tensor(norm_wav).unsqueeze(0)

# predict
with torch.no_grad():
    pred = model(wavs, mask)

print(model.config.id2label)
print(pred)
# {0: 'valence'}
# tensor([[0.3670]])
```
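To score several clips in one forward pass, variable-length waveforms must be padded to a common length, with the padding zeroed out in the attention mask. A minimal sketch of that batching step (a hypothetical `batch_waveforms` helper with placeholder mean/std defaults; in practice use `model.config.mean` and `model.config.std`):

```python
import torch

def batch_waveforms(wavs, mean=0.0, std=1.0):
    """Normalize, zero-pad, and mask a list of 1-D waveforms."""
    eps = 1e-6
    tensors = [
        (torch.as_tensor(w, dtype=torch.float32) - mean) / (std + eps)
        for w in wavs
    ]
    max_len = max(t.shape[0] for t in tensors)
    batch = torch.zeros(len(tensors), max_len)
    mask = torch.zeros(len(tensors), max_len)
    for i, t in enumerate(tensors):
        batch[i, : t.shape[0]] = t      # copy real samples, rest stays zero
        mask[i, : t.shape[0]] = 1.0     # mark real samples as valid
    return batch, mask

# two clips of different lengths -> padded batch plus matching mask
batch, mask = batch_waveforms([[0.1, 0.2, 0.3], [0.5, 0.5]])
print(batch.shape)  # torch.Size([2, 3])
print(mask)         # tensor([[1., 1., 1.], [1., 1., 0.]])
```

The resulting `batch` and `mask` can then be passed to the model the same way as the single-clip `wavs` and `mask` above.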