I need to verify this pretrained model on other datasets with multiple wave files. How should I arrange the code?
#7 by messaoudi · opened
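A minimal sketch of one way to arrange the evaluation code, assuming the `EmotionModel` class from the model card's Usage section has already been defined (it is partially visible in the diff below, along with the `device`/`model_name`/`processor`/`model` setup). The dataset folder and the use of `librosa` for loading and resampling are illustrative choices, not part of the model card; the model itself expects a 16 kHz mono signal.

```python
import glob

import librosa  # illustrative choice; any loader that yields 16 kHz mono works
import torch
from transformers import Wav2Vec2Processor

# EmotionModel is the custom model class defined in the model card's
# Usage section; run that definition before this snippet.
device = 'cpu'
model_name = 'audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim'
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = EmotionModel.from_pretrained(model_name)

sampling_rate = 16000  # the model expects 16 kHz input

results = []
for path in sorted(glob.glob('my_dataset/*.wav')):  # hypothetical dataset folder
    # load each file as 16 kHz mono, matching the model's expected input
    signal, _ = librosa.load(path, sr=sampling_rate, mono=True)
    # normalize with the processor, add a batch dimension, move to device
    y = processor(signal, sampling_rate=sampling_rate)
    y = torch.from_numpy(y['input_values'][0].reshape(1, -1)).to(device)
    with torch.no_grad():
        # EmotionModel returns (pooled hidden states, predictions)
        _, logits = model(y)
    arousal, dominance, valence = logits[0].cpu().numpy()
    results.append((path, arousal, dominance, valence))

for path, a, d, v in results:
    print(f'{path}: arousal={a:.2f} dominance={d:.2f} valence={v:.2f}')
```

For long recordings you may want to split the signal into chunks before inference, since memory use grows with signal length.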
README.md CHANGED

```diff
@@ -15,20 +15,8 @@ pipeline_tag: audio-classification
 
 # Model for Dimensional Speech Emotion Recognition based on Wav2vec 2.0
 
-
-
-that has been trained on much more data
-can be acquired with [audEERING](https://www.audeering.com/products/devaice/).
-The model expects a raw audio signal as input,
-and outputs predictions for arousal, dominance and valence in a range of approximately 0...1.
-In addition,
-it provides the pooled states of the last transformer layer.
-The model was created by fine-tuning
-[Wav2Vec2-Large-Robust](https://huggingface.co/facebook/wav2vec2-large-robust)
-on [MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html) (v1.7).
-The model was pruned from 24 to 12 transformer layers before fine-tuning.
-An [ONNX](https://onnx.ai/) export of the model is available from [doi:10.5281/zenodo.6221127](https://zenodo.org/record/6221127).
-Further details are given in the associated [paper](https://arxiv.org/abs/2203.07378) and [tutorial](https://github.com/audeering/w2v2-how-to).
+The model expects a raw audio signal as input and outputs predictions for arousal, dominance and valence in a range of approximately 0...1. In addition, it also provides the pooled states of the last transformer layer. The model was created by fine-tuning [
+Wav2Vec2-Large-Robust](https://huggingface.co/facebook/wav2vec2-large-robust) on [MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html) (v1.7). The model was pruned from 24 to 12 transformer layers before fine-tuning. An [ONNX](https://onnx.ai/) export of the model is available from [doi:10.5281/zenodo.6221127](https://zenodo.org/record/6221127). Further details are given in the associated [paper](https://arxiv.org/abs/2203.07378) and [tutorial](https://github.com/audeering/w2v2-how-to).
 
 # Usage
 
@@ -96,7 +84,7 @@ class EmotionModel(Wav2Vec2PreTrainedModel):
 device = 'cpu'
 model_name = 'audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim'
 processor = Wav2Vec2Processor.from_pretrained(model_name)
-model = EmotionModel.from_pretrained(model_name)
+model = EmotionModel.from_pretrained(model_name)
 
 # dummy signal
 sampling_rate = 16000
```