juliensimon
/

wav2vec2-conformer-rel-pos-large-finetuned-speech-commands

Audio Classification

wav2vec2-conformer

Generated from Trainer

Inference Endpoints

Model card Files Files and versions Community

Julien Simon commited on Jun 25, 2022

Commit

5df9582

·

1 Parent(s): e6b2d99

Update README.md

Files changed (1) hide show

README.md +15 -2

README.md CHANGED Viewed

@@ -39,15 +39,28 @@ TBD
 The model can spot one of the following keywords: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow", "Backward", "Forward", "Follow", "Learn", "Visual".
 ### Training and evaluation data
-- subset v0.02
 - full training set
 - full validation set
 ### Training procedure
-TBD
 #### Training hyperparameters

 The model can spot one of the following keywords: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go", "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow", "Backward", "Forward", "Follow", "Learn", "Visual".
+The repository includes sample files that I recorded (WAV, 16Khz sampling rate, mono). The simplest way to use the model is with the ```pipeline``` API:
+```
+>>> from transformers import pipeline
+>>> p = pipeline("audio-classification", model="juliensimon/wav2vec2-conformer-rel-pos-large-finetuned-speech-commands")
+>>> p("up16k.wav")
+[{'score': 0.7008192539215088, 'label': 'up'}, {'score': 0.04346614331007004, 'label': 'off'}, {'score': 0.029526518657803535, 'label': 'left'}, {'score': 0.02905120886862278, 'label': 'stop'}, {'score': 0.027142534032464027, 'label': 'on'}]
+>>> p("stop16k.wav")
+[{'score': 0.6969656944274902, 'label': 'stop'}, {'score': 0.03391443192958832, 'label': 'up'}, {'score': 0.027382319793105125, 'label': 'seven'}, {'score': 0.020835857838392258, 'label': 'five'}, {'score': 0.018051736056804657, 'label': 'down'}]
+>>> p("marvin16k.wav")
+[{'score': 0.5276530981063843, 'label': 'marvin'}, {'score': 0.04645705968141556, 'label': 'down'}, {'score': 0.038583893328905106, 'label': 'backward'}, {'score': 0.03578080236911774, 'label': 'wow'}, {'score': 0.03178196772933006, 'label': 'bird'}]
+```
 ### Training and evaluation data
+- subset: v0.02
 - full training set
 - full validation set
 ### Training procedure
+The model was fine-tuned on [Amazon SageMaker](https://aws.amazon.com/sagemaker), using an [ml.p3dn.24xlarge](https://aws.amazon.com/fr/ec2/instance-types/p3/) instance (8 NVIDIA V100 GPUs). Total training time for 10 epochs was 4.5 hours.
 #### Training hyperparameters