juliensimon/wav2vec2-conformer-rel-pos-large-finetuned-speech-commands

Julien Simon committed on Jun 27, 2022

Commit 66ede47 • 1 Parent(s): f86622c

Update README.md

Files changed (1)

README.md +31 -0

@@ -50,6 +50,37 @@ The repository includes sample files that I recorded (WAV, 16Khz sampling rate,
 [{'score': 0.5276530981063843, 'label': 'marvin'}, {'score': 0.04645705968141556, 'label': 'down'}, {'score': 0.038583893328905106, 'label': 'backward'}, {'score': 0.03578080236911774, 'label': 'wow'}, {'score': 0.03178196772933006, 'label': 'bird'}]
 ```
+You can also use the model with the `Auto` API:
+```
+>>> import torch, librosa
+>>> from transformers import AutoModelForAudioClassification, Wav2Vec2FeatureExtractor
+>>> feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True, return_attention_mask=False)
+>>> model = AutoModelForAudioClassification.from_pretrained("juliensimon/wav2vec2-conformer-rel-pos-large-finetuned-speech-commands")
+>>> audio, rate = librosa.load("up16k.wav", sr = 16000)
+>>> inputs = feature_extractor(audio, sampling_rate=16000, return_tensors = "pt")
+>>> logits = model(inputs['input_values'])
+>>> logits
+SequenceClassifierOutput(loss=None, logits=tensor([[-0.4635, -1.0112,  4.7935,  0.8528,  1.6265,  0.6456,  1.5423,  2.0132,
+          1.6103,  0.5847, -2.2526,  0.8839,  0.8163, -1.5655, -1.4160, -0.4196,
+         -0.1097, -1.8827,  0.6609, -0.2022,  0.0971, -0.6205,  0.4492,  0.0926,
+         -2.4848,  0.2630, -0.4584, -2.4327, -1.1654,  0.3897, -0.3374, -1.2418,
+         -0.1045,  0.2827, -1.5667, -0.0963]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
+>>> classes = torch.softmax(logits.logits, dim = -1)
+>>> classes
+tensor([[3.6522e-03, 2.1118e-03, 7.0082e-01, 1.3621e-02, 2.9527e-02, 1.1071e-02,
+         2.7143e-02, 4.3466e-02, 2.9051e-02, 1.0417e-02, 6.1027e-04, 1.4051e-02,
+         1.3132e-02, 1.2132e-03, 1.4089e-03, 3.8160e-03, 5.2022e-03, 8.8345e-04,
+         1.1242e-02, 4.7424e-03, 6.3974e-03, 3.1215e-03, 9.0975e-03, 6.3689e-03,
+         4.8384e-04, 7.5519e-03, 3.6707e-03, 5.0970e-04, 1.8101e-03, 8.5720e-03,
+         4.1427e-03, 1.6769e-03, 5.2292e-03, 7.7021e-03, 1.2117e-03, 5.2723e-03]],
+       grad_fn=<SoftmaxBackward0>)
+>>> top_class = torch.argmax(logits.logits, dim = -1)
+>>> top_class = top_class.detach().numpy()[0]
+>>> model.config.id2label[top_class]
+'up'
+```
 ### Training and evaluation data
 - subset: v0.02
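
The `Auto` API snippet added in this change returns only the single best label, whereas the earlier example output lists the five best scores. As a rough sketch of how the two relate, the block below recomputes the softmax and a top-5 ranking in plain Python from the logits printed in the snippet. The helper names `softmax` and `top_k` are hypothetical, and the logits are the rounded values shown above, so the scores match the printed tensor only to about three decimal places.

```python
import math

# Logits copied from the snippet's printed output (rounded to 4 decimals there).
logits = [
    -0.4635, -1.0112, 4.7935, 0.8528, 1.6265, 0.6456, 1.5423, 2.0132,
    1.6103, 0.5847, -2.2526, 0.8839, 0.8163, -1.5655, -1.4160, -0.4196,
    -0.1097, -1.8827, 0.6609, -0.2022, 0.0971, -0.6205, 0.4492, 0.0926,
    -2.4848, 0.2630, -0.4584, -2.4327, -1.1654, 0.3897, -0.3374, -1.2418,
    -0.1045, 0.2827, -1.5667, -0.0963,
]


def softmax(values):
    """Numerically stable softmax over a list of floats (hypothetical helper)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return (class_index, score) pairs for the k highest scores (hypothetical helper)."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


probs = softmax(logits)
top5 = top_k(probs, k=5)

# Each index can be mapped to a label with model.config.id2label[index],
# as the snippet does for the best class (index 2, 'up').
for index, score in top5:
    print(index, round(score, 4))
```

With the model loaded, the same ranking can presumably be obtained directly with `torch.topk(classes, k=5)` on the softmax output.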