gymeee committed on
Commit
eb4c285
1 Parent(s): d212694
Files changed (1)
  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - ASCEND
+ language:
+ - zh
+ metrics:
+ - cer
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - speech
+ - xlsr-fine-tuning-week
+ ---
+
+
+ ## Inference
+
+ The model can be used directly (without a language model) as follows.
+
+ Using the [Transformers](https://github.com/huggingface/transformers) library with torchaudio:
+
+ ```python
+ import torch
+ import torchaudio
+ from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+
+ # load model and processor
+ processor = Wav2Vec2Processor.from_pretrained("gymeee/demo_code_switching")
+ model = Wav2Vec2ForCTC.from_pretrained("gymeee/demo_code_switching")
+ model.eval()
+
+ # load speech; the returned tensor has shape (channels, samples)
+ speech_array, sampling_rate = torchaudio.load("speech.wav")
+
+ # resample to the rate the feature extractor was configured with, if needed
+ target_rate = processor.feature_extractor.sampling_rate
+ if sampling_rate != target_rate:
+     speech_array = torchaudio.functional.resample(speech_array, sampling_rate, target_rate)
+
+ # extract input features from the first channel (batch size 1)
+ inputs = processor(speech_array[0].numpy(), sampling_rate=target_rate, return_tensors="pt", padding="longest")
+
+ # retrieve logits without tracking gradients
+ with torch.no_grad():
+     logits = model(inputs.input_values).logits
+
+ # take argmax and decode
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)
+
+ print(transcription)
+ ```
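
The "take argmax and decode" step in the README snippet above is greedy CTC decoding: the most likely token is picked for every audio frame, consecutive repeats are merged, and the blank token is dropped. The sketch below illustrates that collapse rule in plain Python. The function name `ctc_greedy_collapse`, the toy vocabulary, and the choice of `0` as the blank id are all hypothetical and chosen for illustration; the real model's vocabulary and blank id come from its tokenizer, and `processor.batch_decode` handles this step for you.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax ids: merge consecutive repeats, then drop blanks."""
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in merged if token_id != blank_id]


# Hypothetical toy vocabulary (assumption); the real model's vocabulary differs.
toy_vocab = {0: "<pad>", 1: "你", 2: "好", 3: "o", 4: "k"}

# One id per audio frame, as torch.argmax(logits, dim=-1) would yield for one utterance.
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 4, 4]

collapsed = ctc_greedy_collapse(frame_ids)
text = "".join(toy_vocab[token_id] for token_id in collapsed)
print(text)
```

Because repeats are merged before blanks are removed, a blank between two identical ids keeps them as two separate tokens, which is how CTC can emit doubled characters.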