hyyoka committed on
Commit 9c00dc1
1 Parent(s): 066d5b1

Update README.md

Files changed (1)
  1. README.md +36 -1
README.md CHANGED
@@ -17,4 +17,39 @@ Futher fine-tuned [fleek/wav2vec-large-xlsr-korean](https://huggingface.co/fleek
 
 When using this model, make sure that your speech input is sampled at 16kHz.
 
-The script used for training can be found here: https://github.com/hyyoka/wav2vec2-korean-senior
+The script used for training can be found here: https://github.com/hyyoka/wav2vec2-korean-senior
+
+
+### Inference
+
+``` py
+import torchaudio
+from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+import re
+
+def clean_up(transcription):
+    hangul = re.compile('[^ ㄱ-ㅣ가-힣]+')
+    result = hangul.sub('', transcription)
+    return result
+
+model_name "hyyoka/wav2vec2-xlsr-korean-senior"
+processor = Wav2Vec2Processor.from_pretrained(model_name)
+model = Wav2Vec2ForCTC.from_pretrained(model_name)
+speech_array, sampling_rate = torchaudio.load(wav_file)
+feat = processor(speech_array[0],
+    sampling_rate=16000,
+    padding=True,
+    max_length=800000,
+    truncation=True,
+    return_attention_mask=True,
+    return_tensors="pt",
+    pad_token_id=49
+    )
+input = {'input_values': feat['input_values'],'attention_mask':feat['attention_mask']}
+
+outputs = model(**input, output_attentions=True)
+logits = outputs.logits
+predicted_ids = logits.argmax(axis=-1)
+transcription = processor.decode(predicted_ids[0])
+stt_result = clean_up(transcription)
+```
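The committed snippet does not run verbatim. `model_name "hyyoka/wav2vec2-xlsr-korean-senior"` is missing its `=`, `wav_file` is never defined, and `sampling_rate=16000` is passed to the processor whatever rate the file was actually recorded at, so non-16kHz audio would be silently mislabeled. Below is a minimal corrected sketch, assuming `torch`, `torchaudio` and `transformers` are installed. The names `transcribe`, `MODEL_NAME` and `TARGET_SR`, the mono down-mix, and the resampling step are illustrative additions, not part of the committed README.

```python
import re

# Keep spaces, Hangul compatibility jamo (U+3131..U+3163) and precomposed
# Hangul syllables (U+AC00..U+D7A3); drop every other character.
HANGUL_ONLY = re.compile("[^ ㄱ-ㅣ가-힣]+")

MODEL_NAME = "hyyoka/wav2vec2-xlsr-korean-senior"
TARGET_SR = 16_000  # the model card says input must be sampled at 16kHz


def clean_up(transcription: str) -> str:
    """Remove everything that is not a space or a Hangul character."""
    return HANGUL_ONLY.sub("", transcription)


def transcribe(wav_file: str) -> str:
    """Transcribe one audio file with greedy (argmax) CTC decoding."""
    # Heavy dependencies are imported lazily so that clean_up() can be
    # used without torch/torchaudio/transformers being installed.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)
    model.eval()

    speech, sr = torchaudio.load(wav_file)  # shape: (channels, samples)
    speech = speech.mean(dim=0)  # assumption: average channels to mono
    if sr != TARGET_SR:
        # Resample instead of just declaring the input to be 16kHz.
        speech = torchaudio.functional.resample(
            speech, orig_freq=sr, new_freq=TARGET_SR
        )

    inputs = processor(
        speech.numpy(), sampling_rate=TARGET_SR, return_tensors="pt"
    )
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits  # (batch, frames, vocab_size)

    predicted_ids = logits.argmax(dim=-1)
    # The CTC tokenizer behind decode() merges repeated ids and drops padding.
    transcription = processor.decode(predicted_ids[0])
    return clean_up(transcription)
```

Usage: `text = transcribe("sample.wav")`, where `sample.wav` is a hypothetical path. The committed version also passes `max_length=800000` with `truncation=True`, which caps input at 50 seconds of 16kHz audio; that cap is omitted here, so very long files will use more memory. `pad_token_id=49` and `output_attentions=True` are dropped as well: the former does not appear to be used by the audio feature extractor, and the latter only adds overhead when the attentions are not inspected. The model and processor are reloaded on every call for simplicity; cache them if you transcribe many files.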