hyyoka/wav2vec2-xlsr-korean-senior

Automatic Speech Recognition


hyyoka committed on Jan 28, 2022

Commit 9c00dc1 • 1 Parent(s): 066d5b1

Update README.md

Files changed (1)

README.md +36 -1

@@ -17,4 +17,39 @@ Further fine-tuned [fleek/wav2vec-large-xlsr-korean](https://huggingface.co/fleek
 When using this model, make sure that your speech input is sampled at 16kHz.
-The script used for training can be found here: https://github.com/hyyoka/wav2vec2-korean-senior

+The script used for training can be found here: https://github.com/hyyoka/wav2vec2-korean-senior
+### Inference
+``` py
+import torchaudio
+from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+import re
+def clean_up(transcription):
+    # Keep only spaces, Hangul jamo (ㄱ-ㅣ) and Hangul syllables (가-힣); drop everything else.
+    hangul = re.compile('[^ ㄱ-ㅣ가-힣]+')
+    result = hangul.sub('', transcription)
+    return result
+model_name = "hyyoka/wav2vec2-xlsr-korean-senior"
+processor = Wav2Vec2Processor.from_pretrained(model_name)
+model = Wav2Vec2ForCTC.from_pretrained(model_name)
+speech_array, sampling_rate = torchaudio.load(wav_file)  # wav_file: path to your 16 kHz audio file
+feat = processor(
+    speech_array[0],  # first channel only
+    sampling_rate=16000,
+    padding=True,
+    max_length=800000,
+    truncation=True,
+    return_attention_mask=True,
+    return_tensors="pt",
+)
+inputs = {'input_values': feat['input_values'], 'attention_mask': feat['attention_mask']}
+outputs = model(**inputs)
+logits = outputs.logits
+predicted_ids = logits.argmax(dim=-1)
+transcription = processor.decode(predicted_ids[0])
+stt_result = clean_up(transcription)
+```
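
Because the model expects 16kHz input, it can help to check a file's sample rate before running inference. The snippet below is a minimal sketch using only Python's standard `wave` module, so it assumes an uncompressed PCM WAV file. The names `EXPECTED_RATE`, `wav_sample_rate`, and `is_16khz` are hypothetical helpers introduced for illustration; they are not part of this repository or of `transformers`.

```python
import wave

# Assumption: 16 kHz, the rate this model card asks for.
EXPECTED_RATE = 16_000


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_16khz(path):
    """Return True if the WAV file at `path` is sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE
```

If `is_16khz(wav_file)` returns `False`, the audio should be resampled before it is passed to the processor, for example with `torchaudio.functional.resample(speech_array, sampling_rate, 16000)`.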