qinyue/wav2vec2-large-xlsr-53-chinese-zn-cn-aishell1

Automatic Speech Recognition

xlsr-fine-tuning-week


qinyue committed on Jun 17, 2022

Commit 3e34d55 (parent 14de29e)

Update README.md

Files changed (1)

README.md +29 -1

README.md CHANGED

@@ -28,9 +28,37 @@ model-index:
 # Wav2Vec2-Large-XLSR-53-Chinese-zh-CN-aishell1
-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Chinese using the [AISHELL-1](https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell).
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Chinese using the [AISHELL-1](https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell) dataset.
 When using this model, make sure that your speech input is sampled at 16kHz.
+## Usage
+The model can be used directly (without a language model) as follows:
+```python
+import torch
+import librosa
+from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+processor = Wav2Vec2Processor.from_pretrained(
+    'qinyue/wav2vec2-large-xlsr-53-chinese-zn-cn-aishell1')
+model = Wav2Vec2ForCTC.from_pretrained(
+    'qinyue/wav2vec2-large-xlsr-53-chinese-zn-cn-aishell1').to(device)
+filepath = 'test.wav'
+# librosa resamples the file to the 16 kHz rate the model expects
+audio, sr = librosa.load(filepath, sr=16000, mono=True)
+inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)
+with torch.no_grad():
+    logits = model(inputs.input_values,
+                   attention_mask=inputs.attention_mask).logits
+# Greedy CTC decoding: take the most likely token id at each frame
+predicted_ids = torch.argmax(logits, dim=-1)
+pred_str = processor.decode(predicted_ids[0])
+print(pred_str)
+```
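The usage snippet decodes greedily: `torch.argmax` picks the most likely token id per frame, and `processor.decode` then merges consecutive repeated ids and drops the padding token, which Wav2Vec2's CTC tokenizer uses as the blank. A minimal pure-Python sketch of that collapse step follows. The toy vocabulary, the helper names, and blank id 0 are all hypothetical; the real blank id would come from the processor's tokenizer (e.g. `processor.tokenizer.pad_token_id`).

```python
from itertools import groupby
from typing import List, Sequence

# Hypothetical toy vocabulary; id 0 stands in for the CTC blank (pad) token.
BLANK_ID = 0
VOCAB = {0: "<pad>", 1: "你", 2: "好", 3: "世", 4: "界"}


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int = BLANK_ID) -> List[int]:
    """Collapse per-frame argmax ids into a label sequence.

    Consecutive duplicates are merged first, then blanks are removed, so a
    blank between two identical ids keeps them as two separate labels.
    """
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in merged if token_id != blank_id]


def ids_to_text(ids: Sequence[int]) -> str:
    # Chinese characters are joined without spaces in this toy example.
    return "".join(VOCAB[i] for i in ids)


if __name__ == "__main__":
    # Per-frame ids for one utterance, like one row of torch.argmax(logits, dim=-1).
    frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 4, 4, 0]
    ids = ctc_greedy_collapse(frames)
    print(ids, ids_to_text(ids))
```

Merging before removing blanks is what lets CTC emit a genuinely repeated character: the frames `[1, 1, 0, 1]` collapse to `[1, 1]`, not `[1]`.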
 ## Evaluation
 ```python