qinyue committed on
Commit
3e34d55
1 Parent(s): 14de29e

Update README.md

Files changed (1)
  1. README.md +29 -1
README.md CHANGED
@@ -28,9 +28,37 @@ model-index:
 
  # Wav2Vec2-Large-XLSR-53-Chinese-zh-CN-aishell1
 
- Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Chinese using the [AISHELL-1](https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell).
+ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Chinese using the [AISHELL-1](https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell) dataset.
  When using this model, make sure that your speech input is sampled at 16kHz.
 
+ ## Usage
+
+ The model can be used directly (without a language model) as follows:
+
+ ```python
+ import torch
+ import librosa
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+
+ processor = Wav2Vec2Processor.from_pretrained(
+     'qinyue/wav2vec2-large-xlsr-53-chinese-zn-cn-aishell1')
+ model = Wav2Vec2ForCTC.from_pretrained(
+     'qinyue/wav2vec2-large-xlsr-53-chinese-zn-cn-aishell1').to(device)
+
+ filepath = 'test.wav'
+ audio, sr = librosa.load(filepath, sr=16000, mono=True)
+ inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)
+ with torch.no_grad():
+     logits = model(inputs.input_values,
+                    attention_mask=inputs.attention_mask).logits
+ predicted_ids = torch.argmax(logits, dim=-1)
+ pred_str = processor.decode(predicted_ids[0])
+
+ print(pred_str)
+ ```
+
  ## Evaluation
 
  ```python