hf-test committed on
Commit
974dfd3
1 Parent(s): 488c40e

Update README.md

Files changed (1)
  1. README.md +6 -10
README.md CHANGED
@@ -99,7 +99,7 @@ The following hyperparameters were used during training:
 - Datasets 1.17.1.dev0
 - Tokenizers 0.10.3
 
-### Inference Without Decoder
+### Inference With LM
 
 ```python
 import torch
@@ -108,9 +108,11 @@ from transformers import AutoModelForCTC, AutoProcessor
 import torchaudio.functional as F
 
 
-model_id = "patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm"
+model_id = "hf-test/xls-r-300m-sv"
 
-sample = next(iter(load_dataset("common_voice", "es", split="test", streaming=True)))
+sample_iter = iter(load_dataset("mozilla-foundation/common_voice_7_0", "sv-SE", split="test", streaming=True, use_auth_token=True))
+
+sample = next(sample_iter)
 resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
 
 model = AutoModelForCTC.from_pretrained(model_id)
@@ -121,15 +123,9 @@ input_values = processor(resampled_audio, return_tensors="pt").input_values
 with torch.no_grad():
   logits = model(input_values).logits
 
--prediction_ids = torch.argmax(logits, dim=-1)
--transcription = processor.batch_decode(prediction_ids)
-+transcription = processor.batch_decode(logits.numpy()).text
+transcription = processor.batch_decode(logits.numpy()).text
 ```
 
-
-### Inference With Decoder
-
-
 ### Eval results on Common Voice 7 "test":
 
 **Without LM**: 27.30 WER
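
For context on the decoding lines this commit touches: the stale `torch.argmax` lines it removes correspond to greedy CTC decoding, while the line it keeps passes raw logits to `processor.batch_decode` and reads `.text` from the result. Below is a minimal sketch of the greedy path only, written with NumPy so it runs without downloading a model. The toy vocabulary, the blank index, and the function name `greedy_ctc_decode` are assumptions made for illustration and do not come from the README or from `transformers`.

```python
import numpy as np

# Hypothetical toy vocabulary (not the model's real one);
# index 0 plays the role of the CTC blank token.
VOCAB = ["<pad>", "h", "e", "j", " "]
BLANK_ID = 0


def greedy_ctc_decode(logits: np.ndarray, vocab=VOCAB, blank_id=BLANK_ID) -> str:
    """Decode one utterance from a (time, vocab) array of logits."""
    # Pick the highest-scoring token at every frame.
    best_ids = logits.argmax(axis=-1).tolist()
    chars = []
    previous = None
    for token_id in best_ids:
        # Collapse consecutive repeats, then drop blanks.
        if token_id != previous and token_id != blank_id:
            chars.append(vocab[token_id])
        previous = token_id
    return "".join(chars)


# Frame-level best path: h h <pad> e e <pad> j
frame_ids = [1, 1, 0, 2, 2, 0, 3]
toy_logits = np.eye(len(VOCAB))[frame_ids]  # one-hot rows stand in for logits
decoded = greedy_ctc_decode(toy_logits)
print(decoded)
```

A decoder backed by a language model scores whole candidate sequences instead of taking the per-frame maximum, which is why the retained line hands over the full logits array rather than argmax ids.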