Iskaj committed on
Commit
c0844d5
1 Parent(s): 2069130

Update README.md

  1. README.md +89 -1
README.md CHANGED
@@ -1 +1,89 @@
- Actually trained on CV 7.0, whoops...
+ ---
+ language:
+ - nl
+ license: apache-2.0
+ tags:
+ - automatic-speech-recognition
+ - mozilla-foundation/common_voice_7_0
+ - generated_from_trainer
+ - nl
+ - robust-speech-event
+ - model_for_talk
+ datasets:
+ - mozilla-foundation/common_voice_7_0
+ model-index:
+ - name: XLS-R-300M - Dutch
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Common Voice 7
+       type: mozilla-foundation/common_voice_7_0
+       args: nl
+     metrics:
+     - name: Test WER
+       type: wer
+       value: ???
+     - name: Test CER
+       type: cer
+       value: ???
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Robust Speech Event - Dev Data
+       type: speech-recognition-community-v2/dev_data
+       args: nl
+     metrics:
+     - name: Test WER
+       type: wer
+       value: ???
+     - name: Test CER
+       type: cer
+       value: ???
+ ---
+
+ # xlsr300m_cv_8.0_nl
+
+
+ #### Evaluation Commands
+ 1. To evaluate on `mozilla-foundation/common_voice_7_0` with split `test`:
+
+ ```bash
+ python eval.py --model_id Iskaj/xlsr300m_cv_8.0_nl --dataset mozilla-foundation/common_voice_7_0 --config nl --split test
+ ```
+
+ 2. To evaluate on `speech-recognition-community-v2/dev_data`:
+
+ ```bash
+ python eval.py --model_id Iskaj/xlsr300m_cv_8.0_nl --dataset speech-recognition-community-v2/dev_data --config nl --split validation --chunk_length_s 5.0 --stride_length_s 1.0
+ ```
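
The model card reports Test WER and Test CER for these evaluation runs. As a rough illustration of what those two numbers measure, here is a minimal sketch that assumes the standard Levenshtein-based definitions. It is not taken from `eval.py`, which may normalise text or compute the metrics differently, and the function names and example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical toy pair, not taken from any dataset.
reference = "de kat zit op de mat"
hypothesis = "de kat zat op mat"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

Both metrics are lower-is-better, and both can exceed 1.0 when the hypothesis contains many insertions.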
+
+ ### Inference
+
+ ```python
+ import torch
+ from datasets import load_dataset
+ from transformers import AutoModelForCTC, AutoProcessor
+ import torchaudio.functional as F
+
+ model_id = "Iskaj/xlsr300m_cv_8.0_nl"
+
+ # Stream the Dutch test split; use_auth_token=True uses your stored Hugging Face token.
+ sample_iter = iter(load_dataset("mozilla-foundation/common_voice_8_0", "nl", split="test", streaming=True, use_auth_token=True))
+
+ sample = next(sample_iter)
+ # Resample the audio from 48 kHz to the 16 kHz passed to the processor below.
+ resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
+
+ model = AutoModelForCTC.from_pretrained(model_id)
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ inputs = processor(resampled_audio, sampling_rate=16_000, return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits
+ # Greedy decoding: take the most likely token for each frame.
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)
+
+ transcription[0].lower()
+ # 'het kontine schip lag aangemeert in de aven'
+ ```
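
The `torch.argmax` followed by `processor.batch_decode` in the snippet above amounts to greedy CTC decoding: pick the best token per frame, collapse consecutive repeats, then drop the blank token. The following is a minimal sketch of that collapsing step in plain Python. The toy vocabulary, the blank id, and the `|` word delimiter are assumptions for illustration; the real values come from the model's tokenizer.

```python
from itertools import groupby

# Hypothetical toy vocabulary; a real one is loaded with the processor.
VOCAB = {0: "<pad>", 1: "|", 2: "d", 3: "e", 4: "h", 5: "t"}
BLANK_ID = 0          # assumption: the padding token doubles as the CTC blank
WORD_DELIMITER = "|"  # assumption: "|" marks word boundaries


def greedy_ctc_decode(frame_ids):
    """Collapse per-frame argmax ids into text using the CTC rules."""
    # 1. Merge runs of identical ids (one token may span several frames).
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove blanks, which only separate genuinely repeated tokens.
    tokens = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    # 3. Join characters and turn word delimiters into spaces.
    return "".join(tokens).replace(WORD_DELIMITER, " ").strip()


# Hypothetical per-frame predictions, as torch.argmax would produce for one utterance.
frame_ids = [4, 4, 0, 3, 3, 5, 0, 1, 1, 2, 3, 0, 0]
print(greedy_ctc_decode(frame_ids))
```

Because repeats are merged before blanks are removed, a blank between two identical ids (for example `[3, 0, 3]`) keeps both characters, which is how CTC represents doubled letters.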