Alok committed on
Commit
c349099
1 Parent(s): 3e90250
README.md CHANGED
@@ -1,8 +1,6 @@
  language: sw
  datasets:
- - OpenSLR - http://www.openslr.org/25/
- - TODO: add more datasets if you have used additional datasets. Make sure to use the exact same
-   dataset name as the one found [here](https://huggingface.co/datasets). If the dataset can not be found in the official datasets, just give it a new name
+ - ALFFA (African Languages in the Field: speech Fundamentals and Automation) - [here](http://www.openslr.org/25/)
  metrics:
  - wer
  tags:
@@ -12,24 +10,23 @@ tags:
  - xlsr-fine-tuning-week
  license: apache-2.0
  model-index:
- - name: Swahili XLSR53 Wav2Vec2 Large
+ - name: Swahili XLSR-53 Wav2Vec2.0 Large
    results:
    - task:
        name: Speech Recognition
        type: automatic-speech-recognition
      dataset:
-       name: Common Voice {lang_id} #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
-       type: common_voice
-       args: {lang_id} #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
+       name: ALFFA sw
+       args: sw
      metrics:
      - name: Test WER
        type: wer
-       value: {wer_result_on_test} #TODO (IMPORTANT): replace {wer_result_on_test} with the WER error rate you achieved on the common_voice test set. It should be in the format XX.XX (don't add the % sign here). **Please** remember to fill out this value after you evaluated your model, so that your model appears on the leaderboard. If you fill out this model card before evaluating your model, please remember to edit the model card afterward to fill in your value
+       value: WIP
  ---

- # Wav2Vec2-Large-XLSR-53-{language} #TODO: replace language with your {language}, *e.g.* French
+ # Wav2Vec2-Large-XLSR-53-Swahili

- Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on {language} using the [Common Voice](https://huggingface.co/datasets/common_voice), ... and ... dataset{s}. #TODO: replace {language} with your language, *e.g.* French and eventually add more datasets that were used and eventually remove common voice if model was not trained on common voice
+ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Swahili using the [ALFFA](http://www.openslr.org/25/) dataset.
  When using this model, make sure that your speech input is sampled at 16kHz.

  ## Usage
@@ -42,87 +39,39 @@ import torchaudio
  from datasets import load_dataset
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

- test_dataset = load_dataset("common_voice", "{lang_id}", split="test[:2%]") #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
-
- processor = Wav2Vec2Processor.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
- model = Wav2Vec2ForCTC.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
+ processor = Wav2Vec2Processor.from_pretrained("alokmatta/wav2vec2-large-xlsr-53-sw")
+ model = Wav2Vec2ForCTC.from_pretrained("alokmatta/wav2vec2-large-xlsr-53-sw").to("cuda")

  resampler = torchaudio.transforms.Resample(48_000, 16_000)

- # Preprocessing the datasets.
- # We need to read the audio files as arrays
- def speech_file_to_array_fn(batch):
-     speech_array, sampling_rate = torchaudio.load(batch["path"])
-     batch["speech"] = resampler(speech_array).squeeze().numpy()
-     return batch
-
- test_dataset = test_dataset.map(speech_file_to_array_fn)
- inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)
-
- with torch.no_grad():
-     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
-
- predicted_ids = torch.argmax(logits, dim=-1)
-
- print("Prediction:", processor.batch_decode(predicted_ids))
- print("Reference:", test_dataset["sentence"][:2])
- ```
-
- ## Evaluation
-
- The model can be evaluated as follows on the {language} test data of Common Voice. #TODO: replace {language} with your language, *e.g.* French
-
- ```python
- import torch
- import torchaudio
- from datasets import load_dataset, load_metric
- from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
- import re
-
- test_dataset = load_dataset("common_voice", "{lang_id}", split="test") #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
- wer = load_metric("wer")
-
- processor = Wav2Vec2Processor.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
- model = Wav2Vec2ForCTC.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
- model.to("cuda")
-
- chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]' # TODO: adapt this list to include all special characters you removed from the data
-
- # Preprocessing the datasets.
- # We need to read the audio files as arrays
- def speech_file_to_array_fn(batch):
-     batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
-     speech_array, sampling_rate = torchaudio.load(batch["path"])
-     batch["speech"] = resampler(speech_array).squeeze().numpy()
-     return batch
-
- test_dataset = test_dataset.map(speech_file_to_array_fn)
-
- # Preprocessing the datasets.
- # We need to read the audio files as arrays
- def evaluate(batch):
-     inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
-     with torch.no_grad():
-         logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
-     pred_ids = torch.argmax(logits, dim=-1)
-     batch["pred_strings"] = processor.batch_decode(pred_ids)
-     return batch
-
- result = test_dataset.map(evaluate, batched=True, batch_size=8)
-
- print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
+ resampler = torchaudio.transforms.Resample(orig_freq=48_000, new_freq=16_000)
+
+ def load_file_to_data(file):
+     batch = {}
+     speech, _ = torchaudio.load(file)
+     batch["speech"] = resampler.forward(speech.squeeze(0)).numpy()
+     batch["sampling_rate"] = resampler.new_freq
+     return batch
+
+ def predict(data):
+     features = processor(data["speech"], sampling_rate=data["sampling_rate"], padding=True, return_tensors="pt")
+     input_values = features.input_values.to("cuda")
+     attention_mask = features.attention_mask.to("cuda")
+     with torch.no_grad():
+         logits = model(input_values, attention_mask=attention_mask).logits
+     pred_ids = torch.argmax(logits, dim=-1)
+     return processor.batch_decode(pred_ids)
+
+ predict(load_file_to_data('./demo.wav'))
  ```

- **Test Result**: XX.XX % # TODO: write output of print here. IMPORTANT: Please remember to also replace {wer_result_on_test} at the top of with this value here. tags.
+ **Test Result**: WIP

  ## Training

- The Common Voice `train`, `validation`, and ... datasets were used for training as well as ... and ... # TODO: adapt to state all the datasets that were used for training.
-
- The script used for training can be found [here](...) # TODO: fill in a link to your training script here. If you trained your model in a colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script on github and paste the link here.
+ The script used for training will be shared here soon.
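The new card reports the test WER as WIP. A minimal sketch of how that number could be produced, reusing the `load_file_to_data` and `predict` helpers from the usage snippet above together with the `load_metric("wer")` call from the removed template; the test clips and reference transcripts below are hypothetical placeholders, not files from this repository:

```python
# Sketch only: fills in the evaluation flow the removed template described,
# using the helpers defined in the usage snippet above.
from datasets import load_metric

wer_metric = load_metric("wer")

# Hypothetical local test pairs: (path to a 48 kHz wav file, reference transcript).
test_pairs = [
    ("./test/sample_0001.wav", "habari ya asubuhi"),
    ("./test/sample_0002.wav", "asante sana"),
]

predictions, references = [], []
for path, reference in test_pairs:
    predictions.append(predict(load_file_to_data(path))[0])  # predict() returns a list of strings
    references.append(reference)

print("WER: {:.2f}".format(100 * wer_metric.compute(predictions=predictions, references=references)))
```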
config.json ADDED
@@ -0,0 +1,76 @@
+ {
+   "_name_or_path": "facebook/wav2vec2-large-xlsr-53",
+   "activation_dropout": 0.0,
+   "apply_spec_augment": true,
+   "architectures": [
+     "Wav2Vec2ForCTC"
+   ],
+   "attention_dropout": 0.1,
+   "bos_token_id": 1,
+   "conv_bias": true,
+   "conv_dim": [512, 512, 512, 512, 512, 512, 512],
+   "conv_kernel": [10, 3, 3, 3, 3, 2, 2],
+   "conv_stride": [5, 2, 2, 2, 2, 2, 2],
+   "ctc_loss_reduction": "mean",
+   "ctc_zero_infinity": false,
+   "do_stable_layer_norm": true,
+   "eos_token_id": 2,
+   "feat_extract_activation": "gelu",
+   "feat_extract_dropout": 0.0,
+   "feat_extract_norm": "layer",
+   "feat_proj_dropout": 0.0,
+   "final_dropout": 0.0,
+   "gradient_checkpointing": true,
+   "hidden_act": "gelu",
+   "hidden_dropout": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-05,
+   "layerdrop": 0.1,
+   "mask_channel_length": 10,
+   "mask_channel_min_space": 1,
+   "mask_channel_other": 0.0,
+   "mask_channel_prob": 0.0,
+   "mask_channel_selection": "static",
+   "mask_feature_length": 10,
+   "mask_feature_prob": 0.0,
+   "mask_time_length": 10,
+   "mask_time_min_space": 1,
+   "mask_time_other": 0.0,
+   "mask_time_prob": 0.05,
+   "mask_time_selection": "static",
+   "model_type": "wav2vec2",
+   "num_attention_heads": 16,
+   "num_conv_pos_embedding_groups": 16,
+   "num_conv_pos_embeddings": 128,
+   "num_feat_extract_layers": 7,
+   "num_hidden_layers": 24,
+   "pad_token_id": 40,
+   "transformers_version": "4.5.0.dev0",
+   "vocab_size": 41
+ }
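The committed config.json pins the standard XLSR-53 large architecture (24 transformer layers, hidden size 1024, 7 convolutional feature-extractor layers) to a 41-token CTC output vocabulary, with SpecAugment time masking enabled. A short sketch, assuming the repository id used in the usage snippet above, of inspecting these fields through transformers:

```python
# Sketch: reading the committed config.json through transformers.
from transformers import Wav2Vec2Config

config = Wav2Vec2Config.from_pretrained("alokmatta/wav2vec2-large-xlsr-53-sw")
print(config.num_hidden_layers)  # 24
print(config.hidden_size)        # 1024
print(config.vocab_size)         # 41 output tokens for the CTC head
print(config.mask_time_prob)     # 0.05, applied because apply_spec_augment is true
```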
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce45012be135497e4b16ab8654d1a24b97bb14be98b5fabf07fdcff635dcf3e0
+ size 1711
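optimizer.pt, like the other binary files added in this commit, is tracked with Git LFS: the repository itself stores only a three-line pointer (spec version, SHA-256 of the payload, size in bytes), and `git lfs pull` or the Hub client fetches the real content. A sketch, assuming the `huggingface_hub` library and the repository id used above:

```python
# Sketch: resolving a Git LFS pointer to the actual file via the Hub client.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="alokmatta/wav2vec2-large-xlsr-53-sw",  # repo id taken from the usage snippet
    filename="optimizer.pt",                        # any of the LFS-tracked files listed here works
)
print(local_path)  # cached local copy; its size should match the pointer's "size" field
```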
preprocessor_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "do_normalize": true,
+   "feature_size": 1,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "return_attention_mask": true,
+   "sampling_rate": 16000
+ }
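preprocessor_config.json configures the feature-extractor half of the processor: mono (feature_size 1) 16 kHz input, per-utterance normalization, zero padding on the right, and an attention mask over padded positions. A sketch that builds the same extractor directly from these values; the audio array is a silent stand-in:

```python
# Sketch: the settings below mirror preprocessor_config.json above.
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,           # zero-mean / unit-variance normalization per utterance
    return_attention_mask=True,  # needed so padded samples are masked out
)

audio = np.zeros(16000, dtype=np.float32)  # one second of silence as a stand-in input
features = extractor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
print(features.input_values.shape, features.attention_mask.shape)
```

These are the same values that `Wav2Vec2Processor.from_pretrained` picks up automatically from this file.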
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a7772a1884e576f8ce8b03059b788e2f7a734edad5a45f3676945b1b37aba5f
+ size 1262101912
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42d008c9c215de3fb87e964d070febd87668726621c0db21bca9ed9eda04b74d
+ size 623
trainer_state.json ADDED
@@ -0,0 +1,316 @@
+ {
+   "best_metric": 1.0,
+   "best_model_checkpoint": "../wav2vec2-large-xlsr-53-sw/checkpoint-154",
+   "epoch": 2.9967637540453076,
+   "global_step": 462,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     { "epoch": 0.06, "learning_rate": 0.00015, "loss": Infinity, "step": 10 },
+     { "epoch": 0.13, "learning_rate": 0.0003, "loss": NaN, "step": 20 },
+     { "epoch": 0.19, "learning_rate": 0.000296, "loss": NaN, "step": 30 },
+     { "epoch": 0.26, "learning_rate": 0.000292, "loss": NaN, "step": 40 },
+     { "epoch": 0.32, "learning_rate": 0.00028799999999999995, "loss": NaN, "step": 50 },
+     { "epoch": 0.39, "learning_rate": 0.00028399999999999996, "loss": NaN, "step": 60 },
+     { "epoch": 0.45, "learning_rate": 0.00028, "loss": NaN, "step": 70 },
+     { "epoch": 0.52, "learning_rate": 0.000276, "loss": NaN, "step": 80 },
+     { "epoch": 0.58, "learning_rate": 0.00027199999999999994, "loss": NaN, "step": 90 },
+     { "epoch": 0.65, "learning_rate": 0.00026799999999999995, "loss": NaN, "step": 100 },
+     { "epoch": 0.71, "learning_rate": 0.00026399999999999997, "loss": NaN, "step": 110 },
+     { "epoch": 0.78, "learning_rate": 0.00026, "loss": NaN, "step": 120 },
+     { "epoch": 0.84, "learning_rate": 0.000256, "loss": NaN, "step": 130 },
+     { "epoch": 0.91, "learning_rate": 0.00025199999999999995, "loss": NaN, "step": 140 },
+     { "epoch": 0.97, "learning_rate": 0.00024799999999999996, "loss": NaN, "step": 150 },
+     { "epoch": 1.0, "eval_loss": Infinity, "eval_runtime": 358.7961, "eval_samples_per_second": 5.549, "eval_wer": 1.0, "step": 154 },
+     { "epoch": 1.04, "learning_rate": 0.000244, "loss": NaN, "step": 160 },
+     { "epoch": 1.1, "learning_rate": 0.00023999999999999998, "loss": NaN, "step": 170 },
+     { "epoch": 1.17, "learning_rate": 0.00023599999999999996, "loss": NaN, "step": 180 },
+     { "epoch": 1.23, "learning_rate": 0.00023199999999999997, "loss": NaN, "step": 190 },
+     { "epoch": 1.3, "learning_rate": 0.00022799999999999999, "loss": NaN, "step": 200 },
+     { "epoch": 1.36, "learning_rate": 0.000224, "loss": NaN, "step": 210 },
+     { "epoch": 1.43, "learning_rate": 0.00021999999999999995, "loss": NaN, "step": 220 },
+     { "epoch": 1.49, "learning_rate": 0.00021599999999999996, "loss": NaN, "step": 230 },
+     { "epoch": 1.56, "learning_rate": 0.00021199999999999998, "loss": NaN, "step": 240 },
+     { "epoch": 1.62, "learning_rate": 0.000208, "loss": NaN, "step": 250 },
+     { "epoch": 1.69, "learning_rate": 0.000204, "loss": NaN, "step": 260 },
+     { "epoch": 1.75, "learning_rate": 0.00019999999999999998, "loss": NaN, "step": 270 },
+     { "epoch": 1.82, "learning_rate": 0.00019599999999999997, "loss": NaN, "step": 280 },
+     { "epoch": 1.88, "learning_rate": 0.00019199999999999998, "loss": NaN, "step": 290 },
+     { "epoch": 1.94, "learning_rate": 0.000188, "loss": NaN, "step": 300 },
+     { "epoch": 2.0, "eval_loss": Infinity, "eval_runtime": 374.8395, "eval_samples_per_second": 5.312, "eval_wer": 1.0, "step": 308 },
+     { "epoch": 2.01, "learning_rate": 0.00018399999999999997, "loss": NaN, "step": 310 },
+     { "epoch": 2.08, "learning_rate": 0.00017999999999999998, "loss": NaN, "step": 320 },
+     { "epoch": 2.14, "learning_rate": 0.000176, "loss": NaN, "step": 330 },
+     { "epoch": 2.21, "learning_rate": 0.000172, "loss": NaN, "step": 340 },
+     { "epoch": 2.27, "learning_rate": 0.000168, "loss": NaN, "step": 350 },
+     { "epoch": 2.34, "learning_rate": 0.00016399999999999997, "loss": NaN, "step": 360 },
+     { "epoch": 2.4, "learning_rate": 0.00015999999999999999, "loss": NaN, "step": 370 },
+     { "epoch": 2.47, "learning_rate": 0.000156, "loss": NaN, "step": 380 },
+     { "epoch": 2.53, "learning_rate": 0.000152, "loss": NaN, "step": 390 },
+     { "epoch": 2.6, "learning_rate": 0.000148, "loss": NaN, "step": 400 },
+     { "epoch": 2.66, "learning_rate": 0.00014399999999999998, "loss": NaN, "step": 410 },
+     { "epoch": 2.72, "learning_rate": 0.00014, "loss": NaN, "step": 420 },
+     { "epoch": 2.79, "learning_rate": 0.00013599999999999997, "loss": NaN, "step": 430 },
+     { "epoch": 2.85, "learning_rate": 0.00013199999999999998, "loss": NaN, "step": 440 },
+     { "epoch": 2.92, "learning_rate": 0.000128, "loss": NaN, "step": 450 },
+     { "epoch": 2.98, "learning_rate": 0.00012399999999999998, "loss": NaN, "step": 460 },
+     { "epoch": 3.0, "eval_loss": Infinity, "eval_runtime": 334.0794, "eval_samples_per_second": 5.96, "eval_wer": 1.0, "step": 462 }
+   ],
+   "max_steps": 770,
+   "num_train_epochs": 5,
+   "total_flos": 1.7250623020466376e+18,
+   "trial_name": null,
+   "trial_params": null
+ }
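trainer_state.json covers 3 of the planned 5 epochs (462 of 770 steps); every training loss logged after step 10 is NaN and the evaluation WER stays at 1.0 across all three evaluations, which matches the WIP test result above. A sketch, assuming a local copy of the file, for summarizing the log programmatically:

```python
# Sketch: summarizing the committed trainer_state.json (local copy assumed).
import json

with open("trainer_state.json") as f:
    state = json.load(f)  # Python's json module accepts the NaN/Infinity literals used here

train_logs = [e for e in state["log_history"] if "loss" in e]
eval_logs = [e for e in state["log_history"] if "eval_wer" in e]

print(len(train_logs), "training log entries,", len(eval_logs), "evaluations")
nan_losses = [e for e in train_logs if e["loss"] != e["loss"]]  # NaN is the only value not equal to itself
print(len(nan_losses), "of them logged a NaN loss")
for e in eval_logs:
    print("epoch {}: eval_wer={}, eval_loss={}".format(e["epoch"], e["eval_wer"], e["eval_loss"]))
```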
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5fcccbfe1b211984cf07a9455560cac8fbc9c69a011acf7c165d7e5331248598
+ size 2351