ProgramadorArtificial committed on
Commit
0d330da
1 Parent(s): c50e76b

Add checkpoint

.gitattributes CHANGED
@@ -32,3 +32,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
32
  *.zip filter=lfs diff=lfs merge=lfs -text
33
  *.zst filter=lfs diff=lfs merge=lfs -text
34
  *tfevents* filter=lfs diff=lfs merge=lfs -text
35
+ *.tsv filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,82 @@
1
  ---
2
  license: apache-2.0
3
+ datasets:
4
+ - mozilla-foundation/common_voice_13_0
5
+ - falabrasil/Audio_Corpora-Bases_de_áudio
6
+ - lucasgris/wav2vec4bp
7
+ - Edresson/TTS-Portuguese-Corpus
8
+ - voxforge/voxforge-pt-dataset
9
+ - programadorartificial/custom_dataset
10
+ language:
11
+ - pt
12
+ metrics:
13
+ - wer
14
+ - cer
15
+ pipeline_tag: automatic-speech-recognition
16
+ tags:
17
+ - Wav2Vec2
18
+ - speech-to-text
19
  ---
20
+
21
+ # Wav2Vec 2.0 - Brazilian Portuguese
22
+
23
+ This model is a fine-tuned version of the [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53-portuguese) model, trained on the following datasets:
24
+
25
+ - [Common Voice 13.0](https://commonvoice.mozilla.org/pt/datasets)
26
+ - [FalaBrasil](https://github.com/falabrasil/gitlab-resources)
27
+ - [Multilingual Librispeech (MLS) Portuguese](http://www.openslr.org/94/)
28
+ - [TTS-Portuguese-Corpus](https://github.com/Edresson/TTS-Portuguese-Corpus)
29
+ - [VoxForge](https://www.voxforge.org/pt/Downloads)
30
+ - Custom dataset (recordings of the author speaking; not publicly available)
31
+
32
+ All datasets were pre-processed and cleaned (trying to keep only Brazilian speakers); the original training, testing, and validation splits were not used. The files used for training, testing, and validation are in the "dataset_files" folder.
33
+
34
+ The model was fine-tuned using the [ProgramadorArtificial/transformers](https://github.com/ProgramadorArtificial/transformers/tree/main) repository, more specifically the [speech-recognition](https://github.com/ProgramadorArtificial/transformers/tree/main/examples/pytorch/speech-recognition) folder.
35
+
36
+ ```python
37
+ from transformers import AutoModelForCTC, Wav2Vec2Processor
38
+
39
+ processor = Wav2Vec2Processor.from_pretrained('ProgramadorArtificial/wav2vec2-large-xlsr-53-portuguese')
40
+ model = AutoModelForCTC.from_pretrained('ProgramadorArtificial/wav2vec2-large-xlsr-53-portuguese')
41
+ ```
42
+
43
+ ## Results on the test and validation datasets
44
+
45
+ ### Test
46
+ | WER | CER |
47
+ |-------|------|
48
+ | 11.7% | 3.3% |
49
+
50
+ | Prediction | Real |
51
+ | ------------ |---------------------------------------------------------------------------------|
52
+ | ele é considerado por seus companheiros de tropa como um oficial moderado | ele é considerado por seus companheiros de tropa como um oficial moderado |
53
+ | os empréstimos do banco mundial exigem contrapartidas do governo beneficiados | os empréstimos do banco mundial exigem contrapartidas dos governos beneficiados |
54
+ | mwendel queiroz rodrigues | wendell queiroz rodrgues |
55
+ | virmontes | virmond |
56
+ | conversões pelo dólar turismo a mil seiscentos e oitenta reais | conversões pelo dólar turismo a mil seiscentos e oitenta reais |
57
+ | o grupo de moda são paulo promove o seu primeiro encontro | o grupo de moda são paulo promove o seu primeiro encontro |
58
+ | abandonou a frança e se fixou em são paulo | abandonou a frança e se fixou em são paulo |
59
+ | o avanço da tecnologia fez esta divisão perder o sentido | o avanço da tecnologia fez esta divisão perder o sentido |
60
+ | reservadamente confessa não entender o comportamento do presidente | reservadamente confessa não entender o comportamento do presidente |
61
+ | foi definido o campeonato estadual que começa no dia trinta | foi definido o campeonato estadual que começa no dia trinta |
62
+
63
+ ### Validation
64
+ | WER | CER |
65
+ |------|------|
66
+ | 9.5% | 2.6% |
67
+
68
+ | Prediction | Real |
69
+ | ------------- | ------------- |
70
+ | pontex gestal | pontes gestal |
71
+ | o cruzeiro real continua valendo até o dia quinze de julho | o cruzeiro real continua valendo até o dia quinze de julho |
72
+ | o espaço fica portanto vago e disponível para o traficante | o espaço fica portanto vago e disponível para o traficante |
73
+ | os botões estão empilhados | os botões estão empilhados |
74
+ | as chances que apareciam eram perdidas pelos atacantes cruzeirenses | as chances que apareciam eram perdidas pelos atacantes cruzeirenses |
75
+ | possibilitar que a população se sinta identificada com o estado | possibilitar que a população se sinta identificada com o estado |
76
+ | os detentos serão transferidos das delegacias que estiverem lotadas | os detentos serão transferidos das delegacias que estiverem lotadas |
77
+ | a euforia pela pista menos quente causou três incidentes | a euforia pela pista menos quente causou três incidentes |
78
+ | almeida é presidente da liga independente das escolas de samba | almeida é presidente da liga independente das escolas de samba |
79
+ | os modos de seu pensamento as cendências de seu espírito e até as menores particularidades de sua vida é nessa fonte que deve beber o poeta brasileiro é dela que há de sair o verdadeiro poema nacional tal como eu o imagino | os modos de seu pensamento as tendências de seu espírito e até as menores particularidades de sua vida é nessa fonte que deve beber o poeta brasileiro é dela que há de sair o verdadeiro poema nacional tal como eu o imagino |
80
+
81
+ ## Author
82
+ * **Programador Artificial** - [GitHub](https://github.com/ProgramadorArtificial) - [YouTube](https://www.youtube.com/@ProgramadorArtificial)
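
The README reports WER and CER next to prediction/reference pairs. As a rough illustration of how such scores are typically computed, here is a minimal pure-Python sketch based on Levenshtein distance. The function names are hypothetical, and the text normalization used by the author's evaluation script is not stated in this commit, so this is an assumption rather than the exact procedure behind the reported numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, prediction):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


def cer(reference, prediction):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(prediction)) / len(reference)


# Pair taken from the README test table ("Real" vs "Prediction").
real = "os empréstimos do banco mundial exigem contrapartidas dos governos beneficiados"
pred = "os empréstimos do banco mundial exigem contrapartidas do governo beneficiados"
print(f"WER: {wer(real, pred):.2f}")
print(f"CER: {cer(real, pred):.3f}")
```

For this pair, two of the ten reference words differ, so the sentence-level WER is 0.2. Corpus-level figures such as those in the tables are usually computed over all utterances together, not averaged per sentence.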
added_tokens.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "</s>": 45,
3
+ "<s>": 44
4
+ }
config.json ADDED
@@ -0,0 +1,108 @@
1
+ {
2
+ "_name_or_path": "custom/checkpoint-104000",
3
+ "activation_dropout": 0.0,
4
+ "adapter_kernel_size": 3,
5
+ "adapter_stride": 2,
6
+ "add_adapter": false,
7
+ "apply_spec_augment": true,
8
+ "architectures": [
9
+ "Wav2Vec2ForCTC"
10
+ ],
11
+ "attention_dropout": 0.0,
12
+ "bos_token_id": 1,
13
+ "classifier_proj_size": 256,
14
+ "codevector_dim": 256,
15
+ "contrastive_logits_temperature": 0.1,
16
+ "conv_bias": true,
17
+ "conv_dim": [
18
+ 512,
19
+ 512,
20
+ 512,
21
+ 512,
22
+ 512,
23
+ 512,
24
+ 512
25
+ ],
26
+ "conv_kernel": [
27
+ 10,
28
+ 3,
29
+ 3,
30
+ 3,
31
+ 3,
32
+ 2,
33
+ 2
34
+ ],
35
+ "conv_stride": [
36
+ 5,
37
+ 2,
38
+ 2,
39
+ 2,
40
+ 2,
41
+ 2,
42
+ 2
43
+ ],
44
+ "ctc_loss_reduction": "mean",
45
+ "ctc_zero_infinity": false,
46
+ "diversity_loss_weight": 0.1,
47
+ "do_stable_layer_norm": true,
48
+ "eos_token_id": 2,
49
+ "feat_extract_activation": "gelu",
50
+ "feat_extract_dropout": 0.0,
51
+ "feat_extract_norm": "layer",
52
+ "feat_proj_dropout": 0.0,
53
+ "feat_quantizer_dropout": 0.0,
54
+ "final_dropout": 0.0,
55
+ "hidden_act": "gelu",
56
+ "hidden_dropout": 0.0,
57
+ "hidden_dropout_prob": 0.1,
58
+ "hidden_size": 1024,
59
+ "initializer_range": 0.02,
60
+ "intermediate_size": 4096,
61
+ "layer_norm_eps": 1e-05,
62
+ "layerdrop": 0.0,
63
+ "mask_feature_length": 10,
64
+ "mask_feature_min_masks": 0,
65
+ "mask_feature_prob": 0.0,
66
+ "mask_time_length": 10,
67
+ "mask_time_min_masks": 2,
68
+ "mask_time_prob": 0.05,
69
+ "model_type": "wav2vec2",
70
+ "num_adapter_layers": 3,
71
+ "num_attention_heads": 16,
72
+ "num_codevector_groups": 2,
73
+ "num_codevectors_per_group": 320,
74
+ "num_conv_pos_embedding_groups": 16,
75
+ "num_conv_pos_embeddings": 128,
76
+ "num_feat_extract_layers": 7,
77
+ "num_hidden_layers": 24,
78
+ "num_negatives": 100,
79
+ "output_hidden_size": 1024,
80
+ "pad_token_id": 43,
81
+ "proj_codevector_dim": 256,
82
+ "tdnn_dilation": [
83
+ 1,
84
+ 2,
85
+ 3,
86
+ 1,
87
+ 1
88
+ ],
89
+ "tdnn_dim": [
90
+ 512,
91
+ 512,
92
+ 512,
93
+ 512,
94
+ 1500
95
+ ],
96
+ "tdnn_kernel": [
97
+ 5,
98
+ 3,
99
+ 3,
100
+ 1,
101
+ 1
102
+ ],
103
+ "torch_dtype": "float32",
104
+ "transformers_version": "4.30.0.dev0",
105
+ "use_weighted_layer_sum": false,
106
+ "vocab_size": 46,
107
+ "xvector_output_dim": 512
108
+ }
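
The `conv_kernel` and `conv_stride` lists above determine how many feature frames the convolutional encoder emits for a given number of audio samples. The sketch below works this out, assuming unpadded convolutions (each layer yields `floor((length - kernel) / stride) + 1` frames) and the 16 kHz sampling rate from `preprocessor_config.json`. The helper names are hypothetical.

```python
CONV_KERNEL = [10, 3, 3, 3, 3, 2, 2]  # "conv_kernel" in config.json
CONV_STRIDE = [5, 2, 2, 2, 2, 2, 2]   # "conv_stride" in config.json
SAMPLING_RATE = 16000                 # "sampling_rate" in preprocessor_config.json


def feature_frames(num_samples):
    """Number of encoder frames for a raw waveform of num_samples samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Assumption: no padding, so each layer shrinks the sequence like this.
        length = (length - kernel) // stride + 1
    return length


def total_stride():
    """Product of all strides: input samples advanced per output frame."""
    product = 1
    for stride in CONV_STRIDE:
        product *= stride
    return product


print(feature_frames(SAMPLING_RATE))               # frames for one second of audio
print(1000 * total_stride() / SAMPLING_RATE)       # frame hop in milliseconds
```

Under these assumptions one second of audio maps to 49 frames, and the overall stride of 320 samples corresponds to a 20 ms hop.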
dataset_files/dataset.tsv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8ced2405fc049e21b29a24484b6317a63d55e3709490ab9ea1df4145c9ff578b
3
+ size 33440132
dataset_files/dataset_cleaned.tsv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:828dbc62ffc0e32fd09aa7926ea4d4eb805c4f438d5729ad7b08fa0cf03bc6b8
3
+ size 31300268
dataset_files/test.tsv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ca42464ec3e3bcd7bf51cbe6b8fe834c77ab5a9236b1af51e55dfb1cff01b563
3
+ size 445
dataset_files/train.tsv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a0a2427b6fb32a3bf7a2c74e7cf98951e86d0f2b5b9b0e0540d9d74eec084dd5
3
+ size 1285
dataset_files/validation.tsv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1bf6cbed6c4e9947339bfb9bebc640b2c0b0180625447a9f678ffb185bd235bc
3
+ size 2188
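
Because `*.tsv` is now tracked through Git LFS, the `dataset_files/*.tsv` entries in this commit are pointer files (version, oid, size), not the data itself. The following sketch parses such a pointer. `parse_lfs_pointer` is a hypothetical helper, and it assumes the simple three-line `key value` layout shown above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer contents of dataset_files/validation.tsv from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1bf6cbed6c4e9947339bfb9bebc640b2c0b0180625447a9f678ffb185bd235bc
size 2188
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size"])
```

Comparing `size` and the SHA-256 digest against a downloaded file is one way to check that the LFS object was fetched completely.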
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4aa636f620c8dd46721b3a99b6561b4ada26b1c64d2e89bab17ce5fcdf959775
3
+ size 2490436753
preprocessor_config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "do_normalize": true,
3
+ "feature_extractor_type": "Wav2Vec2FeatureExtractor",
4
+ "feature_size": 1,
5
+ "padding_side": "right",
6
+ "padding_value": 0.0,
7
+ "processor_class": "Wav2Vec2Processor",
8
+ "return_attention_mask": true,
9
+ "sampling_rate": 16000
10
+ }
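
With `do_normalize` set to true, each input waveform is expected to be scaled to zero mean and unit variance before it reaches the model. The sketch below illustrates that transformation in plain Python. The function name and the small epsilon are assumptions made for illustration, not a copy of the library implementation.

```python
import math


def zero_mean_unit_variance(samples, eps=1e-7):
    """Scale a waveform to zero mean and (approximately) unit variance."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    # eps (an assumed value) avoids division by zero on silent input.
    scale = math.sqrt(variance + eps)
    return [(x - mean) / scale for x in samples]


# Toy waveform; real input would be 16 kHz mono audio samples.
waveform = [0.0, 0.5, -0.5, 0.25, -0.25, 1.0]
normalized = zero_mean_unit_variance(waveform)
print(normalized)
```

Audio recorded at another rate would need resampling to 16000 Hz first, since the feature extractor is configured for that sampling rate.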
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a3d9fecee01d7d2b5050639bd58353751e33104c619793074b1ed8453cb2ca43
3
+ size 1262087281
rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:534e88bd04308ed9aacb122000aa304ac595e155cb0d03c71ba95a72f5769169
3
+ size 14567
scaler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a4010334f7673bc3387cdfd0405ab9d54c121c6f2e12704099272671a8cda3e8
3
+ size 559
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b630229639be0b098ad69aff739d3f984087b2119369f3c006bc079ffec83cd1
3
+ size 623
special_tokens_map.json ADDED
@@ -0,0 +1,120 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ {
4
+ "content": "<s>",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ },
10
+ {
11
+ "content": "</s>",
12
+ "lstrip": false,
13
+ "normalized": true,
14
+ "rstrip": false,
15
+ "single_word": false
16
+ },
17
+ {
18
+ "content": "<s>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ {
25
+ "content": "</s>",
26
+ "lstrip": false,
27
+ "normalized": true,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ },
31
+ {
32
+ "content": "<s>",
33
+ "lstrip": false,
34
+ "normalized": true,
35
+ "rstrip": false,
36
+ "single_word": false
37
+ },
38
+ {
39
+ "content": "</s>",
40
+ "lstrip": false,
41
+ "normalized": true,
42
+ "rstrip": false,
43
+ "single_word": false
44
+ },
45
+ {
46
+ "content": "<s>",
47
+ "lstrip": false,
48
+ "normalized": true,
49
+ "rstrip": false,
50
+ "single_word": false
51
+ },
52
+ {
53
+ "content": "</s>",
54
+ "lstrip": false,
55
+ "normalized": true,
56
+ "rstrip": false,
57
+ "single_word": false
58
+ },
59
+ {
60
+ "content": "<s>",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false
65
+ },
66
+ {
67
+ "content": "</s>",
68
+ "lstrip": false,
69
+ "normalized": true,
70
+ "rstrip": false,
71
+ "single_word": false
72
+ },
73
+ {
74
+ "content": "<s>",
75
+ "lstrip": false,
76
+ "normalized": true,
77
+ "rstrip": false,
78
+ "single_word": false
79
+ },
80
+ {
81
+ "content": "</s>",
82
+ "lstrip": false,
83
+ "normalized": true,
84
+ "rstrip": false,
85
+ "single_word": false
86
+ },
87
+ {
88
+ "content": "<s>",
89
+ "lstrip": false,
90
+ "normalized": true,
91
+ "rstrip": false,
92
+ "single_word": false
93
+ },
94
+ {
95
+ "content": "</s>",
96
+ "lstrip": false,
97
+ "normalized": true,
98
+ "rstrip": false,
99
+ "single_word": false
100
+ },
101
+ {
102
+ "content": "<s>",
103
+ "lstrip": false,
104
+ "normalized": true,
105
+ "rstrip": false,
106
+ "single_word": false
107
+ },
108
+ {
109
+ "content": "</s>",
110
+ "lstrip": false,
111
+ "normalized": true,
112
+ "rstrip": false,
113
+ "single_word": false
114
+ }
115
+ ],
116
+ "bos_token": "<s>",
117
+ "eos_token": "</s>",
118
+ "pad_token": "[PAD]",
119
+ "unk_token": "[UNK]"
120
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "bos_token": "<s>",
3
+ "clean_up_tokenization_spaces": true,
4
+ "do_lower_case": false,
5
+ "eos_token": "</s>",
6
+ "model_max_length": 1000000000000000019884624838656,
7
+ "pad_token": "[PAD]",
8
+ "processor_class": "Wav2Vec2Processor",
9
+ "replace_word_delimiter_char": " ",
10
+ "tokenizer_class": "Wav2Vec2CTCTokenizer",
11
+ "unk_token": "[UNK]",
12
+ "word_delimiter_token": "|"
13
+ }
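
The tokenizer settings above (`pad_token` of `[PAD]`, `word_delimiter_token` of `|`) are the ones a CTC tokenizer relies on when turning per-frame predictions into text. Here is a minimal sketch of greedy CTC decoding, assuming the per-frame tokens have already been chosen by argmax and that `[PAD]` doubles as the CTC blank. `greedy_ctc_decode` is a hypothetical name, not a library function.

```python
PAD_TOKEN = "[PAD]"    # "pad_token" in tokenizer_config.json, assumed to be the CTC blank
WORD_DELIMITER = "|"   # "word_delimiter_token" in tokenizer_config.json


def greedy_ctc_decode(frame_tokens):
    """Collapse repeated frame tokens, drop blanks, and map '|' to spaces."""
    collapsed = []
    previous = None
    for token in frame_tokens:
        if token != previous:  # keep only the first token of each run
            collapsed.append(token)
        previous = token
    chars = [" " if t == WORD_DELIMITER else t
             for t in collapsed if t != PAD_TOKEN]
    # Normalize whitespace left by leading or trailing delimiters.
    return " ".join("".join(chars).split())


frames = ["o", "o", "[PAD]", "i", "|", "|", "o", "[PAD]", "o"]
print(greedy_ctc_decode(frames))
```

Note how the blank between the last two `o` frames keeps them as two separate characters, while the adjacent repeated `o` frames at the start collapse into one.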
trainer_state.json ADDED
@@ -0,0 +1,2510 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 8.05017317232987,
5
+ "global_step": 172000,
6
+ "is_hyper_param_search": false,
7
+ "is_local_process_zero": true,
8
+ "is_world_process_zero": true,
9
+ "log_history": [
10
+ {
11
+ "epoch": 0.02,
12
+ "learning_rate": 0.0002958,
13
+ "loss": 3.155,
14
+ "step": 500
15
+ },
16
+ {
17
+ "epoch": 0.05,
18
+ "learning_rate": 0.0002999769241570842,
19
+ "loss": 0.338,
20
+ "step": 1000
21
+ },
22
+ {
23
+ "epoch": 0.07,
24
+ "learning_rate": 0.00029995352066528323,
25
+ "loss": 0.327,
26
+ "step": 1500
27
+ },
28
+ {
29
+ "epoch": 0.09,
30
+ "learning_rate": 0.00029993011717348224,
31
+ "loss": 0.3225,
32
+ "step": 2000
33
+ },
34
+ {
35
+ "epoch": 0.12,
36
+ "learning_rate": 0.00029990671368168126,
37
+ "loss": 0.3079,
38
+ "step": 2500
39
+ },
40
+ {
41
+ "epoch": 0.14,
42
+ "learning_rate": 0.00029988331018988027,
43
+ "loss": 0.3309,
44
+ "step": 3000
45
+ },
46
+ {
47
+ "epoch": 0.16,
48
+ "learning_rate": 0.00029985990669807933,
49
+ "loss": 0.2943,
50
+ "step": 3500
51
+ },
52
+ {
53
+ "epoch": 0.19,
54
+ "learning_rate": 0.00029983650320627834,
55
+ "loss": 0.2795,
56
+ "step": 4000
57
+ },
58
+ {
59
+ "epoch": 0.19,
60
+ "eval_cer": 0.05897733717227239,
61
+ "eval_loss": 0.22565825283527374,
62
+ "eval_runtime": 1455.9056,
63
+ "eval_samples_per_second": 13.045,
64
+ "eval_steps_per_second": 3.262,
65
+ "eval_wer": 0.20478772194834444,
66
+ "step": 4000
67
+ },
68
+ {
69
+ "epoch": 0.21,
70
+ "learning_rate": 0.0002998130997144774,
71
+ "loss": 0.2896,
72
+ "step": 4500
73
+ },
74
+ {
75
+ "epoch": 0.23,
76
+ "learning_rate": 0.0002997896962226764,
77
+ "loss": 0.2735,
78
+ "step": 5000
79
+ },
80
+ {
81
+ "epoch": 0.26,
82
+ "learning_rate": 0.00029976629273087543,
83
+ "loss": 0.281,
84
+ "step": 5500
85
+ },
86
+ {
87
+ "epoch": 0.28,
88
+ "learning_rate": 0.00029974288923907444,
89
+ "loss": 0.2755,
90
+ "step": 6000
91
+ },
92
+ {
93
+ "epoch": 0.3,
94
+ "learning_rate": 0.00029971948574727345,
95
+ "loss": 0.2831,
96
+ "step": 6500
97
+ },
98
+ {
99
+ "epoch": 0.33,
100
+ "learning_rate": 0.0002996961290624561,
101
+ "loss": 0.2708,
102
+ "step": 7000
103
+ },
104
+ {
105
+ "epoch": 0.35,
106
+ "learning_rate": 0.0002996727255706551,
107
+ "loss": 0.2667,
108
+ "step": 7500
109
+ },
110
+ {
111
+ "epoch": 0.37,
112
+ "learning_rate": 0.0002996493220788541,
113
+ "loss": 0.2582,
114
+ "step": 8000
115
+ },
116
+ {
117
+ "epoch": 0.37,
118
+ "eval_cer": 0.05252843759542817,
119
+ "eval_loss": 0.20639470219612122,
120
+ "eval_runtime": 1350.2359,
121
+ "eval_samples_per_second": 14.066,
122
+ "eval_steps_per_second": 3.517,
123
+ "eval_wer": 0.18800766597579208,
124
+ "step": 8000
125
+ },
126
+ {
127
+ "epoch": 0.4,
128
+ "learning_rate": 0.0002996259185870532,
129
+ "loss": 0.2874,
130
+ "step": 8500
131
+ },
132
+ {
133
+ "epoch": 0.42,
134
+ "learning_rate": 0.0002996025150952522,
135
+ "loss": 0.2585,
136
+ "step": 9000
137
+ },
138
+ {
139
+ "epoch": 0.44,
140
+ "learning_rate": 0.0002995791116034512,
141
+ "loss": 0.2579,
142
+ "step": 9500
143
+ },
144
+ {
145
+ "epoch": 0.47,
146
+ "learning_rate": 0.0002995557081116502,
147
+ "loss": 0.2664,
148
+ "step": 10000
149
+ },
150
+ {
151
+ "epoch": 0.49,
152
+ "learning_rate": 0.00029953230461984923,
153
+ "loss": 0.2702,
154
+ "step": 10500
155
+ },
156
+ {
157
+ "epoch": 0.51,
158
+ "learning_rate": 0.00029950890112804824,
159
+ "loss": 0.2487,
160
+ "step": 11000
161
+ },
162
+ {
163
+ "epoch": 0.54,
164
+ "learning_rate": 0.0002994854976362473,
165
+ "loss": 0.2501,
166
+ "step": 11500
167
+ },
168
+ {
169
+ "epoch": 0.56,
170
+ "learning_rate": 0.0002994620941444463,
171
+ "loss": 0.2485,
172
+ "step": 12000
173
+ },
174
+ {
175
+ "epoch": 0.56,
176
+ "eval_cer": 0.04427867207608079,
177
+ "eval_loss": 0.16677920520305634,
178
+ "eval_runtime": 1348.5895,
179
+ "eval_samples_per_second": 14.084,
180
+ "eval_steps_per_second": 3.521,
181
+ "eval_wer": 0.15644211290530335,
182
+ "step": 12000
183
+ },
184
+ {
185
+ "epoch": 0.59,
186
+ "learning_rate": 0.0002994386906526454,
187
+ "loss": 0.2546,
188
+ "step": 12500
189
+ },
190
+ {
191
+ "epoch": 0.61,
192
+ "learning_rate": 0.000299415333967828,
193
+ "loss": 0.2376,
194
+ "step": 13000
195
+ },
196
+ {
197
+ "epoch": 0.63,
198
+ "learning_rate": 0.000299391930476027,
199
+ "loss": 0.2451,
200
+ "step": 13500
201
+ },
202
+ {
203
+ "epoch": 0.66,
204
+ "learning_rate": 0.000299368526984226,
205
+ "loss": 0.2436,
206
+ "step": 14000
207
+ },
208
+ {
209
+ "epoch": 0.68,
210
+ "learning_rate": 0.000299345123492425,
211
+ "loss": 0.2405,
212
+ "step": 14500
213
+ },
214
+ {
215
+ "epoch": 0.7,
216
+ "learning_rate": 0.0002993217200006241,
217
+ "loss": 0.2447,
218
+ "step": 15000
219
+ },
220
+ {
221
+ "epoch": 0.73,
222
+ "learning_rate": 0.0002992983165088231,
223
+ "loss": 0.2415,
224
+ "step": 15500
225
+ },
226
+ {
227
+ "epoch": 0.75,
228
+ "learning_rate": 0.0002992749130170221,
229
+ "loss": 0.2379,
230
+ "step": 16000
231
+ },
232
+ {
233
+ "epoch": 0.75,
234
+ "eval_cer": 0.04885581621079265,
235
+ "eval_loss": 0.1717624068260193,
236
+ "eval_runtime": 1346.3358,
237
+ "eval_samples_per_second": 14.107,
238
+ "eval_steps_per_second": 3.527,
239
+ "eval_wer": 0.17038060554963563,
240
+ "step": 16000
241
+ },
242
+ {
243
+ "epoch": 0.77,
244
+ "learning_rate": 0.00029925150952522117,
245
+ "loss": 0.2462,
246
+ "step": 16500
247
+ },
248
+ {
249
+ "epoch": 0.8,
250
+ "learning_rate": 0.0002992281060334202,
251
+ "loss": 0.2305,
252
+ "step": 17000
253
+ },
254
+ {
255
+ "epoch": 0.82,
256
+ "learning_rate": 0.0002992047025416192,
257
+ "loss": 0.2333,
258
+ "step": 17500
259
+ },
260
+ {
261
+ "epoch": 0.84,
262
+ "learning_rate": 0.0002991812990498182,
263
+ "loss": 0.2243,
264
+ "step": 18000
265
+ },
266
+ {
267
+ "epoch": 0.87,
268
+ "learning_rate": 0.00029915794236500085,
269
+ "loss": 0.2182,
270
+ "step": 18500
271
+ },
272
+ {
273
+ "epoch": 0.89,
274
+ "learning_rate": 0.00029913453887319986,
275
+ "loss": 0.2347,
276
+ "step": 19000
277
+ },
278
+ {
279
+ "epoch": 0.91,
280
+ "learning_rate": 0.00029911113538139887,
281
+ "loss": 0.2178,
282
+ "step": 19500
283
+ },
284
+ {
285
+ "epoch": 0.94,
286
+ "learning_rate": 0.00029908773188959794,
287
+ "loss": 0.2313,
288
+ "step": 20000
289
+ },
290
+ {
291
+ "epoch": 0.94,
292
+ "eval_cer": 0.04552196483880818,
293
+ "eval_loss": 0.166758194565773,
294
+ "eval_runtime": 1346.4361,
295
+ "eval_samples_per_second": 14.106,
296
+ "eval_steps_per_second": 3.527,
297
+ "eval_wer": 0.1609425559237618,
298
+ "step": 20000
299
+ },
300
+ {
301
+ "epoch": 0.96,
302
+ "learning_rate": 0.00029906432839779695,
303
+ "loss": 0.2118,
304
+ "step": 20500
305
+ },
306
+ {
307
+ "epoch": 0.98,
308
+ "learning_rate": 0.00029904092490599596,
309
+ "loss": 0.2437,
310
+ "step": 21000
311
+ },
312
+ {
313
+ "epoch": 1.01,
314
+ "learning_rate": 0.00029901752141419497,
315
+ "loss": 0.2139,
316
+ "step": 21500
317
+ },
318
+ {
319
+ "epoch": 1.03,
320
+ "learning_rate": 0.00029899416472937756,
321
+ "loss": 0.2157,
322
+ "step": 22000
323
+ },
324
+ {
325
+ "epoch": 1.05,
326
+ "learning_rate": 0.00029897076123757663,
327
+ "loss": 0.211,
328
+ "step": 22500
329
+ },
330
+ {
331
+ "epoch": 1.08,
332
+ "learning_rate": 0.00029894735774577564,
333
+ "loss": 0.2121,
334
+ "step": 23000
335
+ },
336
+ {
337
+ "epoch": 1.1,
338
+ "learning_rate": 0.00029892395425397465,
339
+ "loss": 0.2066,
340
+ "step": 23500
341
+ },
342
+ {
343
+ "epoch": 1.12,
344
+ "learning_rate": 0.0002989005507621737,
345
+ "loss": 0.2047,
346
+ "step": 24000
347
+ },
348
+ {
349
+ "epoch": 1.12,
350
+ "eval_cer": 0.04369247044453169,
351
+ "eval_loss": 0.15543465316295624,
352
+ "eval_runtime": 1348.2816,
353
+ "eval_samples_per_second": 14.087,
354
+ "eval_steps_per_second": 3.522,
355
+ "eval_wer": 0.14927965588980355,
356
+ "step": 24000
357
+ },
358
+ {
359
+ "epoch": 1.15,
360
+ "learning_rate": 0.00029887714727037273,
361
+ "loss": 0.2077,
362
+ "step": 24500
363
+ },
364
+ {
365
+ "epoch": 1.17,
366
+ "learning_rate": 0.00029885374377857174,
367
+ "loss": 0.201,
368
+ "step": 25000
369
+ },
370
+ {
371
+ "epoch": 1.19,
372
+ "learning_rate": 0.00029883034028677075,
373
+ "loss": 0.207,
374
+ "step": 25500
375
+ },
376
+ {
377
+ "epoch": 1.22,
378
+ "learning_rate": 0.00029880693679496976,
379
+ "loss": 0.1931,
380
+ "step": 26000
381
+ },
382
+ {
383
+ "epoch": 1.24,
384
+ "learning_rate": 0.0002987835801101524,
385
+ "loss": 0.2115,
386
+ "step": 26500
387
+ },
388
+ {
389
+ "epoch": 1.26,
390
+ "learning_rate": 0.0002987601766183514,
391
+ "loss": 0.2226,
392
+ "step": 27000
393
+ },
394
+ {
395
+ "epoch": 1.29,
396
+ "learning_rate": 0.0002987367731265505,
397
+ "loss": 0.2106,
398
+ "step": 27500
399
+ },
400
+ {
401
+ "epoch": 1.31,
402
+ "learning_rate": 0.0002987133696347495,
403
+ "loss": 0.2055,
404
+ "step": 28000
405
+ },
406
+ {
407
+ "epoch": 1.31,
408
+ "eval_cer": 0.049039174628102776,
409
+ "eval_loss": 0.16633369028568268,
410
+ "eval_runtime": 1346.0932,
411
+ "eval_samples_per_second": 14.11,
412
+ "eval_steps_per_second": 3.528,
413
+ "eval_wer": 0.16584113006787746,
414
+ "step": 28000
415
+ },
416
+ {
417
+ "epoch": 1.33,
418
+ "learning_rate": 0.0002986900129499321,
419
+ "loss": 0.2178,
420
+ "step": 28500
421
+ },
422
+ {
423
+ "epoch": 1.36,
424
+ "learning_rate": 0.0002986666094581311,
425
+ "loss": 0.2125,
426
+ "step": 29000
427
+ },
428
+ {
429
+ "epoch": 1.38,
430
+ "learning_rate": 0.0002986432059663301,
431
+ "loss": 0.2029,
432
+ "step": 29500
433
+ },
434
+ {
435
+ "epoch": 1.4,
436
+ "learning_rate": 0.0002986198024745292,
437
+ "loss": 0.2104,
438
+ "step": 30000
439
+ },
440
+ {
441
+ "epoch": 1.43,
442
+ "learning_rate": 0.0002985963989827282,
443
+ "loss": 0.2034,
444
+ "step": 30500
445
+ },
446
+ {
447
+ "epoch": 1.45,
448
+ "learning_rate": 0.00029857304229791084,
449
+ "loss": 0.2116,
450
+ "step": 31000
451
+ },
452
+ {
453
+ "epoch": 1.47,
454
+ "learning_rate": 0.00029854963880610985,
455
+ "loss": 0.2059,
456
+ "step": 31500
457
+ },
458
+ {
459
+ "epoch": 1.5,
460
+ "learning_rate": 0.00029852623531430886,
461
+ "loss": 0.205,
462
+ "step": 32000
463
+ },
464
+ {
465
+ "epoch": 1.5,
466
+ "eval_cer": 0.04317034201457052,
467
+ "eval_loss": 0.14783667027950287,
468
+ "eval_runtime": 1347.0872,
469
+ "eval_samples_per_second": 14.099,
470
+ "eval_steps_per_second": 3.525,
471
+ "eval_wer": 0.15488081437331427,
472
+ "step": 32000
473
+ },
474
+ {
475
+ "epoch": 1.52,
476
+ "learning_rate": 0.00029850283182250787,
477
+ "loss": 0.2225,
478
+ "step": 32500
479
+ },
480
+ {
481
+ "epoch": 1.54,
482
+ "learning_rate": 0.0002984794751376905,
483
+ "loss": 0.2011,
484
+ "step": 33000
485
+ },
486
+ {
487
+ "epoch": 1.57,
488
+ "learning_rate": 0.00029845607164588953,
489
+ "loss": 0.2011,
490
+ "step": 33500
491
+ },
492
+ {
493
+ "epoch": 1.59,
494
+ "learning_rate": 0.0002984327149610722,
495
+ "loss": 0.1957,
496
+ "step": 34000
497
+ },
498
+ {
499
+ "epoch": 1.61,
500
+ "learning_rate": 0.0002984093114692712,
501
+ "loss": 0.2044,
502
+ "step": 34500
503
+ },
504
+ {
505
+ "epoch": 1.64,
506
+ "learning_rate": 0.00029838590797747025,
507
+ "loss": 0.2011,
508
+ "step": 35000
509
+ },
510
+ {
511
+ "epoch": 1.66,
512
+ "learning_rate": 0.00029836250448566926,
513
+ "loss": 0.2046,
514
+ "step": 35500
515
+ },
516
+ {
517
+ "epoch": 1.68,
518
+ "learning_rate": 0.0002983391009938683,
519
+ "loss": 0.2017,
520
+ "step": 36000
521
+ },
522
+ {
523
+ "epoch": 1.68,
524
+ "eval_cer": 0.04304628539021943,
525
+ "eval_loss": 0.14913983643054962,
526
+ "eval_runtime": 1406.2309,
527
+ "eval_samples_per_second": 13.506,
528
+ "eval_steps_per_second": 3.377,
529
+ "eval_wer": 0.1499119817952591,
530
+ "step": 36000
531
+ },
532
+ {
533
+ "epoch": 1.71,
534
+ "learning_rate": 0.0002983156975020673,
535
+ "loss": 0.2028,
536
+ "step": 36500
537
+ },
538
+ {
539
+ "epoch": 1.73,
540
+ "learning_rate": 0.0002982922940102663,
541
+ "loss": 0.1991,
542
+ "step": 37000
543
+ },
544
+ {
545
+ "epoch": 1.76,
546
+ "learning_rate": 0.0002982688905184653,
547
+ "loss": 0.2145,
548
+ "step": 37500
549
+ },
550
+ {
551
+ "epoch": 1.78,
552
+ "learning_rate": 0.00029824553383364796,
553
+ "loss": 0.2043,
554
+ "step": 38000
555
+ },
556
+ {
557
+ "epoch": 1.8,
558
+ "learning_rate": 0.00029822213034184697,
559
+ "loss": 0.194,
560
+ "step": 38500
561
+ },
562
+ {
563
+ "epoch": 1.83,
564
+ "learning_rate": 0.00029819872685004603,
565
+ "loss": 0.2006,
566
+ "step": 39000
567
+ },
568
+ {
569
+ "epoch": 1.85,
570
+ "learning_rate": 0.00029817532335824504,
571
+ "loss": 0.2095,
572
+ "step": 39500
573
+ },
574
+ {
575
+ "epoch": 1.87,
576
+ "learning_rate": 0.00029815196667342764,
577
+ "loss": 0.1975,
578
+ "step": 40000
579
+ },
580
+ {
581
+ "epoch": 1.87,
582
+ "eval_cer": 0.0433755125856127,
583
+ "eval_loss": 0.14519542455673218,
584
+ "eval_runtime": 1370.9335,
585
+ "eval_samples_per_second": 13.854,
586
+ "eval_steps_per_second": 3.464,
587
+ "eval_wer": 0.14985343310030952,
588
+ "step": 40000
589
+ },
590
+ {
591
+ "epoch": 1.9,
592
+ "learning_rate": 0.00029812856318162665,
593
+ "loss": 0.2033,
594
+ "step": 40500
595
+ },
596
+ {
597
+ "epoch": 1.92,
598
+ "learning_rate": 0.0002981051596898257,
599
+ "loss": 0.1975,
600
+ "step": 41000
601
+ },
602
+ {
603
+ "epoch": 1.94,
604
+ "learning_rate": 0.0002980817561980247,
605
+ "loss": 0.1945,
606
+ "step": 41500
607
+ },
608
+ {
609
+ "epoch": 1.97,
610
+ "learning_rate": 0.00029805835270622374,
611
+ "loss": 0.1939,
612
+ "step": 42000
613
+ },
614
+ {
615
+ "epoch": 1.99,
616
+ "learning_rate": 0.0002980349492144228,
617
+ "loss": 0.2073,
618
+ "step": 42500
619
+ },
620
+ {
621
+ "epoch": 2.01,
622
+ "learning_rate": 0.0002980115925296054,
623
+ "loss": 0.193,
624
+ "step": 43000
625
+ },
626
+ {
627
+ "epoch": 2.04,
628
+ "learning_rate": 0.0002979881890378044,
629
+ "loss": 0.1895,
630
+ "step": 43500
631
+ },
632
+ {
633
+ "epoch": 2.06,
634
+ "learning_rate": 0.0002979647855460034,
635
+ "loss": 0.1882,
636
+ "step": 44000
637
+ },
638
+ {
639
+ "epoch": 2.06,
640
+ "eval_cer": 0.04118611765475723,
641
+ "eval_loss": 0.1441776603460312,
642
+ "eval_runtime": 1437.09,
643
+ "eval_samples_per_second": 13.216,
644
+ "eval_steps_per_second": 3.305,
645
+ "eval_wer": 0.14381901427417182,
646
+ "step": 44000
647
+ },
648
+ {
649
+ "epoch": 2.08,
650
+ "learning_rate": 0.00029794138205420243,
651
+ "loss": 0.1814,
652
+ "step": 44500
653
+ },
654
+ {
655
+ "epoch": 2.11,
656
+ "learning_rate": 0.0002979179785624015,
657
+ "loss": 0.1857,
658
+ "step": 45000
659
+ },
660
+ {
661
+ "epoch": 2.13,
662
+ "learning_rate": 0.0002978945750706005,
663
+ "loss": 0.193,
664
+ "step": 45500
665
+ },
666
+ {
667
+ "epoch": 2.15,
668
+ "learning_rate": 0.0002978711715787995,
669
+ "loss": 0.1835,
670
+ "step": 46000
671
+ },
672
+ {
673
+ "epoch": 2.18,
674
+ "learning_rate": 0.0002978477680869986,
675
+ "loss": 0.171,
676
+ "step": 46500
677
+ },
678
+ {
679
+ "epoch": 2.2,
680
+ "learning_rate": 0.0002978243645951976,
681
+ "loss": 0.1777,
682
+ "step": 47000
683
+ },
684
+ {
685
+ "epoch": 2.22,
686
+ "learning_rate": 0.0002978010079103802,
687
+ "loss": 0.1701,
688
+ "step": 47500
689
+ },
690
+ {
691
+ "epoch": 2.25,
692
+ "learning_rate": 0.0002977776044185792,
693
+ "loss": 0.1821,
694
+ "step": 48000
695
+ },
696
+ {
697
+ "epoch": 2.25,
698
+ "eval_cer": 0.04377358439122279,
699
+ "eval_loss": 0.14771120250225067,
700
+ "eval_runtime": 1168.4306,
701
+ "eval_samples_per_second": 16.255,
702
+ "eval_steps_per_second": 4.064,
703
+ "eval_wer": 0.1521368322033435,
704
+ "step": 48000
705
+ },
706
+ {
707
+ "epoch": 2.27,
708
+ "learning_rate": 0.00029775420092677826,
709
+ "loss": 0.1878,
710
+ "step": 48500
711
+ },
712
+ {
713
+ "epoch": 2.29,
714
+ "learning_rate": 0.0002977307974349773,
715
+ "loss": 0.1848,
716
+ "step": 49000
717
+ },
718
+ {
719
+ "epoch": 2.32,
720
+ "learning_rate": 0.0002977073939431763,
721
+ "loss": 0.1837,
722
+ "step": 49500
723
+ },
724
+ {
725
+ "epoch": 2.34,
726
+ "learning_rate": 0.00029768403725835893,
727
+ "loss": 0.1736,
728
+ "step": 50000
729
+ },
730
+ {
731
+ "epoch": 2.36,
732
+ "learning_rate": 0.00029766063376655794,
733
+ "loss": 0.1805,
734
+ "step": 50500
735
+ },
736
+ {
737
+ "epoch": 2.39,
738
+ "learning_rate": 0.00029763723027475696,
739
+ "loss": 0.1795,
740
+ "step": 51000
741
+ },
742
+ {
743
+ "epoch": 2.41,
744
+ "learning_rate": 0.00029761382678295597,
745
+ "loss": 0.1752,
746
+ "step": 51500
747
+ },
748
+ {
749
+ "epoch": 2.43,
750
+ "learning_rate": 0.000297590423291155,
751
+ "loss": 0.1848,
752
+ "step": 52000
753
+ },
754
+ {
755
+ "epoch": 2.43,
756
+ "eval_cer": 0.04313966867338481,
757
+ "eval_loss": 0.13986873626708984,
758
+ "eval_runtime": 1079.4763,
759
+ "eval_samples_per_second": 17.595,
760
+ "eval_steps_per_second": 4.399,
761
+ "eval_wer": 0.14816332743943136,
762
+ "step": 52000
763
+ },
764
+ {
765
+ "epoch": 2.46,
766
+ "learning_rate": 0.0002975670666063376,
767
+ "loss": 0.1873,
768
+ "step": 52500
769
+ },
770
+ {
771
+ "epoch": 2.48,
772
+ "learning_rate": 0.0002975436631145367,
773
+ "loss": 0.1848,
774
+ "step": 53000
775
+ },
776
+ {
777
+ "epoch": 2.5,
778
+ "learning_rate": 0.0002975202596227357,
779
+ "loss": 0.1953,
780
+ "step": 53500
781
+ },
782
+ {
783
+ "epoch": 2.53,
784
+ "learning_rate": 0.0002974968561309347,
785
+ "loss": 0.1779,
786
+ "step": 54000
787
+ },
788
+ {
789
+ "epoch": 2.55,
790
+ "learning_rate": 0.0002974734526391337,
791
+ "loss": 0.1861,
792
+ "step": 54500
793
+ },
794
+ {
795
+ "epoch": 2.57,
796
+ "learning_rate": 0.00029745004914733274,
797
+ "loss": 0.1784,
798
+ "step": 55000
799
+ },
800
+ {
801
+ "epoch": 2.6,
802
+ "learning_rate": 0.0002974266924625154,
803
+ "loss": 0.1902,
804
+ "step": 55500
805
+ },
806
+ {
807
+ "epoch": 2.62,
808
+ "learning_rate": 0.0002974032889707144,
809
+ "loss": 0.1852,
810
+ "step": 56000
811
+ },
812
+ {
813
+ "epoch": 2.62,
814
+ "eval_cer": 0.041413782009335605,
815
+ "eval_loss": 0.14103934168815613,
816
+ "eval_runtime": 1043.4328,
817
+ "eval_samples_per_second": 18.202,
818
+ "eval_steps_per_second": 4.551,
819
+ "eval_wer": 0.1456340238176091,
820
+ "step": 56000
821
+ },
822
+ {
823
+ "epoch": 2.64,
824
+ "learning_rate": 0.0002973798854789134,
825
+ "loss": 0.1758,
826
+ "step": 56500
827
+ },
828
+ {
829
+ "epoch": 2.67,
830
+ "learning_rate": 0.00029735648198711247,
831
+ "loss": 0.1799,
832
+ "step": 57000
833
+ },
834
+ {
835
+ "epoch": 2.69,
836
+ "learning_rate": 0.00029733312530229506,
837
+ "loss": 0.177,
838
+ "step": 57500
839
+ },
840
+ {
841
+ "epoch": 2.71,
842
+ "learning_rate": 0.00029730972181049413,
843
+ "loss": 0.176,
844
+ "step": 58000
845
+ },
846
+ {
847
+ "epoch": 2.74,
848
+ "learning_rate": 0.00029728631831869314,
849
+ "loss": 0.1852,
850
+ "step": 58500
851
+ },
852
+ {
853
+ "epoch": 2.76,
854
+ "learning_rate": 0.00029726296163387573,
855
+ "loss": 0.1813,
856
+ "step": 59000
857
+ },
858
+ {
859
+ "epoch": 2.78,
860
+ "learning_rate": 0.00029723955814207474,
861
+ "loss": 0.1879,
862
+ "step": 59500
863
+ },
864
+ {
865
+ "epoch": 2.81,
866
+ "learning_rate": 0.0002972161546502738,
867
+ "loss": 0.173,
868
+ "step": 60000
869
+ },
870
+ {
871
+ "epoch": 2.81,
872
+ "eval_cer": 0.039124869127077605,
873
+ "eval_loss": 0.13692142069339752,
874
+ "eval_runtime": 1043.8955,
875
+ "eval_samples_per_second": 18.194,
876
+ "eval_steps_per_second": 4.549,
877
+ "eval_wer": 0.13655507285409274,
878
+ "step": 60000
879
+ },
880
+ {
881
+ "epoch": 2.83,
882
+ "learning_rate": 0.0002971927511584728,
883
+ "loss": 0.17,
884
+ "step": 60500
885
+ },
886
+ {
887
+ "epoch": 2.86,
888
+ "learning_rate": 0.00029716934766667183,
889
+ "loss": 0.1885,
890
+ "step": 61000
891
+ },
892
+ {
893
+ "epoch": 2.88,
894
+ "learning_rate": 0.0002971459441748709,
895
+ "loss": 0.1885,
896
+ "step": 61500
897
+ },
898
+ {
899
+ "epoch": 2.9,
900
+ "learning_rate": 0.0002971225874900535,
901
+ "loss": 0.1775,
902
+ "step": 62000
903
+ },
904
+ {
905
+ "epoch": 2.93,
906
+ "learning_rate": 0.0002970991839982525,
907
+ "loss": 0.184,
908
+ "step": 62500
909
+ },
910
+ {
911
+ "epoch": 2.95,
912
+ "learning_rate": 0.0002970757805064515,
913
+ "loss": 0.1865,
914
+ "step": 63000
915
+ },
916
+ {
917
+ "epoch": 2.97,
918
+ "learning_rate": 0.0002970523770146505,
919
+ "loss": 0.177,
920
+ "step": 63500
921
+ },
922
+ {
923
+ "epoch": 3.0,
924
+ "learning_rate": 0.0002970289735228496,
925
+ "loss": 0.1766,
926
+ "step": 64000
927
+ },
928
+ {
929
+ "epoch": 3.0,
930
+ "eval_cer": 0.043919453169305935,
931
+ "eval_loss": 0.14147880673408508,
932
+ "eval_runtime": 1043.0346,
933
+ "eval_samples_per_second": 18.209,
934
+ "eval_steps_per_second": 4.553,
935
+ "eval_wer": 0.14889323450313627,
936
+ "step": 64000
937
+ },
938
+ {
939
+ "epoch": 3.02,
940
+ "learning_rate": 0.0002970055700310486,
941
+ "loss": 0.1751,
942
+ "step": 64500
943
+ },
944
+ {
945
+ "epoch": 3.04,
946
+ "learning_rate": 0.00029698216653924767,
947
+ "loss": 0.16,
948
+ "step": 65000
949
+ },
950
+ {
951
+ "epoch": 3.07,
952
+ "learning_rate": 0.0002969587630474467,
953
+ "loss": 0.1654,
954
+ "step": 65500
955
+ },
956
+ {
957
+ "epoch": 3.09,
958
+ "learning_rate": 0.00029693540636262927,
959
+ "loss": 0.169,
960
+ "step": 66000
961
+ },
962
+ {
963
+ "epoch": 3.11,
964
+ "learning_rate": 0.0002969120028708283,
965
+ "loss": 0.1668,
966
+ "step": 66500
967
+ },
968
+ {
969
+ "epoch": 3.14,
970
+ "learning_rate": 0.0002968885993790273,
971
+ "loss": 0.1555,
972
+ "step": 67000
973
+ },
974
+ {
975
+ "epoch": 3.16,
976
+ "learning_rate": 0.00029686519588722636,
977
+ "loss": 0.1643,
978
+ "step": 67500
979
+ },
980
+ {
981
+ "epoch": 3.18,
982
+ "learning_rate": 0.00029684179239542537,
983
+ "loss": 0.1651,
984
+ "step": 68000
985
+ },
986
+ {
987
+ "epoch": 3.18,
988
+ "eval_cer": 0.04225832133664878,
989
+ "eval_loss": 0.14033427834510803,
990
+ "eval_runtime": 1043.9782,
991
+ "eval_samples_per_second": 18.193,
992
+ "eval_steps_per_second": 4.549,
993
+ "eval_wer": 0.1446074700328263,
994
+ "step": 68000
995
+ },
996
+ {
997
+ "epoch": 3.21,
998
+ "learning_rate": 0.0002968183889036244,
999
+ "loss": 0.181,
1000
+ "step": 68500
1001
+ },
1002
+ {
1003
+ "epoch": 3.23,
1004
+ "learning_rate": 0.00029679507902579067,
1005
+ "loss": 0.1691,
1006
+ "step": 69000
1007
+ },
1008
+ {
1009
+ "epoch": 3.25,
1010
+ "learning_rate": 0.0002967716755339897,
1011
+ "loss": 0.159,
1012
+ "step": 69500
1013
+ },
1014
+ {
1015
+ "epoch": 3.28,
1016
+ "learning_rate": 0.0002967482720421887,
1017
+ "loss": 0.171,
1018
+ "step": 70000
1019
+ },
1020
+ {
1021
+ "epoch": 3.3,
1022
+ "learning_rate": 0.0002967248685503877,
1023
+ "loss": 0.1529,
1024
+ "step": 70500
1025
+ },
1026
+ {
1027
+ "epoch": 3.32,
1028
+ "learning_rate": 0.0002967014650585867,
1029
+ "loss": 0.1714,
1030
+ "step": 71000
1031
+ },
1032
+ {
1033
+ "epoch": 3.35,
1034
+ "learning_rate": 0.0002966780615667857,
1035
+ "loss": 0.1646,
1036
+ "step": 71500
1037
+ },
1038
+ {
1039
+ "epoch": 3.37,
1040
+ "learning_rate": 0.0002966546580749848,
1041
+ "loss": 0.1648,
1042
+ "step": 72000
1043
+ },
1044
+ {
1045
+ "epoch": 3.37,
1046
+ "eval_cer": 0.04023797059721677,
1047
+ "eval_loss": 0.13524918258190155,
1048
+ "eval_runtime": 1043.0033,
1049
+ "eval_samples_per_second": 18.21,
1050
+ "eval_steps_per_second": 4.553,
1051
+ "eval_wer": 0.1384325343388096,
1052
+ "step": 72000
1053
+ },
1054
+ {
1055
+ "epoch": 3.39,
1056
+ "learning_rate": 0.0002966312545831838,
1057
+ "loss": 0.1855,
1058
+ "step": 72500
1059
+ },
1060
+ {
1061
+ "epoch": 3.42,
1062
+ "learning_rate": 0.00029660789789836645,
1063
+ "loss": 0.173,
1064
+ "step": 73000
1065
+ },
1066
+ {
1067
+ "epoch": 3.44,
1068
+ "learning_rate": 0.00029658449440656546,
1069
+ "loss": 0.1811,
1070
+ "step": 73500
1071
+ },
1072
+ {
1073
+ "epoch": 3.46,
1074
+ "learning_rate": 0.00029656109091476447,
1075
+ "loss": 0.1772,
1076
+ "step": 74000
1077
+ },
1078
+ {
1079
+ "epoch": 3.49,
1080
+ "learning_rate": 0.0002965376874229635,
1081
+ "loss": 0.1683,
1082
+ "step": 74500
1083
+ },
1084
+ {
1085
+ "epoch": 3.51,
1086
+ "learning_rate": 0.0002965142839311625,
1087
+ "loss": 0.1689,
1088
+ "step": 75000
1089
+ },
1090
+ {
1091
+ "epoch": 3.53,
1092
+ "learning_rate": 0.0002964908804393615,
1093
+ "loss": 0.1541,
1094
+ "step": 75500
1095
+ },
1096
+ {
1097
+ "epoch": 3.56,
1098
+ "learning_rate": 0.0002964675705615278,
1099
+ "loss": 0.1541,
1100
+ "step": 76000
1101
+ },
1102
+ {
1103
+ "epoch": 3.56,
1104
+ "eval_cer": 0.0414948959560267,
1105
+ "eval_loss": 0.13895024359226227,
1106
+ "eval_runtime": 1042.4783,
1107
+ "eval_samples_per_second": 18.219,
1108
+ "eval_steps_per_second": 4.555,
1109
+ "eval_wer": 0.1431398494127566,
1110
+ "step": 76000
1111
+ },
1112
+ {
1113
+ "epoch": 3.58,
1114
+ "learning_rate": 0.0002964441670697268,
1115
+ "loss": 0.1739,
1116
+ "step": 76500
1117
+ },
1118
+ {
1119
+ "epoch": 3.6,
1120
+ "learning_rate": 0.0002964207635779258,
1121
+ "loss": 0.1671,
1122
+ "step": 77000
1123
+ },
1124
+ {
1125
+ "epoch": 3.63,
1126
+ "learning_rate": 0.0002963973600861248,
1127
+ "loss": 0.168,
1128
+ "step": 77500
1129
+ },
1130
+ {
1131
+ "epoch": 3.65,
1132
+ "learning_rate": 0.00029637395659432383,
1133
+ "loss": 0.1611,
1134
+ "step": 78000
1135
+ },
1136
+ {
1137
+ "epoch": 3.67,
1138
+ "learning_rate": 0.00029635055310252284,
1139
+ "loss": 0.1654,
1140
+ "step": 78500
1141
+ },
1142
+ {
1143
+ "epoch": 3.7,
1144
+ "learning_rate": 0.0002963271964177055,
1145
+ "loss": 0.1714,
1146
+ "step": 79000
1147
+ },
1148
+ {
1149
+ "epoch": 3.72,
1150
+ "learning_rate": 0.0002963037929259045,
1151
+ "loss": 0.1735,
1152
+ "step": 79500
1153
+ },
1154
+ {
1155
+ "epoch": 3.74,
1156
+ "learning_rate": 0.00029628038943410357,
1157
+ "loss": 0.1634,
1158
+ "step": 80000
1159
+ },
1160
+ {
1161
+ "epoch": 3.74,
1162
+ "eval_cer": 0.04235443113903067,
1163
+ "eval_loss": 0.1428999900817871,
1164
+ "eval_runtime": 1074.2691,
1165
+ "eval_samples_per_second": 17.68,
1166
+ "eval_steps_per_second": 4.421,
1167
+ "eval_wer": 0.1467074165583516,
1168
+ "step": 80000
1169
+ },
1170
+ {
1171
+ "epoch": 3.77,
1172
+ "learning_rate": 0.0002962569859423026,
1173
+ "loss": 0.163,
1174
+ "step": 80500
1175
+ },
1176
+ {
1177
+ "epoch": 3.79,
1178
+ "learning_rate": 0.00029623362925748517,
1179
+ "loss": 0.1672,
1180
+ "step": 81000
1181
+ },
1182
+ {
1183
+ "epoch": 3.81,
1184
+ "learning_rate": 0.0002962102257656842,
1185
+ "loss": 0.1651,
1186
+ "step": 81500
1187
+ },
1188
+ {
1189
+ "epoch": 3.84,
1190
+ "learning_rate": 0.00029618682227388325,
1191
+ "loss": 0.1653,
1192
+ "step": 82000
1193
+ },
1194
+ {
1195
+ "epoch": 3.86,
1196
+ "learning_rate": 0.00029616341878208226,
1197
+ "loss": 0.163,
1198
+ "step": 82500
1199
+ },
1200
+ {
1201
+ "epoch": 3.88,
1202
+ "learning_rate": 0.00029614001529028127,
1203
+ "loss": 0.1681,
1204
+ "step": 83000
1205
+ },
1206
+ {
1207
+ "epoch": 3.91,
1208
+ "learning_rate": 0.00029611661179848033,
1209
+ "loss": 0.1608,
1210
+ "step": 83500
1211
+ },
1212
+ {
1213
+ "epoch": 3.93,
1214
+ "learning_rate": 0.00029609325511366293,
1215
+ "loss": 0.1649,
1216
+ "step": 84000
1217
+ },
1218
+ {
1219
+ "epoch": 3.93,
1220
+ "eval_cer": 0.03920121166513982,
1221
+ "eval_loss": 0.13187673687934875,
1222
+ "eval_runtime": 1043.8537,
1223
+ "eval_samples_per_second": 18.195,
1224
+ "eval_steps_per_second": 4.549,
1225
+ "eval_wer": 0.13773775649207445,
1226
+ "step": 84000
1227
+ },
1228
+ {
1229
+ "epoch": 3.95,
1230
+ "learning_rate": 0.00029606985162186194,
1231
+ "loss": 0.1629,
1232
+ "step": 84500
1233
+ },
1234
+ {
1235
+ "epoch": 3.98,
1236
+ "learning_rate": 0.00029604644813006095,
1237
+ "loss": 0.1671,
1238
+ "step": 85000
1239
+ },
1240
+ {
1241
+ "epoch": 4.0,
1242
+ "learning_rate": 0.00029602304463825996,
1243
+ "loss": 0.1693,
1244
+ "step": 85500
1245
+ },
1246
+ {
1247
+ "epoch": 4.03,
1248
+ "learning_rate": 0.000295999641146459,
1249
+ "loss": 0.1524,
1250
+ "step": 86000
1251
+ },
1252
+ {
1253
+ "epoch": 4.05,
1254
+ "learning_rate": 0.0002959762844616417,
1255
+ "loss": 0.1639,
1256
+ "step": 86500
1257
+ },
1258
+ {
1259
+ "epoch": 4.07,
1260
+ "learning_rate": 0.0002959528809698407,
1261
+ "loss": 0.1633,
1262
+ "step": 87000
1263
+ },
1264
+ {
1265
+ "epoch": 4.1,
1266
+ "learning_rate": 0.0002959294774780397,
1267
+ "loss": 0.1557,
1268
+ "step": 87500
1269
+ },
1270
+ {
1271
+ "epoch": 4.12,
1272
+ "learning_rate": 0.0002959060739862387,
1273
+ "loss": 0.1571,
1274
+ "step": 88000
1275
+ },
1276
+ {
1277
+ "epoch": 4.12,
1278
+ "eval_cer": 0.039078518300396985,
1279
+ "eval_loss": 0.1323617696762085,
1280
+ "eval_runtime": 1041.314,
1281
+ "eval_samples_per_second": 18.239,
1282
+ "eval_steps_per_second": 4.561,
1283
+ "eval_wer": 0.13669168647564178,
1284
+ "step": 88000
1285
+ },
1286
+ {
1287
+ "epoch": 4.14,
1288
+ "learning_rate": 0.0002958826704944377,
1289
+ "loss": 0.1626,
1290
+ "step": 88500
1291
+ },
1292
+ {
1293
+ "epoch": 4.17,
1294
+ "learning_rate": 0.00029585926700263673,
1295
+ "loss": 0.1515,
1296
+ "step": 89000
1297
+ },
1298
+ {
1299
+ "epoch": 4.19,
1300
+ "learning_rate": 0.0002958358635108358,
1301
+ "loss": 0.1445,
1302
+ "step": 89500
1303
+ },
1304
+ {
1305
+ "epoch": 4.21,
1306
+ "learning_rate": 0.0002958124600190348,
1307
+ "loss": 0.1554,
1308
+ "step": 90000
1309
+ },
1310
+ {
1311
+ "epoch": 4.24,
1312
+ "learning_rate": 0.00029578910333421745,
1313
+ "loss": 0.1559,
1314
+ "step": 90500
1315
+ },
1316
+ {
1317
+ "epoch": 4.26,
1318
+ "learning_rate": 0.00029576574664940005,
1319
+ "loss": 0.1716,
1320
+ "step": 91000
1321
+ },
1322
+ {
1323
+ "epoch": 4.28,
1324
+ "learning_rate": 0.0002957423431575991,
1325
+ "loss": 0.1524,
1326
+ "step": 91500
1327
+ },
1328
+ {
1329
+ "epoch": 4.31,
1330
+ "learning_rate": 0.0002957189396657981,
1331
+ "loss": 0.1483,
1332
+ "step": 92000
1333
+ },
1334
+ {
1335
+ "epoch": 4.31,
1336
+ "eval_cer": 0.03799609017144353,
1337
+ "eval_loss": 0.12687690556049347,
1338
+ "eval_runtime": 1041.0565,
1339
+ "eval_samples_per_second": 18.244,
1340
+ "eval_steps_per_second": 4.562,
1341
+ "eval_wer": 0.13329195892223564,
1342
+ "step": 92000
1343
+ },
1344
+ {
1345
+ "epoch": 4.33,
1346
+ "learning_rate": 0.00029569553617399714,
1347
+ "loss": 0.1623,
1348
+ "step": 92500
1349
+ },
1350
+ {
1351
+ "epoch": 4.35,
1352
+ "learning_rate": 0.00029567213268219615,
1353
+ "loss": 0.1499,
1354
+ "step": 93000
1355
+ },
1356
+ {
1357
+ "epoch": 4.38,
1358
+ "learning_rate": 0.0002956487759973788,
1359
+ "loss": 0.1474,
1360
+ "step": 93500
1361
+ },
1362
+ {
1363
+ "epoch": 4.4,
1364
+ "learning_rate": 0.0002956253725055778,
1365
+ "loss": 0.1519,
1366
+ "step": 94000
1367
+ },
1368
+ {
1369
+ "epoch": 4.42,
1370
+ "learning_rate": 0.0002956019690137768,
1371
+ "loss": 0.1567,
1372
+ "step": 94500
1373
+ },
1374
+ {
1375
+ "epoch": 4.45,
1376
+ "learning_rate": 0.0002955785655219759,
1377
+ "loss": 0.1596,
1378
+ "step": 95000
1379
+ },
1380
+ {
1381
+ "epoch": 4.47,
1382
+ "learning_rate": 0.0002955551620301749,
1383
+ "loss": 0.1524,
1384
+ "step": 95500
1385
+ },
1386
+ {
1387
+ "epoch": 4.49,
1388
+ "learning_rate": 0.0002955317585383739,
1389
+ "loss": 0.1483,
1390
+ "step": 96000
1391
+ },
1392
+ {
1393
+ "epoch": 4.49,
1394
+ "eval_cer": 0.03884608253718972,
1395
+ "eval_loss": 0.13216418027877808,
1396
+ "eval_runtime": 1043.5586,
1397
+ "eval_samples_per_second": 18.2,
1398
+ "eval_steps_per_second": 4.551,
1399
+ "eval_wer": 0.1352787113041917,
1400
+ "step": 96000
1401
+ },
1402
+ {
1403
+ "epoch": 4.52,
1404
+ "learning_rate": 0.0002955084018535565,
1405
+ "loss": 0.1509,
1406
+ "step": 96500
1407
+ },
1408
+ {
1409
+ "epoch": 4.54,
1410
+ "learning_rate": 0.0002954849983617555,
1411
+ "loss": 0.1553,
1412
+ "step": 97000
1413
+ },
1414
+ {
1415
+ "epoch": 4.56,
1416
+ "learning_rate": 0.0002954615948699546,
1417
+ "loss": 0.1576,
1418
+ "step": 97500
1419
+ },
1420
+ {
1421
+ "epoch": 4.59,
1422
+ "learning_rate": 0.0002954381913781536,
1423
+ "loss": 0.157,
1424
+ "step": 98000
1425
+ },
1426
+ {
1427
+ "epoch": 4.61,
1428
+ "learning_rate": 0.00029541478788635265,
1429
+ "loss": 0.1641,
1430
+ "step": 98500
1431
+ },
1432
+ {
1433
+ "epoch": 4.63,
1434
+ "learning_rate": 0.00029539138439455166,
1435
+ "loss": 0.1631,
1436
+ "step": 99000
1437
+ },
1438
+ {
1439
+ "epoch": 4.66,
1440
+ "learning_rate": 0.0002953679809027507,
1441
+ "loss": 0.1619,
1442
+ "step": 99500
1443
+ },
1444
+ {
1445
+ "epoch": 4.68,
1446
+ "learning_rate": 0.0002953445774109497,
1447
+ "loss": 0.1502,
1448
+ "step": 100000
1449
+ },
1450
+ {
1451
+ "epoch": 4.68,
1452
+ "eval_cer": 0.038923106705056054,
1453
+ "eval_loss": 0.12758338451385498,
1454
+ "eval_runtime": 1042.7138,
1455
+ "eval_samples_per_second": 18.215,
1456
+ "eval_steps_per_second": 4.554,
1457
+ "eval_wer": 0.13328415242957567,
1458
+ "step": 100000
1459
+ },
1460
+ {
1461
+ "epoch": 4.7,
1462
+ "learning_rate": 0.0002953212207261323,
1463
+ "loss": 0.1508,
1464
+ "step": 100500
1465
+ },
1466
+ {
1467
+ "epoch": 4.73,
1468
+ "learning_rate": 0.00029529781723433134,
1469
+ "loss": 0.1423,
1470
+ "step": 101000
1471
+ },
1472
+ {
1473
+ "epoch": 4.75,
1474
+ "learning_rate": 0.00029527441374253035,
1475
+ "loss": 0.1619,
1476
+ "step": 101500
1477
+ },
1478
+ {
1479
+ "epoch": 4.77,
1480
+ "learning_rate": 0.00029525101025072937,
1481
+ "loss": 0.1483,
1482
+ "step": 102000
1483
+ },
1484
+ {
1485
+ "epoch": 4.8,
1486
+ "learning_rate": 0.000295227653565912,
1487
+ "loss": 0.1716,
1488
+ "step": 102500
1489
+ },
1490
+ {
1491
+ "epoch": 4.82,
1492
+ "learning_rate": 0.000295204250074111,
1493
+ "loss": 0.1461,
1494
+ "step": 103000
1495
+ },
1496
+ {
1497
+ "epoch": 4.84,
1498
+ "learning_rate": 0.00029518084658231004,
1499
+ "loss": 0.1455,
1500
+ "step": 103500
1501
+ },
1502
+ {
1503
+ "epoch": 4.87,
1504
+ "learning_rate": 0.00029515744309050905,
1505
+ "loss": 0.1523,
1506
+ "step": 104000
1507
+ },
1508
+ {
1509
+ "epoch": 4.87,
1510
+ "eval_cer": 0.03784408672512324,
1511
+ "eval_loss": 0.1300228387117386,
1512
+ "eval_runtime": 1048.2511,
1513
+ "eval_samples_per_second": 18.119,
1514
+ "eval_steps_per_second": 4.53,
1515
+ "eval_wer": 0.12833483608317037,
1516
+ "step": 104000
1517
+ },
1518
+ {
1519
+ "epoch": 4.89,
1520
+ "learning_rate": 0.0002951340395987081,
1521
+ "loss": 0.1646,
1522
+ "step": 104500
1523
+ },
1524
+ {
1525
+ "epoch": 4.91,
1526
+ "learning_rate": 0.0002951106361069071,
1527
+ "loss": 0.1519,
1528
+ "step": 105000
1529
+ },
1530
+ {
1531
+ "epoch": 4.94,
1532
+ "learning_rate": 0.00029508727942208977,
1533
+ "loss": 0.1516,
1534
+ "step": 105500
1535
+ },
1536
+ {
1537
+ "epoch": 4.96,
1538
+ "learning_rate": 0.0002950638759302888,
1539
+ "loss": 0.1675,
1540
+ "step": 106000
1541
+ },
1542
+ {
1543
+ "epoch": 4.98,
1544
+ "learning_rate": 0.0002950404724384878,
1545
+ "loss": 0.1474,
1546
+ "step": 106500
1547
+ },
1548
+ {
1549
+ "epoch": 5.01,
1550
+ "learning_rate": 0.0002950170689466868,
1551
+ "loss": 0.151,
1552
+ "step": 107000
1553
+ },
1554
+ {
1555
+ "epoch": 5.03,
1556
+ "learning_rate": 0.0002949936654548858,
1557
+ "loss": 0.1419,
1558
+ "step": 107500
1559
+ },
1560
+ {
1561
+ "epoch": 5.05,
1562
+ "learning_rate": 0.00029497030877006846,
1563
+ "loss": 0.139,
1564
+ "step": 108000
1565
+ },
1566
+ {
1567
+ "epoch": 5.05,
1568
+ "eval_cer": 0.03710792653666623,
1569
+ "eval_loss": 0.1256220042705536,
1570
+ "eval_runtime": 1098.0599,
1571
+ "eval_samples_per_second": 17.297,
1572
+ "eval_steps_per_second": 4.325,
1573
+ "eval_wer": 0.12972048853031065,
1574
+ "step": 108000
1575
+ },
1576
+ {
1577
+ "epoch": 5.08,
1578
+ "learning_rate": 0.0002949469052782675,
1579
+ "loss": 0.137,
1580
+ "step": 108500
1581
+ },
1582
+ {
1583
+ "epoch": 5.1,
1584
+ "learning_rate": 0.0002949235017864665,
1585
+ "loss": 0.141,
1586
+ "step": 109000
1587
+ },
1588
+ {
1589
+ "epoch": 5.12,
1590
+ "learning_rate": 0.00029490009829466555,
1591
+ "loss": 0.1528,
1592
+ "step": 109500
1593
+ },
1594
+ {
1595
+ "epoch": 5.15,
1596
+ "learning_rate": 0.0002948767416098482,
1597
+ "loss": 0.1426,
1598
+ "step": 110000
1599
+ },
1600
+ {
1601
+ "epoch": 5.17,
1602
+ "learning_rate": 0.0002948533381180472,
1603
+ "loss": 0.1375,
1604
+ "step": 110500
1605
+ },
1606
+ {
1607
+ "epoch": 5.2,
1608
+ "learning_rate": 0.0002948299346262462,
1609
+ "loss": 0.1413,
1610
+ "step": 111000
1611
+ },
1612
+ {
1613
+ "epoch": 5.22,
1614
+ "learning_rate": 0.00029480653113444523,
1615
+ "loss": 0.1322,
1616
+ "step": 111500
1617
+ },
1618
+ {
1619
+ "epoch": 5.24,
1620
+ "learning_rate": 0.00029478312764264424,
1621
+ "loss": 0.1444,
1622
+ "step": 112000
1623
+ },
1624
+ {
1625
+ "epoch": 5.24,
1626
+ "eval_cer": 0.03732468481437857,
1627
+ "eval_loss": 0.12207575142383575,
1628
+ "eval_runtime": 1054.5242,
1629
+ "eval_samples_per_second": 18.011,
1630
+ "eval_steps_per_second": 4.503,
1631
+ "eval_wer": 0.1293418736363033,
1632
+ "step": 112000
1633
+ },
1634
+ {
1635
+ "epoch": 5.27,
1636
+ "learning_rate": 0.00029475972415084325,
1637
+ "loss": 0.1512,
1638
+ "step": 112500
1639
+ },
1640
+ {
1641
+ "epoch": 5.29,
1642
+ "learning_rate": 0.0002947363206590423,
1643
+ "loss": 0.1393,
1644
+ "step": 113000
1645
+ },
1646
+ {
1647
+ "epoch": 5.31,
1648
+ "learning_rate": 0.00029471291716724133,
1649
+ "loss": 0.1487,
1650
+ "step": 113500
1651
+ },
1652
+ {
1653
+ "epoch": 5.34,
1654
+ "learning_rate": 0.00029468951367544034,
1655
+ "loss": 0.1493,
1656
+ "step": 114000
1657
+ },
1658
+ {
1659
+ "epoch": 5.36,
1660
+ "learning_rate": 0.0002946661101836394,
1661
+ "loss": 0.151,
1662
+ "step": 114500
1663
+ },
1664
+ {
1665
+ "epoch": 5.38,
1666
+ "learning_rate": 0.0002946427066918384,
1667
+ "loss": 0.1329,
1668
+ "step": 115000
1669
+ },
1670
+ {
1671
+ "epoch": 5.41,
1672
+ "learning_rate": 0.00029461930320003743,
1673
+ "loss": 0.1492,
1674
+ "step": 115500
1675
+ },
1676
+ {
1677
+ "epoch": 5.43,
1678
+ "learning_rate": 0.00029459589970823644,
1679
+ "loss": 0.1411,
1680
+ "step": 116000
1681
+ },
1682
+ {
1683
+ "epoch": 5.43,
1684
+ "eval_cer": 0.037894527330628626,
1685
+ "eval_loss": 0.12310981005430222,
1686
+ "eval_runtime": 1041.1085,
1687
+ "eval_samples_per_second": 18.243,
1688
+ "eval_steps_per_second": 4.561,
1689
+ "eval_wer": 0.13332708813920538,
1690
+ "step": 116000
1691
+ },
1692
+ {
1693
+ "epoch": 5.45,
1694
+ "learning_rate": 0.00029457258983040267,
1695
+ "loss": 0.1431,
1696
+ "step": 116500
1697
+ },
1698
+ {
1699
+ "epoch": 5.48,
1700
+ "learning_rate": 0.0002945491863386017,
1701
+ "loss": 0.1405,
1702
+ "step": 117000
1703
+ },
1704
+ {
1705
+ "epoch": 5.5,
1706
+ "learning_rate": 0.00029452578284680075,
1707
+ "loss": 0.1388,
1708
+ "step": 117500
1709
+ },
1710
+ {
1711
+ "epoch": 5.52,
1712
+ "learning_rate": 0.00029450237935499976,
1713
+ "loss": 0.1478,
1714
+ "step": 118000
1715
+ },
1716
+ {
1717
+ "epoch": 5.55,
1718
+ "learning_rate": 0.00029447897586319877,
1719
+ "loss": 0.1532,
1720
+ "step": 118500
1721
+ },
1722
+ {
1723
+ "epoch": 5.57,
1724
+ "learning_rate": 0.00029445561917838136,
1725
+ "loss": 0.1456,
1726
+ "step": 119000
1727
+ },
1728
+ {
1729
+ "epoch": 5.59,
1730
+ "learning_rate": 0.000294432262493564,
1731
+ "loss": 0.1511,
1732
+ "step": 119500
1733
+ },
1734
+ {
1735
+ "epoch": 5.62,
1736
+ "learning_rate": 0.000294408859001763,
1737
+ "loss": 0.1457,
1738
+ "step": 120000
1739
+ },
1740
+ {
1741
+ "epoch": 5.62,
1742
+ "eval_cer": 0.03661783470749902,
1743
+ "eval_loss": 0.12026005238294601,
1744
+ "eval_runtime": 1041.1064,
1745
+ "eval_samples_per_second": 18.243,
1746
+ "eval_steps_per_second": 4.561,
1747
+ "eval_wer": 0.12665253691495218,
1748
+ "step": 120000
1749
+ },
1750
+ {
1751
+ "epoch": 5.64,
1752
+ "learning_rate": 0.0002943854555099621,
1753
+ "loss": 0.1425,
1754
+ "step": 120500
1755
+ },
1756
+ {
1757
+ "epoch": 5.66,
1758
+ "learning_rate": 0.0002943620520181611,
1759
+ "loss": 0.1468,
1760
+ "step": 121000
1761
+ },
1762
+ {
1763
+ "epoch": 5.69,
1764
+ "learning_rate": 0.0002943386485263601,
1765
+ "loss": 0.1438,
1766
+ "step": 121500
1767
+ },
1768
+ {
1769
+ "epoch": 5.71,
1770
+ "learning_rate": 0.0002943152450345591,
1771
+ "loss": 0.1495,
1772
+ "step": 122000
1773
+ },
1774
+ {
1775
+ "epoch": 5.73,
1776
+ "learning_rate": 0.00029429184154275813,
1777
+ "loss": 0.1407,
1778
+ "step": 122500
1779
+ },
1780
+ {
1781
+ "epoch": 5.76,
1782
+ "learning_rate": 0.00029426843805095714,
1783
+ "loss": 0.1486,
1784
+ "step": 123000
1785
+ },
1786
+ {
1787
+ "epoch": 5.78,
1788
+ "learning_rate": 0.0002942450345591562,
1789
+ "loss": 0.1421,
1790
+ "step": 123500
1791
+ },
1792
+ {
1793
+ "epoch": 5.8,
1794
+ "learning_rate": 0.0002942216310673552,
1795
+ "loss": 0.1458,
1796
+ "step": 124000
1797
+ },
1798
+ {
1799
+ "epoch": 5.8,
1800
+ "eval_cer": 0.037944967936134014,
1801
+ "eval_loss": 0.12521982192993164,
1802
+ "eval_runtime": 1041.2886,
1803
+ "eval_samples_per_second": 18.24,
1804
+ "eval_steps_per_second": 4.561,
1805
+ "eval_wer": 0.1334519920217645,
1806
+ "step": 124000
1807
+ },
1808
+ {
1809
+ "epoch": 5.83,
1810
+ "learning_rate": 0.00029419827438253787,
1811
+ "loss": 0.1477,
1812
+ "step": 124500
1813
+ },
1814
+ {
1815
+ "epoch": 5.85,
1816
+ "learning_rate": 0.0002941748708907369,
1817
+ "loss": 0.1432,
1818
+ "step": 125000
1819
+ },
1820
+ {
1821
+ "epoch": 5.87,
1822
+ "learning_rate": 0.0002941514673989359,
1823
+ "loss": 0.1525,
1824
+ "step": 125500
1825
+ },
1826
+ {
1827
+ "epoch": 5.9,
1828
+ "learning_rate": 0.0002941280639071349,
1829
+ "loss": 0.1409,
1830
+ "step": 126000
1831
+ },
1832
+ {
1833
+ "epoch": 5.92,
1834
+ "learning_rate": 0.0002941046604153339,
1835
+ "loss": 0.1485,
1836
+ "step": 126500
1837
+ },
1838
+ {
1839
+ "epoch": 5.94,
1840
+ "learning_rate": 0.000294081256923533,
1841
+ "loss": 0.1422,
1842
+ "step": 127000
1843
+ },
1844
+ {
1845
+ "epoch": 5.97,
1846
+ "learning_rate": 0.000294057853431732,
1847
+ "loss": 0.1438,
1848
+ "step": 127500
1849
+ },
1850
+ {
1851
+ "epoch": 5.99,
1852
+ "learning_rate": 0.00029403449674691464,
1853
+ "loss": 0.1376,
1854
+ "step": 128000
1855
+ },
1856
+ {
1857
+ "epoch": 5.99,
1858
+ "eval_cer": 0.03670849147144789,
1859
+ "eval_loss": 0.12269050627946854,
1860
+ "eval_runtime": 1041.8258,
1861
+ "eval_samples_per_second": 18.23,
1862
+ "eval_steps_per_second": 4.558,
1863
+ "eval_wer": 0.1273629277470072,
1864
+ "step": 128000
1865
+ },
1866
+ {
1867
+ "epoch": 6.01,
1868
+ "learning_rate": 0.00029401109325511365,
1869
+ "loss": 0.1396,
1870
+ "step": 128500
1871
+ },
1872
+ {
1873
+ "epoch": 6.04,
1874
+ "learning_rate": 0.00029398768976331266,
1875
+ "loss": 0.1314,
1876
+ "step": 129000
1877
+ },
1878
+ {
1879
+ "epoch": 6.06,
1880
+ "learning_rate": 0.00029396428627151167,
1881
+ "loss": 0.1401,
1882
+ "step": 129500
1883
+ },
1884
+ {
1885
+ "epoch": 6.08,
1886
+ "learning_rate": 0.0002939408827797107,
1887
+ "loss": 0.1277,
1888
+ "step": 130000
1889
+ },
1890
+ {
1891
+ "epoch": 6.11,
1892
+ "learning_rate": 0.0002939174792879097,
1893
+ "loss": 0.1412,
1894
+ "step": 130500
1895
+ },
1896
+ {
1897
+ "epoch": 6.13,
1898
+ "learning_rate": 0.00029389407579610876,
1899
+ "loss": 0.1353,
1900
+ "step": 131000
1901
+ },
1902
+ {
1903
+ "epoch": 6.15,
1904
+ "learning_rate": 0.00029387071911129135,
1905
+ "loss": 0.1349,
1906
+ "step": 131500
1907
+ },
1908
+ {
1909
+ "epoch": 6.18,
1910
+ "learning_rate": 0.0002938473156194904,
1911
+ "loss": 0.1338,
1912
+ "step": 132000
1913
+ },
1914
+ {
1915
+ "epoch": 6.18,
1916
+ "eval_cer": 0.036872082624438335,
1917
+ "eval_loss": 0.1331152617931366,
1918
+ "eval_runtime": 1042.3485,
1919
+ "eval_samples_per_second": 18.221,
1920
+ "eval_steps_per_second": 4.556,
1921
+ "eval_wer": 0.1286002568336085,
1922
+ "step": 132000
1923
+ },
1924
+ {
1925
+ "epoch": 6.2,
1926
+ "learning_rate": 0.00029382391212768943,
1927
+ "loss": 0.1446,
1928
+ "step": 132500
1929
+ },
1930
+ {
1931
+ "epoch": 6.22,
1932
+ "learning_rate": 0.00029380050863588844,
1933
+ "loss": 0.1297,
1934
+ "step": 133000
1935
+ },
1936
+ {
1937
+ "epoch": 6.25,
1938
+ "learning_rate": 0.00029377710514408745,
1939
+ "loss": 0.1344,
1940
+ "step": 133500
1941
+ },
1942
+ {
1943
+ "epoch": 6.27,
1944
+ "learning_rate": 0.0002937537484592701,
1945
+ "loss": 0.1349,
1946
+ "step": 134000
1947
+ },
1948
+ {
1949
+ "epoch": 6.3,
1950
+ "learning_rate": 0.0002937303449674691,
1951
+ "loss": 0.1249,
1952
+ "step": 134500
1953
+ },
1954
+ {
1955
+ "epoch": 6.32,
1956
+ "learning_rate": 0.0002937069414756681,
1957
+ "loss": 0.1358,
1958
+ "step": 135000
1959
+ },
1960
+ {
1961
+ "epoch": 6.34,
1962
+ "learning_rate": 0.0002936835379838672,
1963
+ "loss": 0.1405,
1964
+ "step": 135500
1965
+ },
1966
+ {
1967
+ "epoch": 6.37,
1968
+ "learning_rate": 0.0002936601344920662,
1969
+ "loss": 0.1212,
1970
+ "step": 136000
1971
+ },
1972
+ {
1973
+ "epoch": 6.37,
1974
+ "eval_cer": 0.036793013567159624,
1975
+ "eval_loss": 0.12496736645698547,
1976
+ "eval_runtime": 1043.01,
1977
+ "eval_samples_per_second": 18.21,
1978
+ "eval_steps_per_second": 4.553,
1979
+ "eval_wer": 0.12528640069946173,
1980
+ "step": 136000
1981
+ },
1982
+ {
1983
+ "epoch": 6.39,
1984
+ "learning_rate": 0.00029363677780724884,
1985
+ "loss": 0.1332,
1986
+ "step": 136500
1987
+ },
1988
+ {
1989
+ "epoch": 6.41,
1990
+ "learning_rate": 0.00029361337431544786,
1991
+ "loss": 0.1396,
1992
+ "step": 137000
1993
+ },
1994
+ {
1995
+ "epoch": 6.44,
1996
+ "learning_rate": 0.00029358997082364687,
1997
+ "loss": 0.1304,
1998
+ "step": 137500
1999
+ },
2000
+ {
2001
+ "epoch": 6.46,
2002
+ "learning_rate": 0.0002935665673318459,
2003
+ "loss": 0.1303,
2004
+ "step": 138000
2005
+ },
2006
+ {
2007
+ "epoch": 6.48,
2008
+ "learning_rate": 0.0002935431638400449,
2009
+ "loss": 0.1356,
2010
+ "step": 138500
2011
+ },
2012
+ {
2013
+ "epoch": 6.51,
2014
+ "learning_rate": 0.00029351980715522754,
2015
+ "loss": 0.148,
2016
+ "step": 139000
2017
+ },
2018
+ {
2019
+ "epoch": 6.53,
2020
+ "learning_rate": 0.00029349640366342655,
2021
+ "loss": 0.1371,
2022
+ "step": 139500
2023
+ },
2024
+ {
2025
+ "epoch": 6.55,
2026
+ "learning_rate": 0.0002934730001716256,
2027
+ "loss": 0.1336,
2028
+ "step": 140000
2029
+ },
2030
+ {
2031
+ "epoch": 6.55,
2032
+ "eval_cer": 0.03752712886620425,
2033
+ "eval_loss": 0.12814708054065704,
2034
+ "eval_runtime": 1042.3272,
2035
+ "eval_samples_per_second": 18.222,
2036
+ "eval_steps_per_second": 4.556,
2037
+ "eval_wer": 0.13243324472964163,
2038
+ "step": 140000
2039
+ },
2040
+ {
2041
+ "epoch": 6.58,
2042
+ "learning_rate": 0.0002934495966798246,
2043
+ "loss": 0.1489,
2044
+ "step": 140500
2045
+ },
2046
+ {
2047
+ "epoch": 6.6,
2048
+ "learning_rate": 0.00029342619318802364,
2049
+ "loss": 0.1354,
2050
+ "step": 141000
2051
+ },
2052
+ {
2053
+ "epoch": 6.62,
2054
+ "learning_rate": 0.00029340278969622265,
2055
+ "loss": 0.1341,
2056
+ "step": 141500
2057
+ },
2058
+ {
2059
+ "epoch": 6.65,
2060
+ "learning_rate": 0.00029337943301140524,
2061
+ "loss": 0.1403,
2062
+ "step": 142000
2063
+ },
2064
+ {
2065
+ "epoch": 6.67,
2066
+ "learning_rate": 0.0002933560295196043,
2067
+ "loss": 0.1369,
2068
+ "step": 142500
2069
+ },
2070
+ {
2071
+ "epoch": 6.69,
2072
+ "learning_rate": 0.0002933326260278033,
2073
+ "loss": 0.1439,
2074
+ "step": 143000
2075
+ },
2076
+ {
2077
+ "epoch": 6.72,
2078
+ "learning_rate": 0.00029330926934298596,
2079
+ "loss": 0.1349,
2080
+ "step": 143500
2081
+ },
2082
+ {
2083
+ "epoch": 6.74,
2084
+ "learning_rate": 0.000293285865851185,
2085
+ "loss": 0.1345,
2086
+ "step": 144000
2087
+ },
2088
+ {
2089
+ "epoch": 6.74,
2090
+ "eval_cer": 0.03716995484884177,
2091
+ "eval_loss": 0.1245645210146904,
2092
+ "eval_runtime": 1041.9872,
2093
+ "eval_samples_per_second": 18.228,
2094
+ "eval_steps_per_second": 4.558,
2095
+ "eval_wer": 0.13216001748654355,
2096
+ "step": 144000
2097
+ },
2098
+ {
2099
+ "epoch": 6.76,
2100
+ "learning_rate": 0.000293262462359384,
2101
+ "loss": 0.1345,
2102
+ "step": 144500
2103
+ },
2104
+ {
2105
+ "epoch": 6.79,
2106
+ "learning_rate": 0.000293239058867583,
2107
+ "loss": 0.1392,
2108
+ "step": 145000
2109
+ },
2110
+ {
2111
+ "epoch": 6.81,
2112
+ "learning_rate": 0.00029321570218276565,
2113
+ "loss": 0.1437,
2114
+ "step": 145500
2115
+ },
2116
+ {
2117
+ "epoch": 6.83,
2118
+ "learning_rate": 0.00029319229869096466,
2119
+ "loss": 0.1481,
2120
+ "step": 146000
2121
+ },
2122
+ {
2123
+ "epoch": 6.86,
2124
+ "learning_rate": 0.00029316889519916367,
2125
+ "loss": 0.1301,
2126
+ "step": 146500
2127
+ },
2128
+ {
2129
+ "epoch": 6.88,
2130
+ "learning_rate": 0.00029314549170736273,
2131
+ "loss": 0.1358,
2132
+ "step": 147000
2133
+ },
2134
+ {
2135
+ "epoch": 6.9,
2136
+ "learning_rate": 0.00029312208821556174,
2137
+ "loss": 0.1284,
2138
+ "step": 147500
2139
+ },
2140
+ {
2141
+ "epoch": 6.93,
2142
+ "learning_rate": 0.00029309868472376076,
2143
+ "loss": 0.1389,
2144
+ "step": 148000
2145
+ },
2146
+ {
2147
+ "epoch": 6.93,
2148
+ "eval_cer": 0.03524298739257514,
2149
+ "eval_loss": 0.12240828573703766,
2150
+ "eval_runtime": 1040.9111,
2151
+ "eval_samples_per_second": 18.247,
2152
+ "eval_steps_per_second": 4.562,
2153
+ "eval_wer": 0.1233660035051152,
2154
+ "step": 148000
2155
+ },
2156
+ {
2157
+ "epoch": 6.95,
2158
+ "learning_rate": 0.0002930753280389434,
2159
+ "loss": 0.1344,
2160
+ "step": 148500
2161
+ },
2162
+ {
2163
+ "epoch": 6.97,
2164
+ "learning_rate": 0.0002930519245471424,
2165
+ "loss": 0.1379,
2166
+ "step": 149000
2167
+ },
2168
+ {
2169
+ "epoch": 7.0,
2170
+ "learning_rate": 0.0002930285210553414,
2171
+ "loss": 0.1259,
2172
+ "step": 149500
2173
+ },
2174
+ {
2175
+ "epoch": 7.02,
2176
+ "learning_rate": 0.00029300511756354044,
2177
+ "loss": 0.1204,
2178
+ "step": 150000
2179
+ },
2180
+ {
2181
+ "epoch": 7.04,
2182
+ "learning_rate": 0.0002929817140717395,
2183
+ "loss": 0.1293,
2184
+ "step": 150500
2185
+ },
2186
+ {
2187
+ "epoch": 7.07,
2188
+ "learning_rate": 0.0002929583105799385,
2189
+ "loss": 0.1257,
2190
+ "step": 151000
2191
+ },
2192
+ {
2193
+ "epoch": 7.09,
2194
+ "learning_rate": 0.0002929349070881375,
2195
+ "loss": 0.1279,
2196
+ "step": 151500
2197
+ },
2198
+ {
2199
+ "epoch": 7.11,
2200
+ "learning_rate": 0.00029291150359633654,
2201
+ "loss": 0.126,
2202
+ "step": 152000
2203
+ },
2204
+ {
2205
+ "epoch": 7.11,
2206
+ "eval_cer": 0.03576238930331981,
2207
+ "eval_loss": 0.12034807354211807,
2208
+ "eval_runtime": 1041.2962,
2209
+ "eval_samples_per_second": 18.24,
2210
+ "eval_steps_per_second": 4.561,
2211
+ "eval_wer": 0.12211696467952396,
2212
+ "step": 152000
2213
+ },
2214
+ {
2215
+ "epoch": 7.14,
2216
+ "learning_rate": 0.0002928881469115192,
2217
+ "loss": 0.1337,
2218
+ "step": 152500
2219
+ },
2220
+ {
2221
+ "epoch": 7.16,
2222
+ "learning_rate": 0.0002928647434197182,
2223
+ "loss": 0.1237,
2224
+ "step": 153000
2225
+ },
2226
+ {
2227
+ "epoch": 7.18,
2228
+ "learning_rate": 0.0002928413399279172,
2229
+ "loss": 0.1244,
2230
+ "step": 153500
2231
+ },
2232
+ {
2233
+ "epoch": 7.21,
2234
+ "learning_rate": 0.0002928179364361162,
2235
+ "loss": 0.1335,
2236
+ "step": 154000
2237
+ },
2238
+ {
2239
+ "epoch": 7.23,
2240
+ "learning_rate": 0.0002927945329443153,
2241
+ "loss": 0.126,
2242
+ "step": 154500
2243
+ },
2244
+ {
2245
+ "epoch": 7.25,
2246
+ "learning_rate": 0.0002927711762594979,
2247
+ "loss": 0.1205,
2248
+ "step": 155000
2249
+ },
2250
+ {
2251
+ "epoch": 7.28,
2252
+ "learning_rate": 0.00029274777276769694,
2253
+ "loss": 0.1296,
2254
+ "step": 155500
2255
+ },
2256
+ {
2257
+ "epoch": 7.3,
2258
+ "learning_rate": 0.00029272436927589595,
2259
+ "loss": 0.1197,
2260
+ "step": 156000
2261
+ },
2262
+ {
2263
+ "epoch": 7.3,
2264
+ "eval_cer": 0.03777728700431881,
2265
+ "eval_loss": 0.1192484200000763,
2266
+ "eval_runtime": 1041.7692,
2267
+ "eval_samples_per_second": 18.231,
2268
+ "eval_steps_per_second": 4.559,
2269
+ "eval_wer": 0.12780789782862406,
2270
+ "step": 156000
2271
+ },
2272
+ {
2273
+ "epoch": 7.32,
2274
+ "learning_rate": 0.00029270096578409496,
2275
+ "loss": 0.1231,
2276
+ "step": 156500
2277
+ },
2278
+ {
2279
+ "epoch": 7.35,
2280
+ "learning_rate": 0.000292677562292294,
2281
+ "loss": 0.1209,
2282
+ "step": 157000
2283
+ },
2284
+ {
2285
+ "epoch": 7.37,
2286
+ "learning_rate": 0.000292654158800493,
2287
+ "loss": 0.1332,
2288
+ "step": 157500
2289
+ },
2290
+ {
2291
+ "epoch": 7.39,
2292
+ "learning_rate": 0.0002926308489226592,
2293
+ "loss": 0.1283,
2294
+ "step": 158000
2295
+ },
2296
+ {
2297
+ "epoch": 7.42,
2298
+ "learning_rate": 0.0002926074454308583,
2299
+ "loss": 0.1411,
2300
+ "step": 158500
2301
+ },
2302
+ {
2303
+ "epoch": 7.44,
2304
+ "learning_rate": 0.0002925840419390573,
2305
+ "loss": 0.1318,
2306
+ "step": 159000
2307
+ },
2308
+ {
2309
+ "epoch": 7.47,
2310
+ "learning_rate": 0.0002925606384472563,
2311
+ "loss": 0.1259,
2312
+ "step": 159500
2313
+ },
2314
+ {
2315
+ "epoch": 7.49,
2316
+ "learning_rate": 0.0002925372349554553,
2317
+ "loss": 0.1309,
2318
+ "step": 160000
2319
+ },
2320
+ {
2321
+ "epoch": 7.49,
2322
+ "eval_cer": 0.035546994285215724,
2323
+ "eval_loss": 0.11985628306865692,
2324
+ "eval_runtime": 1041.0069,
2325
+ "eval_samples_per_second": 18.245,
2326
+ "eval_steps_per_second": 4.562,
2327
+ "eval_wer": 0.12378755410875225,
2328
+ "step": 160000
2329
+ },
2330
+ {
2331
+ "epoch": 7.51,
2332
+ "learning_rate": 0.0002925138314636543,
2333
+ "loss": 0.1237,
2334
+ "step": 160500
2335
+ },
2336
+ {
2337
+ "epoch": 7.54,
2338
+ "learning_rate": 0.00029249042797185334,
2339
+ "loss": 0.1363,
2340
+ "step": 161000
2341
+ },
2342
+ {
2343
+ "epoch": 7.56,
2344
+ "learning_rate": 0.0002924670244800524,
2345
+ "loss": 0.1275,
2346
+ "step": 161500
2347
+ },
2348
+ {
2349
+ "epoch": 7.58,
2350
+ "learning_rate": 0.0002924436209882514,
2351
+ "loss": 0.1285,
2352
+ "step": 162000
2353
+ },
2354
+ {
2355
+ "epoch": 7.61,
2356
+ "learning_rate": 0.0002924202174964505,
2357
+ "loss": 0.1453,
2358
+ "step": 162500
2359
+ },
2360
+ {
2361
+ "epoch": 7.63,
2362
+ "learning_rate": 0.0002923968140046495,
2363
+ "loss": 0.1273,
2364
+ "step": 163000
2365
+ },
2366
+ {
2367
+ "epoch": 7.65,
2368
+ "learning_rate": 0.0002923734105128485,
2369
+ "loss": 0.1352,
2370
+ "step": 163500
2371
+ },
2372
+ {
2373
+ "epoch": 7.68,
2374
+ "learning_rate": 0.0002923500070210475,
2375
+ "loss": 0.127,
2376
+ "step": 164000
2377
+ },
2378
+ {
2379
+ "epoch": 7.68,
2380
+ "eval_cer": 0.03588303777865026,
2381
+ "eval_loss": 0.12572461366653442,
2382
+ "eval_runtime": 1040.9162,
2383
+ "eval_samples_per_second": 18.246,
2384
+ "eval_steps_per_second": 4.562,
2385
+ "eval_wer": 0.1238812320206716,
2386
+ "step": 164000
2387
+ },
2388
+ {
2389
+ "epoch": 7.7,
2390
+ "learning_rate": 0.0002923266503362301,
2391
+ "loss": 0.1382,
2392
+ "step": 164500
2393
+ },
2394
+ {
2395
+ "epoch": 7.72,
2396
+ "learning_rate": 0.00029230324684442917,
2397
+ "loss": 0.1312,
2398
+ "step": 165000
2399
+ },
2400
+ {
2401
+ "epoch": 7.75,
2402
+ "learning_rate": 0.0002922798433526282,
2403
+ "loss": 0.1342,
2404
+ "step": 165500
2405
+ },
2406
+ {
2407
+ "epoch": 7.77,
2408
+ "learning_rate": 0.00029225648666781083,
2409
+ "loss": 0.1442,
2410
+ "step": 166000
2411
+ },
2412
+ {
2413
+ "epoch": 7.79,
2414
+ "learning_rate": 0.00029223308317600984,
2415
+ "loss": 0.1253,
2416
+ "step": 166500
2417
+ },
2418
+ {
2419
+ "epoch": 7.82,
2420
+ "learning_rate": 0.00029220967968420885,
2421
+ "loss": 0.1321,
2422
+ "step": 167000
2423
+ },
2424
+ {
2425
+ "epoch": 7.84,
2426
+ "learning_rate": 0.00029218627619240786,
2427
+ "loss": 0.1337,
2428
+ "step": 167500
2429
+ },
2430
+ {
2431
+ "epoch": 7.86,
2432
+ "learning_rate": 0.0002921628727006069,
2433
+ "loss": 0.1233,
2434
+ "step": 168000
2435
+ },
2436
+ {
2437
+ "epoch": 7.86,
2438
+ "eval_cer": 0.03597301291279501,
2439
+ "eval_loss": 0.12973648309707642,
2440
+ "eval_runtime": 1040.5606,
2441
+ "eval_samples_per_second": 18.253,
2442
+ "eval_steps_per_second": 4.564,
2443
+ "eval_wer": 0.1249936572247138,
2444
+ "step": 168000
2445
+ },
2446
+ {
2447
+ "epoch": 7.89,
2448
+ "learning_rate": 0.00029213946920880594,
2449
+ "loss": 0.1334,
2450
+ "step": 168500
2451
+ },
2452
+ {
2453
+ "epoch": 7.91,
2454
+ "learning_rate": 0.00029211606571700495,
2455
+ "loss": 0.1323,
2456
+ "step": 169000
2457
+ },
2458
+ {
2459
+ "epoch": 7.93,
2460
+ "learning_rate": 0.0002920927090321876,
2461
+ "loss": 0.1329,
2462
+ "step": 169500
2463
+ },
2464
+ {
2465
+ "epoch": 7.96,
2466
+ "learning_rate": 0.0002920693055403866,
2467
+ "loss": 0.1306,
2468
+ "step": 170000
2469
+ },
2470
+ {
2471
+ "epoch": 7.98,
2472
+ "learning_rate": 0.0002920459020485856,
2473
+ "loss": 0.1397,
2474
+ "step": 170500
2475
+ },
2476
+ {
2477
+ "epoch": 8.0,
2478
+ "learning_rate": 0.00029202249855678463,
2479
+ "loss": 0.1303,
2480
+ "step": 171000
2481
+ },
2482
+ {
2483
+ "epoch": 8.03,
2484
+ "learning_rate": 0.0002919991418719673,
2485
+ "loss": 0.1263,
2486
+ "step": 171500
2487
+ },
2488
+ {
2489
+ "epoch": 8.05,
2490
+ "learning_rate": 0.0002919757383801663,
2491
+ "loss": 0.1222,
2492
+ "step": 172000
2493
+ },
2494
+ {
2495
+ "epoch": 8.05,
2496
+ "eval_cer": 0.033454390786546266,
2497
+ "eval_loss": 0.12292832136154175,
2498
+ "eval_runtime": 1040.6782,
2499
+ "eval_samples_per_second": 18.251,
2500
+ "eval_steps_per_second": 4.563,
2501
+ "eval_wer": 0.11680074317810123,
2502
+ "step": 172000
2503
+ }
2504
+ ],
2505
+ "max_steps": 6409800,
2506
+ "num_train_epochs": 300,
2507
+ "total_flos": 2.798604341991718e+20,
2508
+ "trial_name": null,
2509
+ "trial_params": null
2510
+ }
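A minimal sketch for reading the log above, assuming the entries follow the usual Hugging Face `Trainer` state layout (a list of dicts, where evaluation entries carry `eval_wer` and `eval_cer`). The helper names `best_eval` and `steps_per_epoch` are hypothetical, and `sample_log` holds only a few entries copied from this diff, not the full history.

```python
def best_eval(log_history, metric="eval_wer"):
    """Return the evaluation entry with the lowest value of `metric`."""
    # Training entries have "loss"/"learning_rate" but no eval metrics, so skip them.
    evals = [entry for entry in log_history if metric in entry]
    return min(evals, key=lambda entry: entry[metric])


def steps_per_epoch(max_steps, num_train_epochs):
    """Optimizer steps in one epoch, derived from the totals at the end of the log."""
    return max_steps / num_train_epochs


# A few entries copied from the log above (illustrative subset only).
sample_log = [
    {"epoch": 6.18, "eval_cer": 0.036872082624438335,
     "eval_wer": 0.1286002568336085, "step": 132000},
    {"epoch": 7.11, "eval_cer": 0.03576238930331981,
     "eval_wer": 0.12211696467952396, "step": 152000},
    {"epoch": 8.05, "learning_rate": 0.0002919757383801663,
     "loss": 0.1222, "step": 172000},
    {"epoch": 8.05, "eval_cer": 0.033454390786546266,
     "eval_wer": 0.11680074317810123, "step": 172000},
]

best = best_eval(sample_log)
per_epoch = steps_per_epoch(max_steps=6409800, num_train_epochs=300)

print(best["step"], best["eval_wer"])
print(per_epoch, round(172000 / per_epoch, 2))
```

The second print cross-checks the log: 6409800 steps over 300 epochs gives 21366 steps per epoch, so step 172000 falls at roughly epoch 8.05, which agrees with the `epoch` field of the last entry.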
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bdcd370ad6ba8aeb168c02e43e1c989d05daa88726c020aab5c950a5e5e3ede0
+ size 3887
vocab.json ADDED
@@ -0,0 +1,46 @@
+ {
+ "'": 1,
+ "-": 2,
+ "[PAD]": 43,
+ "[UNK]": 42,
+ "a": 3,
+ "b": 4,
+ "c": 5,
+ "d": 6,
+ "e": 7,
+ "f": 8,
+ "g": 9,
+ "h": 10,
+ "i": 11,
+ "j": 12,
+ "k": 13,
+ "l": 14,
+ "m": 15,
+ "n": 16,
+ "o": 17,
+ "p": 18,
+ "q": 19,
+ "r": 20,
+ "s": 21,
+ "t": 22,
+ "u": 23,
+ "v": 24,
+ "w": 25,
+ "x": 26,
+ "y": 27,
+ "z": 28,
+ "|": 0,
+ "à": 29,
+ "á": 30,
+ "â": 31,
+ "ã": 32,
+ "ç": 33,
+ "é": 34,
+ "ê": 35,
+ "í": 36,
+ "ò": 37,
+ "ó": 38,
+ "ô": 39,
+ "õ": 40,
+ "ú": 41
+ }
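A hedged sketch of how a vocabulary like this one is typically used for greedy CTC decoding. It assumes `[PAD]` doubles as the CTC blank token and `|` marks word boundaries, which is the common convention for Wav2Vec2 CTC tokenizers but is not stated in this diff. The function `ctc_greedy_decode` and the example id sequence are hypothetical; in practice the tokenizer shipped with the checkpoint would do this step.

```python
import string
from itertools import groupby

# Rebuild the mapping from vocab.json above.
vocab = {"|": 0, "'": 1, "-": 2}
vocab.update({ch: i for i, ch in enumerate(string.ascii_lowercase, start=3)})
vocab.update({ch: i for i, ch in enumerate("àáâãçéêíòóôõú", start=29)})
vocab.update({"[UNK]": 42, "[PAD]": 43})


def ctc_greedy_decode(ids, vocab, blank="[PAD]", delimiter="|"):
    """Collapse repeated ids, drop blanks, and turn delimiters into spaces."""
    id_to_token = {i: token for token, i in vocab.items()}
    # Step 1: merge consecutive duplicates (one token may span several frames).
    collapsed = [key for key, _ in groupby(ids)]
    # Step 2: remove the blank token (assumed here to be [PAD]).
    tokens = [id_to_token[i] for i in collapsed if i != vocab[blank]]
    # Step 3: map the word delimiter to a space.
    return "".join(" " if t == delimiter else t for t in tokens).strip()


# Hypothetical per-frame argmax ids: "o o [PAD] l á á | [PAD] o i"
frame_ids = [17, 17, 43, 14, 30, 30, 0, 43, 17, 11]
decoded = ctc_greedy_decode(frame_ids, vocab)
print(decoded)
```

Because repeats are merged before blanks are removed, a genuinely doubled letter such as "rr" only survives if the model emits a blank between the two frames, for example `[20, 43, 20]`.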