PereLluis13 committed on
Commit 3b10db1
1 Parent(s): 89fc6f3

update model
README.md CHANGED
@@ -77,20 +77,20 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->

- # wav2vec2-xls-r-300m-ca-lm

- This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - CA dataset.
- It achieves the following results on the averaged across datasets test set (without the LM):
- - Loss: 0.2758
- - Wer: 0.1792

 ## Model description

- More information needed

 ## Intended uses & limitations

- More information needed

 ## Training and evaluation data

@@ -98,6 +98,8 @@ More information needed
 ## Training procedure

 ### Training hyperparameters

 The following hyperparameters were used during training:
@@ -110,10 +112,12 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2000
- - num_epochs: 6.0
 - mixed_precision_training: Native AMP

- ### Training results (without LM)

 | Training Loss | Epoch | Step | Validation Loss | Wer |
 |:-------------:|:-----:|:-----:|:---------------:|:------:|
@@ -162,10 +166,32 @@ The following hyperparameters were used during training:
 | 1.0805 | 11.45 | 21500 | 0.2561 | 0.1524 |
 | 1.0722 | 11.72 | 22000 | 0.2540 | 0.1566 |
 | 1.0763 | 11.99 | 22500 | 0.2549 | 0.1572 |

 ### Framework versions

 - Transformers 4.16.0.dev0
 - Pytorch 1.10.1+cu102
- - Datasets 1.18.1
 - Tokenizers 0.11.0
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->

+ # wav2vec2-xls-r-300m-ca

+ This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - CA, [tv3_parla](https://huggingface.co/datasets/collectivat/tv3_parla) and [parlament_parla](https://huggingface.co/datasets/projecte-aina/parlament_parla) datasets.
+ It achieves the following results on the evaluation set (for the three datasets, without the LM; see the evaluation sketch below):
+ - Loss: 0.2472
+ - Wer: 0.1499
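A WER figure like the one above comes from plain greedy CTC decoding, without the LM. The snippet below is only a sketch of such an evaluation loop, not the exact script behind this card; the repo id `PereLluis13/wav2vec2-xls-r-300m-ca`, the subset size, and the bare lowercasing of references are all assumptions.

```python
import torch
from datasets import Audio, load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

repo = "PereLluis13/wav2vec2-xls-r-300m-ca"  # assumed repo id
processor = Wav2Vec2Processor.from_pretrained(repo)
model = Wav2Vec2ForCTC.from_pretrained(repo).eval()

# Common Voice 8.0 Catalan test split (a gated dataset: may need use_auth_token=True),
# resampled to the 16 kHz the model expects.
ds = load_dataset("mozilla-foundation/common_voice_8_0", "ca", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for sample in ds.select(range(100)):  # small subset for a quick check
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pred_ids = torch.argmax(logits, dim=-1)       # greedy CTC decoding, no LM
    predictions.append(processor.batch_decode(pred_ids)[0])
    references.append(sample["sentence"].lower())

wer = load_metric("wer")
print(f"WER: {wer.compute(predictions=predictions, references=references):.4f}")
```

Decoding with an external language model (e.g. via pyctcdecode) is a separate step and would change these numbers.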

 ## Model description

+ Please check the original [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) model card; this is just a fine-tuned version of that model.

 ## Intended uses & limitations

+ Like any model trained on crowdsourced data, this model may reflect the biases and particularities of the data it was trained on. Moreover, since it is a speech recognition model, it may underperform for some lower-resourced dialects of the Catalan language.

 ## Training and evaluation data

 ## Training procedure

+ The data is preprocessed to remove characters not in the Catalan alphabet. Moreover, numbers are verbalized using code provided by [@ccoreilly](https://github.com/ccoreilly), which can be found in the text/ folder or [here](https://github.com/CollectivaT-dev/catotron-cpu/blob/master/text/numbers_ca.py). A sketch of this preprocessing is shown below.
+
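A minimal sketch of that preprocessing, assuming a simplified character inventory and a stand-in for the linked numbers_ca.py helper; both `CATALAN_CHARS` and `verbalize_number` below are illustrative, not the actual implementation:

```python
import re

# Assumption: a simplified Catalan character inventory; the real script may differ.
CATALAN_CHARS = set("abcdefghijklmnopqrstuvwxyzàèéíïòóúüç·' -")

def verbalize_number(match):
    # Stand-in for the numbers_ca.py helper linked above, which spells
    # digit sequences out as Catalan words (e.g. "25" -> "vint-i-cinc").
    return match.group(0)

def preprocess(sentence: str) -> str:
    sentence = sentence.lower()
    # Verbalize digits first, so they survive the character filter below.
    sentence = re.sub(r"\d+", verbalize_number, sentence)
    # Drop anything outside the Catalan alphabet (plus space, hyphen, apostrophe).
    sentence = "".join(ch for ch in sentence if ch in CATALAN_CHARS)
    return re.sub(r"\s+", " ", sentence).strip()

# With the stand-in this prints "té gats i gossos"; the real helper
# would spell the numbers out instead of letting the filter drop them.
print(preprocess("Té 3 gats i 2 gossos."))
```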
 ### Training hyperparameters

 The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 2000
+ - num_epochs: 18.0
 - mixed_precision_training: Native AMP
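These settings map fairly directly onto `transformers.TrainingArguments`. A hypothetical reconstruction follows; only the fields listed above are grounded, while `output_dir`, `learning_rate` and batch sizes are placeholders, since they are not shown in this excerpt:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-300m-ca",  # placeholder
    learning_rate=3e-4,                     # placeholder: not listed above
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=18.0,
    fp16=True,  # "Native AMP" mixed-precision training
)
```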

+ ### Training results
+
+ Check the Tensorboard tab for the training profile and evaluation results over the course of training. The model was evaluated on the test splits of each of the datasets used during training.

 | Training Loss | Epoch | Step | Validation Loss | Wer |
 |:-------------:|:-----:|:-----:|:---------------:|:------:|
 | 1.0805 | 11.45 | 21500 | 0.2561 | 0.1524 |
 | 1.0722 | 11.72 | 22000 | 0.2540 | 0.1566 |
 | 1.0763 | 11.99 | 22500 | 0.2549 | 0.1572 |
+ | 1.0835 | 12.25 | 23000 | 0.2586 | 0.1521 |
+ | 1.0883 | 12.52 | 23500 | 0.2583 | 0.1519 |
+ | 1.0888 | 12.79 | 24000 | 0.2551 | 0.1582 |
+ | 1.0933 | 13.05 | 24500 | 0.2628 | 0.1537 |
+ | 1.0799 | 13.32 | 25000 | 0.2600 | 0.1508 |
+ | 1.0804 | 13.59 | 25500 | 0.2620 | 0.1475 |
+ | 1.0814 | 13.85 | 26000 | 0.2537 | 0.1517 |
+ | 1.0693 | 14.12 | 26500 | 0.2560 | 0.1542 |
+ | 1.0724 | 14.38 | 27000 | 0.2540 | 0.1574 |
+ | 1.0704 | 14.65 | 27500 | 0.2548 | 0.1626 |
+ | 1.0729 | 14.92 | 28000 | 0.2548 | 0.1601 |
+ | 1.0724 | 15.18 | 28500 | 0.2511 | 0.1512 |
+ | 1.0655 | 15.45 | 29000 | 0.2498 | 0.1490 |
+ | 1.0608 | 15.98 | 30000 | 0.2487 | 0.1481 |
+ | 1.0541 | 16.52 | 31000 | 0.2468 | 0.1504 |
+ | 1.0584 | 17.05 | 32000 | 0.2467 | 0.1493 |
+ | 1.0507 | 17.58 | 33000 | 0.2481 | 0.1517 |
+

 ### Framework versions

 - Transformers 4.16.0.dev0
 - Pytorch 1.10.1+cu102
+ - Datasets 1.18.3
 - Tokenizers 0.11.0
+
+ # Thanks
+
+ I want to thank both [@ccoreilly](https://github.com/ccoreilly) and [@gullabi](https://github.com/gullabi), who have contributed their own resources and knowledge to making this model possible.
eval_results.json CHANGED
@@ -1,9 +1,9 @@
 {
- "epoch": 12.0,
- "eval_loss": 0.25491979718208313,
- "eval_runtime": 392.0567,
 "eval_samples": 4297,
- "eval_samples_per_second": 10.96,
- "eval_steps_per_second": 0.344,
- "eval_wer": 0.15725760362438562
 }
 {
+ "epoch": 18.0,
+ "eval_loss": 0.2472492903470993,
+ "eval_runtime": 373.4142,
 "eval_samples": 4297,
+ "eval_samples_per_second": 11.507,
+ "eval_steps_per_second": 0.362,
+ "eval_wer": 0.14990076581772083
 }
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:38f9952471847b9dbd693d34fa642974ebb6a016e7677a8ccfb3e3458f45e32a
 size 1262112241
 version https://git-lfs.github.com/spec/v1
+ oid sha256:382829868e73fa85ab4aea6f9cfa1e2258955546556cfbc4c1aa0ac435d86981
 size 1262112241
runs/Feb01_18-08-21_job-336a688f-553a-4e6e-83b3-ad5d10274b51/1643741534.116655/events.out.tfevents.1643741534.job-336a688f-553a-4e6e-83b3-ad5d10274b51.3348585.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24cb9f2a1f8cd9f07b463f6996a54a600e48f99b5d21f1cabc83dc60826e1698
+ size 4814
runs/Feb01_18-08-21_job-336a688f-553a-4e6e-83b3-ad5d10274b51/events.out.tfevents.1643741534.job-336a688f-553a-4e6e-83b3-ad5d10274b51.3348585.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b43e306271bfa60c6f6bf01ab0eae8c36a521e4e5cb0e8a55687eb99b5562c56
+ size 10554
runs/Feb04_14-58-29_job-336a688f-553a-4e6e-83b3-ad5d10274b51/1643989411.4467487/events.out.tfevents.1643989411.job-336a688f-553a-4e6e-83b3-ad5d10274b51.728502.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a74d331c074b656f428d24db8dce4efbe1b7f1dde7e244fac7207dc29ae942c3
+ size 4814
runs/Feb04_14-58-29_job-336a688f-553a-4e6e-83b3-ad5d10274b51/events.out.tfevents.1643989411.job-336a688f-553a-4e6e-83b3-ad5d10274b51.728502.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7dd620d37e0221c513ea85f339709d22725b2e11e30fd5c4c973ce761b4e5e24
+ size 7529
runs/Feb04_14-58-29_job-336a688f-553a-4e6e-83b3-ad5d10274b51/events.out.tfevents.1644061137.job-336a688f-553a-4e6e-83b3-ad5d10274b51.728502.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ba70b885ace8400ed4e1f00fd81e27187a8a5dabaf13d4b441e91a0dff4eb3c
+ size 364
special_tokens_map.json CHANGED
@@ -1 +1 @@
- {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "[UNK]", "pad_token": "[PAD]", "additional_special_tokens": [{"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}]}
+ {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "[UNK]", "pad_token": "[PAD]", "additional_special_tokens": [{"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}]}
train_results.json CHANGED
@@ -1,8 +1,8 @@
 {
- "epoch": 12.0,
- "train_loss": 0.5676147035501541,
- "train_runtime": 172546.67,
 "train_samples": 240334,
- "train_samples_per_second": 16.714,
- "train_steps_per_second": 0.131
 }
 {
+ "epoch": 18.0,
+ "train_loss": 0.16521977071390054,
+ "train_runtime": 71350.5908,
 "train_samples": 240334,
+ "train_samples_per_second": 60.63,
+ "train_steps_per_second": 0.474
 }
trainer_state.json CHANGED
@@ -1,8 +1,8 @@
 {
 "best_metric": null,
 "best_model_checkpoint": null,
- "epoch": 11.999600585807482,
- "global_step": 22524,
 "is_hyper_param_search": false,
 "is_local_process_zero": true,
 "is_world_process_zero": true,
@@ -683,18 +683,273 @@
 "step": 22500
 },
 {
- "epoch": 12.0,
- "step": 22524,
- "total_flos": 6.281601139352125e+20,
- "train_loss": 0.5676147035501541,
- "train_runtime": 172546.67,
- "train_samples_per_second": 16.714,
- "train_steps_per_second": 0.131
 }
 ],
- "max_steps": 22524,
- "num_train_epochs": 12,
- "total_flos": 6.281601139352125e+20,
 "trial_name": null,
 "trial_params": null
 }
 {
 "best_metric": null,
 "best_model_checkpoint": null,
+ "epoch": 17.99960058580748,
+ "global_step": 33786,
 "is_hyper_param_search": false,
 "is_local_process_zero": true,
 "is_world_process_zero": true,
 "step": 22500
 },
 {
+ "epoch": 12.25,
+ "learning_rate": 2.5471119360724847e-05,
+ "loss": 1.0835,
+ "step": 23000
+ },
+ {
+ "epoch": 12.25,
+ "eval_loss": 0.25863561034202576,
+ "eval_runtime": 369.3533,
+ "eval_samples_per_second": 11.634,
+ "eval_steps_per_second": 0.366,
+ "eval_wer": 0.15212444278188222,
+ "step": 23000
+ },
+ {
+ "epoch": 12.52,
+ "learning_rate": 2.4293714213804817e-05,
+ "loss": 1.0883,
+ "step": 23500
+ },
+ {
+ "epoch": 12.52,
+ "eval_loss": 0.25827670097351074,
+ "eval_runtime": 370.2467,
+ "eval_samples_per_second": 11.606,
+ "eval_steps_per_second": 0.365,
+ "eval_wer": 0.15193740453256024,
+ "step": 23500
+ },
+ {
+ "epoch": 12.79,
+ "learning_rate": 2.3113949537532244e-05,
+ "loss": 1.0888,
+ "step": 24000
+ },
+ {
+ "epoch": 12.79,
+ "eval_loss": 0.2551300823688507,
+ "eval_runtime": 367.9843,
+ "eval_samples_per_second": 11.677,
+ "eval_steps_per_second": 0.367,
+ "eval_wer": 0.15819279487099555,
+ "step": 24000
+ },
+ {
+ "epoch": 13.05,
+ "learning_rate": 2.1934184861259672e-05,
+ "loss": 1.0933,
+ "step": 24500
+ },
+ {
+ "epoch": 13.05,
+ "eval_loss": 0.2628032863140106,
+ "eval_runtime": 369.9671,
+ "eval_samples_per_second": 11.615,
+ "eval_steps_per_second": 0.365,
+ "eval_wer": 0.1537142679011191,
+ "step": 24500
+ },
+ {
+ "epoch": 13.32,
+ "learning_rate": 2.07544201849871e-05,
+ "loss": 1.0799,
+ "step": 25000
+ },
+ {
+ "epoch": 13.32,
+ "eval_loss": 0.2600410580635071,
+ "eval_runtime": 374.9827,
+ "eval_samples_per_second": 11.459,
+ "eval_steps_per_second": 0.36,
+ "eval_wer": 0.150752828953521,
+ "step": 25000
+ },
+ {
+ "epoch": 13.59,
+ "learning_rate": 1.957701503806707e-05,
+ "loss": 1.0804,
+ "step": 25500
+ },
+ {
+ "epoch": 13.59,
+ "eval_loss": 0.26200664043426514,
+ "eval_runtime": 369.1646,
+ "eval_samples_per_second": 11.64,
+ "eval_steps_per_second": 0.366,
+ "eval_wer": 0.14753161465964235,
+ "step": 25500
+ },
+ {
+ "epoch": 13.85,
+ "learning_rate": 1.8397250361794498e-05,
+ "loss": 1.0814,
+ "step": 26000
+ },
+ {
+ "epoch": 13.85,
+ "eval_loss": 0.2537305951118469,
+ "eval_runtime": 368.6655,
+ "eval_samples_per_second": 11.656,
+ "eval_steps_per_second": 0.366,
+ "eval_wer": 0.15170880222783337,
+ "step": 26000
+ },
+ {
+ "epoch": 14.12,
+ "learning_rate": 1.7217485685521926e-05,
+ "loss": 1.0693,
+ "step": 26500
+ },
+ {
+ "epoch": 14.12,
+ "eval_loss": 0.25602129101753235,
+ "eval_runtime": 368.3159,
+ "eval_samples_per_second": 11.667,
+ "eval_steps_per_second": 0.367,
+ "eval_wer": 0.15421303656597773,
+ "step": 26500
+ },
+ {
+ "epoch": 14.38,
+ "learning_rate": 1.6037721009249354e-05,
+ "loss": 1.0724,
+ "step": 27000
+ },
+ {
+ "epoch": 14.38,
+ "eval_loss": 0.2540068030357361,
+ "eval_runtime": 369.0094,
+ "eval_samples_per_second": 11.645,
+ "eval_steps_per_second": 0.366,
+ "eval_wer": 0.15736151376289784,
+ "step": 27000
+ },
+ {
+ "epoch": 14.65,
+ "learning_rate": 1.4857956332976782e-05,
+ "loss": 1.0704,
+ "step": 27500
+ },
+ {
+ "epoch": 14.65,
+ "eval_loss": 0.25483617186546326,
+ "eval_runtime": 365.0658,
+ "eval_samples_per_second": 11.77,
+ "eval_steps_per_second": 0.37,
+ "eval_wer": 0.16258819373006225,
+ "step": 27500
+ },
+ {
+ "epoch": 14.92,
+ "learning_rate": 1.3678191656704208e-05,
+ "loss": 1.0729,
+ "step": 28000
+ },
+ {
+ "epoch": 14.92,
+ "eval_loss": 0.254844069480896,
+ "eval_runtime": 367.5842,
+ "eval_samples_per_second": 11.69,
+ "eval_steps_per_second": 0.367,
+ "eval_wer": 0.16009435040576908,
+ "step": 28000
+ },
+ {
+ "epoch": 15.18,
+ "learning_rate": 1.2498426980431636e-05,
+ "loss": 1.0724,
+ "step": 28500
+ },
+ {
+ "epoch": 15.18,
+ "eval_loss": 0.25110504031181335,
+ "eval_runtime": 367.3861,
+ "eval_samples_per_second": 11.696,
+ "eval_steps_per_second": 0.367,
+ "eval_wer": 0.15124120660452842,
+ "step": 28500
+ },
+ {
+ "epoch": 15.45,
+ "learning_rate": 1.1318662304159062e-05,
+ "loss": 1.0655,
+ "step": 29000
+ },
+ {
+ "epoch": 15.45,
+ "eval_loss": 0.24978148937225342,
+ "eval_runtime": 375.4183,
+ "eval_samples_per_second": 11.446,
+ "eval_steps_per_second": 0.36,
+ "eval_wer": 0.14903831166806944,
+ "step": 29000
+ },
+ {
+ "epoch": 15.98,
+ "learning_rate": 8.963852010319007e-06,
+ "loss": 1.0608,
+ "step": 30000
+ },
+ {
+ "epoch": 15.98,
+ "eval_loss": 0.24873663485050201,
+ "eval_runtime": 370.6074,
+ "eval_samples_per_second": 11.594,
+ "eval_steps_per_second": 0.364,
+ "eval_wer": 0.14812390244916196,
+ "step": 30000
+ },
+ {
+ "epoch": 16.52,
+ "learning_rate": 6.604322657773862e-06,
+ "loss": 1.0541,
+ "step": 31000
+ },
+ {
+ "epoch": 16.52,
+ "eval_loss": 0.2467627078294754,
+ "eval_runtime": 371.5001,
+ "eval_samples_per_second": 11.567,
+ "eval_steps_per_second": 0.363,
+ "eval_wer": 0.15039953448257948,
+ "step": 31000
+ },
+ {
+ "epoch": 17.05,
+ "learning_rate": 4.244793305228717e-06,
+ "loss": 1.0584,
+ "step": 32000
+ },
+ {
+ "epoch": 17.05,
+ "eval_loss": 0.2466605007648468,
+ "eval_runtime": 370.8863,
+ "eval_samples_per_second": 11.586,
+ "eval_steps_per_second": 0.364,
+ "eval_wer": 0.1493084780282012,
+ "step": 32000
+ },
+ {
+ "epoch": 17.58,
+ "learning_rate": 1.8852639526835713e-06,
+ "loss": 1.0507,
+ "step": 33000
+ },
+ {
+ "epoch": 17.58,
+ "eval_loss": 0.2480592578649521,
+ "eval_runtime": 373.0281,
+ "eval_samples_per_second": 11.519,
+ "eval_steps_per_second": 0.362,
+ "eval_wer": 0.15173997526938704,
+ "step": 33000
+ },
+ {
+ "epoch": 18.0,
+ "step": 33786,
+ "total_flos": 9.499341430600616e+20,
+ "train_loss": 0.16521977071390054,
+ "train_runtime": 71350.5908,
+ "train_samples_per_second": 60.63,
+ "train_steps_per_second": 0.474
 }
 ],
+ "max_steps": 33786,
+ "num_train_epochs": 18,
+ "total_flos": 9.499341430600616e+20,
 "trial_name": null,
 "trial_params": null
 }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:d5c09474639eff781a9fbbd58d81fc04a95748d863d0d33f663b0592e8c64a21
 size 3055
 version https://git-lfs.github.com/spec/v1
+ oid sha256:eb4880e33458fbd00defffaa2d4a3e6ec806898ed51633382a67c03e0088e649
 size 3055