Update README.md
README.md
CHANGED
@@ -87,10 +87,37 @@ Use the code below to get started with the model.
 [More Information Needed]

-#### Training Hyperparameters

-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 5e-4
+- per_device_train_batch_size=4
+- eval_batch_size: 2
+- evaluation_strategy="steps"
+- gradient_checkpointing=True
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 16
+- num_train_epochs=3
+- save_total_limit=1
+- fp16=True
+- save_steps=400
+- eval_steps=200
+- logging_steps=200
+- push_to_hub=True
+
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 6.427 | 0.33 | 200 | 0.5634 |
+| 0.5994 | 0.67 | 400 | 0.5290 |
+| 0.584 | 1.0 | 600 | 0.4924 |
+| 0.5589 | 1.33 | 800 | 0.4828 |
+| 0.5747 | 1.67 | 1000 | 0.4848 |
+| 0.5904 | 2.0 | 1200 | 0.4831 |
+

 #### Speeds, Sizes, Times [optional]
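
The hyperparameter list added in this commit reports `total_train_batch_size: 16` alongside `per_device_train_batch_size=4` and `gradient_accumulation_steps: 4`. The sketch below shows how those three numbers fit together. It is an illustration only: the helper `effective_train_batch_size` and the single-device default (`num_devices=1`) are assumptions that do not appear in the commit, and the dict simply copies the listed values without claiming to reproduce the author's training script.

```python
# Hyperparameters copied from the list added in this commit.
# total_train_batch_size is left out because it is derived below.
hyperparameters = {
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 4,
    "eval_batch_size": 2,
    "evaluation_strategy": "steps",
    "gradient_checkpointing": True,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 3,
    "save_total_limit": 1,
    "fp16": True,
    "save_steps": 400,
    "eval_steps": 200,
    "logging_steps": 200,
    "push_to_hub": True,
}


def effective_train_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of training examples consumed per optimizer step.

    Hypothetical helper; num_devices=1 is an assumption that happens to be
    consistent with the reported total of 16.
    """
    return per_device * accumulation_steps * num_devices


total = effective_train_batch_size(
    hyperparameters["per_device_train_batch_size"],
    hyperparameters["gradient_accumulation_steps"],
)
print(total)  # 16
```

Under the single-device assumption, 4 examples per device times 4 accumulation steps gives the reported total of 16. With more devices the product would scale accordingly.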