Commit 3be3888 by dittops
1 parent: 85d673e

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -85,20 +85,20 @@ print(tokenizer.decode(sample[0]))
 
 ## Training details
 
-The model is trained of 16 A100 80GB for approximately 50hrs.
+The model is trained of 8 A100 80GB for approximately 50hrs.
 
 | Hyperparameters | Value |
 | :----------------------------| :-----: |
-| per_device_train_batch_size | 16 |
+| per_device_train_batch_size | 8 |
 | gradient_accumulation_steps | 1 |
 | epoch | 3 |
-| steps | 2157 |
+| steps | 8628 |
 | learning_rate | 2e-5 |
 | lr schedular type | cosine |
 | warmup ratio | 0.1 |
 | optimizer | adamw |
 | fp16 | True |
-| GPU | 16 A100 80GB |
+| GPU | 8 A100 80GB |
 
 ### Important Note
 
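
The four changed values in this diff can be cross-checked against each other. The sketch below assumes the usual data-parallel convention that the effective batch size is `per_device_train_batch_size` × number of GPUs × `gradient_accumulation_steps`, and that `steps` counts optimizer steps over the whole run; neither assumption is stated in the README. The helper name `global_batch_size` is hypothetical and exists only for this check.

```python
# Sanity check: do the old and new hyperparameter rows describe the same
# amount of training data? Values are copied from the diff; the formula
# for the effective batch size is an assumption, not taken from the README.

def global_batch_size(per_device: int, num_gpus: int, grad_accum: int = 1) -> int:
    """Effective samples per optimizer step under plain data parallelism."""
    return per_device * num_gpus * grad_accum

old = {"per_device": 16, "num_gpus": 16, "grad_accum": 1, "steps": 2157}
new = {"per_device": 8, "num_gpus": 8, "grad_accum": 1, "steps": 8628}

old_gbs = global_batch_size(old["per_device"], old["num_gpus"], old["grad_accum"])
new_gbs = global_batch_size(new["per_device"], new["num_gpus"], new["grad_accum"])

# Total samples seen = effective batch size * number of optimizer steps.
old_samples = old_gbs * old["steps"]
new_samples = new_gbs * new["steps"]

print(f"old: batch {old_gbs}, samples {old_samples}")
print(f"new: batch {new_gbs}, samples {new_samples}")
print("consistent:", old_samples == new_samples)
```

Under these assumptions the effective batch size drops from 256 to 64 and the step count rises by the same factor of 4, so both versions of the table describe the same total number of samples processed.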