Update README.md
Browse files
README.md
CHANGED
```diff
@@ -63,8 +63,9 @@ The following hyperparameters were used during pre-training:
 - learning_rate: 2e-4
 - per_device_train_batch_size: 36
 - gradient_accumulation_steps: 32
-- optimizer:
+- optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-06
 - weight_decay: 0.01
+- lr_scheduler_type: linear
 - max_grad_norm: 1.0
 - max_steps: 500,000 (but terminated at *** steps)
 - warmup_steps: 10,000
```