Junrulu/Reproduced-tulu2-dpo-13b

Junrulu committed on Mar 12, 2024

Commit a3cf69a · verified · 1 Parent(s): b597787

Update README.md

Files changed (1)

README.md +1 -1

@@ -37,7 +37,7 @@ For best results, format all inputs in this manner. **Make sure to include a new
 The following hyperparameters were used during DPO training:
 - learning_rate: 1e-6 * sqrt(Num of Nodes)
 - total_train_batch_size: 128 * Num of Nodes
-- optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-08
+- optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
 - lr_scheduler_type: linear
 - lr_scheduler_warmup_ratio: 0.1
 - Weight Decay: 0.05
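
For reviewers who want to check how the node-dependent values in the list above scale, here is a minimal Python sketch. It is not taken from the repository: the function name `dpo_hyperparameters` and the dictionary keys are hypothetical, chosen only to mirror the README entries. It also shows that the change in this commit (`1e-08` to `1e-8`) is cosmetic, since both spellings denote the same number.

```python
import math


def dpo_hyperparameters(num_nodes: int) -> dict:
    """Return the README's DPO hyperparameters for a given node count.

    Hypothetical helper for illustration; the key names are assumptions,
    not the training script's actual argument names.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be a positive integer")
    return {
        # learning_rate: 1e-6 * sqrt(Num of Nodes)
        "learning_rate": 1e-6 * math.sqrt(num_nodes),
        # total_train_batch_size: 128 * Num of Nodes
        "total_train_batch_size": 128 * num_nodes,
        # optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
        "optimizer": "AdamW",
        "adam_beta1": 0.9,
        "adam_beta2": 0.999,
        "adam_epsilon": 1e-8,
        "lr_scheduler_type": "linear",
        "lr_scheduler_warmup_ratio": 0.1,
        "weight_decay": 0.05,
    }


# Example: a 4-node run (the node count here is an arbitrary choice).
hp = dpo_hyperparameters(4)
print(hp["learning_rate"], hp["total_train_batch_size"])

# The old and new README spellings of epsilon parse to the same float.
same_epsilon = float("1e-08") == float("1e-8")
print(same_epsilon)
```

With this scaling rule, quadrupling the node count quadruples the total batch size but only doubles the learning rate.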