davidhornshaw committed
Commit 953bad0 · Parent(s): ca3d4dd
Added more detail to training hyperparams

README.md CHANGED
@@ -69,11 +69,20 @@ It is a dataset designed for ORPO or DPO training. See Fine-tune Llama 3 with

### Training Procedure

- We used the trl [ORPO trainer](https://huggingface.co/docs/trl/main/en/orpo_trainer) for finetuning
+ We used the trl [ORPO trainer](https://huggingface.co/docs/trl/main/en/orpo_trainer) for finetuning over four epochs with a batch size of two.
+ Moreover, we used [LoRA](https://arxiv.org/abs/2106.09685) for parameter-efficient training by targeting only particular parts of the base model architecture.

### Training Hyperparameters

- **Training regime:** fp16 non-mixed precision
+ - **Max length:** 4096
+ - **Max prompt length:** 4096
+ - **Batch size:** 2
+ - **Epochs trained:** 4
+ - **Modules targeted:** q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+ - **Bias:** None
+
+ All remaining hyperparameters were kept standard.

# Evaluation

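Taken together, these settings map onto the trl ORPO trainer and a peft LoRA configuration. The sketch below shows one way to wire them up; it is an illustrative assumption rather than the repository's actual training script. The base model identifier, dataset name, and output directory are placeholders, and reading "fp16 non-mixed precision" as loading the weights directly in float16 is likewise an assumption.

```python
# Minimal, hypothetical sketch of the setup described above: ORPO training via trl
# with LoRA restricted to the listed projection modules. Model and dataset names
# are placeholders, not taken from this repository.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "your-base-model-id"            # placeholder
preference_data = "your/preference-dataset"  # placeholder; needs prompt/chosen/rejected columns

# One reading of "fp16 non-mixed precision": load the weights directly in float16.
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA applied only to the attention and MLP projections named above; rank, alpha,
# and dropout are left at their peft defaults ("remaining hyperparameters standard").
peft_config = LoraConfig(
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# ORPO arguments mirroring the listed hyperparameters; everything else stays at default.
args = ORPOConfig(
    output_dir="orpo-finetune",              # placeholder
    max_length=4096,
    max_prompt_length=4096,
    per_device_train_batch_size=2,
    num_train_epochs=4,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=load_dataset(preference_data, split="train"),
    tokenizer=tokenizer,                     # renamed to processing_class= in newer trl releases
    peft_config=peft_config,
)
trainer.train()
```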