---
library_name: transformers
tags:
- llama-3
- orca
- dpo
datasets:
- Intel/orca_dpo_pairs
pipeline_tag: text-generation
license: other
license_name: llama-3
license_link: https://llama.meta.com/llama3/license
---

# Orca-Llama-3-8B-Instruct-DPO

Finetuned [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on [Intel/orca_dpo_pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) using a single RTX 3090 24GB. Data was formatted using the ChatML template.

GGUF files can be found here: [RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF](https://huggingface.co/RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF).

ORPOConfig:

```
learning_rate=1e-6,
lr_scheduler_type="linear",
max_length=1024,
max_prompt_length=512,
overwrite_output_dir=True,
beta=0.1,
per_device_train_batch_size=2,
per_device_eval_batch_size=2,
gradient_accumulation_steps=4,
optim="paged_adamw_8bit",
num_train_epochs=1,
evaluation_strategy="steps",
eval_steps=0.2,
logging_steps=1,
warmup_steps=35,
report_to="wandb",
output_dir="./results/",
fp16=True,
save_steps=50
```
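Since the training data was formatted with the ChatML template, prompts at inference time should follow the same layout. Below is a minimal sketch of that formatting; the helper name `to_chatml` is illustrative, not part of any library, and in practice `tokenizer.apply_chat_template` from `transformers` can do this for you:

```python
def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string.

    Each message is wrapped in <|im_start|>role ... <|im_end|> markers,
    and a trailing <|im_start|>assistant opens the model's turn.
    """
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    out += "<|im_start|>assistant\n"  # generation prompt for the model's reply
    return out

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

The resulting string can then be tokenized and passed to the model for generation.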