
Orca-Llama-3-8B-Instruct-DPO

Llama 3 8B Instruct fine-tuned on Intel/orca_dpo_pairs using a single RTX 3090 (24 GB). The data was formatted using the ChatML template.
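As a rough illustration of the ChatML formatting step, a row of Intel/orca_dpo_pairs (which exposes `system`, `question`, `chosen`, and `rejected` fields) can be mapped into the prompt/chosen/rejected strings that preference-optimization trainers expect. This is a hedged sketch of one plausible preprocessing function, not the exact script used for this model:

```python
def format_row(row: dict) -> dict:
    """Render one Intel/orca_dpo_pairs row with the ChatML template.

    Illustrative helper (not the model author's actual code): the prompt
    ends with an open assistant turn, and each completion is closed with
    <|im_end|> so chosen/rejected are complete assistant messages.
    """
    prompt = ""
    if row.get("system"):
        prompt += f"<|im_start|>system\n{row['system']}<|im_end|>\n"
    prompt += (
        f"<|im_start|>user\n{row['question']}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    return {
        "prompt": prompt,
        "chosen": row["chosen"] + "<|im_end|>",
        "rejected": row["rejected"] + "<|im_end|>",
    }
```

A function like this would typically be applied over the dataset with `datasets.Dataset.map` before training.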

GGUF quantizations are available at RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF.

ORPOConfig:

    learning_rate=1e-6,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    overwrite_output_dir=True,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=35,
    report_to="wandb",
    output_dir="./results/",
    fp16=True,
    save_steps=50
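The configuration above plugs into trl's `ORPOTrainer` roughly as follows. This is a sketch under assumptions (model/dataset loading is abbreviated, and the `tokenizer` keyword follows the trl 0.8-era `ORPOTrainer` API), not a verbatim training script; note that `per_device_train_batch_size=2` with `gradient_accumulation_steps=4` gives an effective batch size of 8:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
# ... map rows into ChatML-formatted prompt/chosen/rejected strings ...

args = ORPOConfig(
    learning_rate=1e-6,
    max_length=1024,
    max_prompt_length=512,
    beta=0.1,
    per_device_train_batch_size=2,   # effective batch size 2 * 4 = 8
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    fp16=True,
    output_dir="./results/",
    # ... remaining fields as listed above ...
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```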
Model size: 8.03B params
Tensor type: FP16 (Safetensors)

Dataset used to train this model: Intel/orca_dpo_pairs