Orca-Llama-3-8B-Instruct-DPO

Finetuned Llama 3 8B Instruct on Intel/orca_dpo_pairs using a single 3090 24GB. Data formated using the ChatML template.

GGUF can be found here RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF

ORPOConfig:

    learning_rate=1e-6,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    overwrite_output_dir=True,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=35,
    report_to="wandb",
    output_dir="./results/",
    fp16=True,
    save_steps=50

RDson
/

Orca-Llama-3-8B-Instruct-DPO

Orca-Llama-3-8B-Instruct-DPO

Model tree for RDson/Orca-Llama-3-8B-Instruct-DPO

Dataset used to train RDson/Orca-Llama-3-8B-Instruct-DPO