Orca-Llama-3-8B-Instruct-DPO

Finetuned Llama 3 8B Instruct on Intel/orca_dpo_pairs using a single 3090 24GB. Data formated using the ChatML template.

GGUF can be found here RDson/Orca-Llama-3-8B-Instruct-DPO-GGUF

ORPOConfig:

    learning_rate=1e-6,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    overwrite_output_dir=True,
    beta=0.1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=35,
    report_to="wandb",
    output_dir="./results/",
    fp16=True,
    save_steps=50
Downloads last month
17
Safetensors
Model size
8.03B params
Tensor type
FP16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for RDson/Orca-Llama-3-8B-Instruct-DPO

Merges
1 model

Dataset used to train RDson/Orca-Llama-3-8B-Instruct-DPO