
dfurman/Llama-3-70B-Orpo-v0.1

This is an ORPO (Odds Ratio Preference Optimization) fine-tune of meta-llama/Meta-Llama-3-70B on 2k samples of mlabonne/orpo-dpo-mix-40k.
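For readers new to ORPO: it trains with a single objective. That objective is the usual supervised negative log-likelihood on the preferred ("chosen") response plus a weighted odds-ratio term that pushes the odds of the chosen response above those of the rejected one, without a separate reference model. The sketch below computes that loss for one preference pair from average per-token log-probabilities, so P = exp(avg_logp) is a length-normalised likelihood. The function names are hypothetical, and the weight `lam = 0.1` is an illustrative placeholder; the value used for this fine-tune is not stated in this card.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(P / (1 - P)) with P = exp(avg_logp); requires avg_logp < 0."""
    # log(1 - P) is computed as log1p(-exp(avg_logp)) for numerical stability.
    return avg_logp - math.log1p(-math.exp(avg_logp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Hypothetical helper: SFT NLL on the chosen answer + lam * odds-ratio penalty."""
    sft_nll = -avg_logp_chosen
    # Log of the odds ratio between the chosen and rejected responses.
    ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    odds_ratio_loss = -log_sigmoid(ratio)
    return sft_nll + lam * odds_ratio_loss


# The loss shrinks as the rejected response becomes less likely than the chosen one.
print(orpo_loss(-0.5, -0.5), orpo_loss(-0.5, -2.0))
```

When both responses are equally likely, the odds-ratio term equals log 2. It falls toward zero as the chosen response's odds dominate.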

It's a successful fine-tune: the model follows the ChatML template!

🔎 Application

This model has an 8k-token context window (8,192 tokens, as in the base Llama 3 model). It was trained with the ChatML template, so prompts should be formatted in ChatML.
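To make the ChatML layout concrete, here is a hand-rolled sketch of how a list of chat messages is rendered. `to_chatml` is a hypothetical helper for illustration only. The authoritative template is the one shipped with the tokenizer (applied by `tokenizer.apply_chat_template`), which may differ in details such as a leading BOS token.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render role/content messages in the ChatML layout (illustrative sketch)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a recipe for a spicy margarita."},
]
print(to_chatml(messages))
```

Comparing this output with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to check which extra tokens, if any, the real template adds.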

πŸ† Evaluation

Open LLM Leaderboard

| Model ID | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| meta-llama/Meta-Llama-3-70B-Instruct | 77.88 | 71.42 | 85.69 | 80.06 | 61.81 | 82.87 | 85.44 |
| dfurman/Llama-3-70B-Orpo-v0.1 | 74.67 | 68.69 | 88.01 | 79.39 | 49.62 | 85.48 | 76.8 |
| meta-llama/Meta-Llama-3-70B | 73.96 | 68.77 | 87.98 | 79.23 | 45.56 | 85.32 | 76.88 |
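The Average column is consistent with the unweighted mean of the six benchmark scores. Recomputing it from the rows above matches the reported values to within 0.01. A short sketch that does this and lists the ORPO model's per-benchmark change relative to the base model follows. The helper names are hypothetical, and the scores are copied from the table.

```python
BENCHMARKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K"]

# Scores copied from the Open LLM Leaderboard table above.
SCORES = {
    "meta-llama/Meta-Llama-3-70B-Instruct": [71.42, 85.69, 80.06, 61.81, 82.87, 85.44],
    "dfurman/Llama-3-70B-Orpo-v0.1": [68.69, 88.01, 79.39, 49.62, 85.48, 76.8],
    "meta-llama/Meta-Llama-3-70B": [68.77, 87.98, 79.23, 45.56, 85.32, 76.88],
}


def average(scores):
    """Unweighted mean across the six benchmarks."""
    return sum(scores) / len(scores)


def deltas(model, baseline):
    """Per-benchmark score difference (model minus baseline), in points."""
    return {
        name: round(m - b, 2)
        for name, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


for model_id, scores in SCORES.items():
    print(f"{model_id}: {average(scores):.2f}")

print(deltas("dfurman/Llama-3-70B-Orpo-v0.1", "meta-llama/Meta-Llama-3-70B"))
```

The largest change from the base model is on TruthfulQA (+4.06 points). The other five benchmarks move by less than 0.2 points.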

📈 Training curves

You can find the experiment on W&B at this address.

💻 Usage

Setup
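# Notebook (Jupyter/Colab) shell command; in a terminal, run it without the leading "!".
!pip install -qU transformers accelerate bitsandbytes

import torch
import transformers
from transformers import AutoTokenizer, BitsAndBytesConfig

# FlashAttention 2 needs an Ampere-or-newer GPU (compute capability >= 8.0),
# which also supports bfloat16. Older GPUs fall back to eager attention in float16.
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install -qqq flash-attn
    attn_implementation = "flash_attention_2"
    torch_dtype = torch.bfloat16
else:
    attn_implementation = "eager"
    torch_dtype = torch.float16

# Store the weights in 4-bit NF4 with nested (double) quantization;
# computation runs in torch_dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

model = "dfurman/Llama-3-70B-Orpo-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)
# device_map="auto" lets accelerate place the model across the available GPUs
# (offloading to CPU if needed).
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch_dtype,
        "quantization_config": bnb_config,
        "device_map": "auto",
        "attn_implementation": attn_implementation,
    },
)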
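The 4-bit quantization config matters because of the model's size. A rough, weights-only estimate using the 70.6B parameter count from the model page compares 16-bit and 4-bit storage. It assumes every parameter is stored at the given bit width. In practice, some layers (such as the embeddings) stay in 16-bit, the quantization constants add a little, and the KV cache and activations need extra memory at inference time. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Weights-only memory in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 70.6e9  # parameter count listed on the model page

for label, bits in [("FP16/BF16", 16), ("4-bit NF4", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

After loading, `pipeline.model.get_memory_footprint()` reports the actual size of the loaded model in bytes.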

Run

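messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a recipe for a spicy margarita."},
]
# Render the conversation with the tokenizer's chat template and open an assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print("***Prompt:\n", prompt)

# Sample up to 1,000 new tokens; the pipeline returns the prompt followed by the completion.
outputs = pipeline(prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
# Slice off the echoed prompt to print only the generated text.
print("***Generation:\n", outputs[0]["generated_text"][len(prompt):])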
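`do_sample=True` switches generation from greedy decoding to sampling. `temperature`, `top_k`, and `top_p` then reshape the next-token distribution before each draw. The pure-Python sketch below shows that reshaping on a toy logit vector, assuming the order temperature, then top-k, then top-p. It is a simplified illustration, not the `transformers` implementation, which works on tensors and handles ties and minimum-token edge cases on its own terms. The function names are hypothetical.

```python
import math
import random


def sampling_distribution(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # 1) Temperature: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2) Top-k: keep only the k most probable tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    topk_mass = sum(probs[i] for i in order)

    # 3) Top-p: keep the smallest prefix whose renormalised mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / topk_mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample(logits, rng, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = sampling_distribution(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(sampling_distribution(logits))  # tokens 0, 1 and 2 survive top_p=0.95
print(sample(logits, random.Random(0)))
```

With these settings, the two low-probability tokens are never sampled. Raising `temperature` or `top_p` lets more of the tail through.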
Safetensors · Model size: 70.6B params · Tensor type: FP16
