---
language:
- en
license: other
library_name: transformers
tags:
- orpo
- llama 3
- rlhf
- sft
datasets:
- mlabonne/orpo-dpo-mix-40k
---

# OrpoLlama-3-8B

![](https://i.imgur.com/ZHwzQvI.png)

This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k), created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).
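As a rough illustration of what ORPO optimizes, the sketch below follows the objective described in the ORPO paper: a standard negative log-likelihood (SFT) term on the chosen response plus a weighted odds-ratio term that pushes the odds of the chosen response above those of the rejected one. The function names, the `lam` weight and the example probabilities are illustrative assumptions, not the settings used to train this model. Inputs are assumed to be length-normalized (average per-token) log-probabilities.

```python
import math


def log_odds(logp: float) -> float:
    # log(p / (1 - p)) computed from log p; requires p < 1 (logp < 0)
    return logp - math.log1p(-math.exp(logp))


def softplus(x: float) -> float:
    # log(1 + exp(x)), written to avoid overflow for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def orpo_loss(chosen_logp: float, rejected_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for a single preference pair (illustrative sketch).

    chosen_logp / rejected_logp: assumed length-normalized log-probabilities
    of the chosen and rejected responses under the model.
    lam: hypothetical weight of the odds-ratio term.
    """
    sft_term = -chosen_logp  # negative log-likelihood of the chosen response
    ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    odds_ratio_term = softplus(-ratio)  # equals -log(sigmoid(ratio))
    return sft_term + lam * odds_ratio_term


print(orpo_loss(math.log(0.6), math.log(0.2)))
```

The loss grows as the rejected response becomes more likely relative to the chosen one, so a single objective handles both instruction following (SFT term) and preference alignment (odds-ratio term) without a separate reference model.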

It's a successful fine-tune that follows the ChatML template!

**Try the demo**: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B

## 🔎 Application

This model has a context window of 8k tokens. It was trained with the ChatML template.
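For reference, the helper below renders messages in the generic ChatML layout (`<|im_start|>role`, content, `<|im_end|>`). It is a hypothetical sketch for illustration only: the chat template bundled with the tokenizer (applied with `apply_chat_template` in the Usage section) is authoritative and may differ in details such as special tokens added at the start of the prompt.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} dicts as a ChatML string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(to_chatml([{"role": "user", "content": "What is a large language model?"}]))
```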

## 🏆 Evaluation

### Nous

OrpoLlama-3-8B outperforms Llama-3-8B-Instruct on the GPT4All and TruthfulQA benchmarks.

Evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval), see the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

| Model                                                                                                                                                                     |   Average |   AGIEval |   GPT4All | TruthfulQA |  Bigbench |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------: | --------: | --------: | ---------: | --------: |
| [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) [📄](https://gist.github.com/mlabonne/8329284d86035e6019edb11eb0933628) |     51.34 |     41.22 |     69.86 |      51.65 |     42.64 |
| [**mlabonne/OrpoLlama-3-8B**](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [📄](https://gist.github.com/mlabonne/22896a1ae164859931cc8f4858c97f6f)                     | **48.63** | **34.17** | **70.59** | **52.39** | **37.36** |
| [mlabonne/OrpoLlama-3-8B-1k](https://huggingface.co/mlabonne/OrpoLlama-3-8B-1k) [📄](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82)                   |     46.76 |     31.56 |     70.19 |      48.11 |     37.17 |
| [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) [📄](https://gist.github.com/mlabonne/616b6245137a9cfc4ea80e4c6e55d847)                   |     45.42 |      31.1 |     69.95 |      43.91 |      36.7 |
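The snippet below is a quick sanity check on the table, assuming the Average column is the unweighted mean of the four benchmark scores (an assumption, though it reproduces the reported averages to within rounding). It also prints the per-benchmark gap between OrpoLlama-3-8B and Llama-3-8B-Instruct, which shows where the ORPO fine-tune gains and where it trails.

```python
# Scores copied from the table: (AGIEval, GPT4All, TruthfulQA, Bigbench)
scores = {
    "meta-llama/Meta-Llama-3-8B-Instruct": (41.22, 69.86, 51.65, 42.64),
    "mlabonne/OrpoLlama-3-8B": (34.17, 70.59, 52.39, 37.36),
    "mlabonne/OrpoLlama-3-8B-1k": (31.56, 70.19, 48.11, 37.17),
    "meta-llama/Meta-Llama-3-8B": (31.1, 69.95, 43.91, 36.7),
}
benchmarks = ("AGIEval", "GPT4All", "TruthfulQA", "Bigbench")


def average(values):
    # Unweighted mean of the benchmark scores (assumed definition of "Average")
    return sum(values) / len(values)


for model, values in scores.items():
    print(f"{model}: {average(values):.2f}")

# Per-benchmark difference: OrpoLlama-3-8B minus Llama-3-8B-Instruct
orpo = scores["mlabonne/OrpoLlama-3-8B"]
instruct = scores["meta-llama/Meta-Llama-3-8B-Instruct"]
for name, a, b in zip(benchmarks, orpo, instruct):
    print(f"{name}: {a - b:+.2f}")
```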

`mlabonne/OrpoLlama-3-8B-1k` corresponds to a version of this model trained on 1K samples (you can see the parameters in [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3)).

### Open LLM Leaderboard

TBD.

## 📈 Training curves

You can find the experiment on W&B at [this address](https://wandb.ai/mlabonne/DPO/runs/vxnmq24z/workspace?nw=nwusermlabonne).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zm71HyZiG96YY1GUtpfHq.png)

## 💻 Usage

```python
# In a notebook: !pip install -qU transformers accelerate
# In a shell:    pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/OrpoLlama-3-8B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template (ChatML)
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision, spreading it across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion; "generated_text" contains the prompt followed by the reply
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```