---
language:
- en
license: other
library_name: transformers
tags:
- orpo
- llama 3
- rlhf
- sft
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- mlabonne/orpo-dpo-mix-40k
---

# OrpoLlama-3-8B

![](https://i.imgur.com/ZHwzQvI.png)

This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).

It's a successful fine-tune that follows the ChatML template!

**Try the demo**: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B

## πŸ”Ž Application

This model uses a context window of 8k. It was trained with the ChatML template.
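
As a quick sketch of what that means in practice, the tokenizer's chat template produces ChatML-style prompts, with each turn wrapped in `<|im_start|>` / `<|im_end|>` markers (the commented output below is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mlabonne/OrpoLlama-3-8B")

messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected ChatML-style output (illustrative):
# <|im_start|>user
# What is a large language model?<|im_end|>
# <|im_start|>assistant
```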

## ⚑ Quantized models

Thanks to bartowski, solidrust, and LoneStriker for the quantized models.

* **GGUF**: https://huggingface.co/bartowski/OrpoLlama-3-8B-GGUF
* **AWQ**: https://huggingface.co/solidrust/OrpoLlama-3-8B-AWQ
* **EXL2**:
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-3.0bpw-h6-exl2
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-4.0bpw-h6-exl2
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-5.0bpw-h6-exl2
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-6.0bpw-h6-exl2
  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-8.0bpw-h8-exl2
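
As an example, the GGUF weights can be run locally with `llama-cpp-python`. This is a minimal sketch; the `filename` pattern is an assumption, so check the repo for the exact quant names:

```python
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

# Download a quantized GGUF file from the Hub and load it with an 8k context.
llm = Llama.from_pretrained(
    repo_id="bartowski/OrpoLlama-3-8B-GGUF",
    filename="*Q4_K_M.gguf",  # assumption: pick whichever quant the repo provides
    n_ctx=8192,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a large language model?"}],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])
```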

## πŸ† Evaluation

### Nous

OrpoLlama-3-8B outperforms Llama-3-8B-Instruct on the GPT4All and TruthfulQA datasets.

Evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval), see the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

| Model                                                                                                                                                                     |   Average |   AGIEval |   GPT4All | TruthfulQA |  Bigbench |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------: | --------: | --------: | ---------: | --------: |
| [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) [πŸ“„](https://gist.github.com/mlabonne/8329284d86035e6019edb11eb0933628) |     51.34 |     41.22 |     69.86 |      51.65 |     42.64 |
| [**mlabonne/OrpoLlama-3-8B**](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [πŸ“„](https://gist.github.com/mlabonne/22896a1ae164859931cc8f4858c97f6f)                     | **48.63** | **34.17** | **70.59** | **52.39** | **37.36** |
| [mlabonne/OrpoLlama-3-8B-1k](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [πŸ“„](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82)                      | 46.76     | 31.56     | 70.19     |  48.11     | 37.17     |
| [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) [πŸ“„](https://gist.github.com/mlabonne/616b6245137a9cfc4ea80e4c6e55d847)                   |     45.42 |      31.1 |     69.95 |      43.91 |      36.7 |

`mlabonne/OrpoLlama-3-8B-1k` corresponds to a version of this model trained on 1K samples (you can see the parameters in [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3)). The current version was trained on a full epoch.

### Open LLM Leaderboard

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/5auE1oyKJ-TsvrUVeQBYr.png)

## πŸ“ˆ Training curves

You can find the experiment on W&B at [this address](https://wandb.ai/mlabonne/DPO/runs/vxnmq24z/workspace?nw=nwusermlabonne).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zm71HyZiG96YY1GUtpfHq.png)
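
For context, here is a minimal sketch of what an ORPO fine-tuning setup looks like with TRL's `ORPOTrainer`. The hyperparameters below are illustrative placeholders, not the exact values used for this model (see the linked article for those):

```python
# pip install trl transformers datasets accelerate
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token

# The dataset provides chosen/rejected pairs consumed by the ORPO objective.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

# Illustrative settings only; not the configuration from the article.
config = ORPOConfig(
    output_dir="OrpoLlama-3-8B",
    beta=0.1,                        # weight of the odds-ratio (preference) term
    max_length=1024,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    num_train_epochs=1,              # the released model was trained for one full epoch
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```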

## πŸ’» Usage

```python
# pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/OrpoLlama-3-8B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build a ChatML-formatted prompt from the chat template.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and spread it across available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```