---
library_name: transformers
license: mit
language:
- fr
- en
tags:
- french
- chocolatine
datasets:
- jpacifico/french-orca-dpo-pairs-revised
pipeline_tag: text-generation
---

### Chocolatine-14B-Instruct-DPO-v1.2

DPO fine-tuned version of [microsoft/Phi-3-medium-4k-instruct](https://huggingface.co/microsoft/Phi-3-medium-4k-instruct) (14B params),  
trained on the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) RLHF dataset.  
Training in French also improves the model in English, surpassing the performance of its base model.  
Context window: 4k tokens  
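
For reference, a DPO run of this kind is typically set up with the Hugging Face TRL library. The sketch below is illustrative, not the actual training script: the hyperparameters are assumptions, and it presumes the dataset already exposes the prompt/chosen/rejected columns that `DPOTrainer` expects.

```python
# Illustrative DPO setup with TRL (not the actual training configuration)
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "microsoft/Phi-3-medium-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs; assumes prompt/chosen/rejected columns
# (a mapping step may be needed depending on the dataset schema)
dataset = load_dataset("jpacifico/french-orca-dpo-pairs-revised", split="train")

args = DPOConfig(
    output_dir="chocolatine-dpo",
    beta=0.1,                       # DPO preference-loss strength (assumed)
    per_device_train_batch_size=1,  # assumed; the real run likely used more
    num_train_epochs=1,
)

# Note: newer TRL versions take processing_class= instead of tokenizer=
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()
```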

* **4-bit quantized version** is available here: [jpacifico/Chocolatine-14B-Instruct-DPO-v1.2-Q4_K_M-GGUF](https://huggingface.co/jpacifico/Chocolatine-14B-Instruct-DPO-v1.2-Q4_K_M-GGUF)
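
For local inference on CPU or modest GPUs, the GGUF quant can be run with llama.cpp bindings. A minimal sketch using llama-cpp-python follows; the local filename is an assumption, so point `model_path` at the file actually downloaded from the repo above.

```python
# Sketch: run the Q4_K_M GGUF quant with llama-cpp-python
# (model_path filename is an assumption)
from llama_cpp import Llama

llm = Llama(
    model_path="chocolatine-14b-instruct-dpo-v1.2-q4_k_m.gguf",
    n_ctx=4096,  # matches the model's 4k context window
)
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a Large Language Model?"}],
    max_tokens=200,
)
print(output["choices"][0]["message"]["content"])
```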

### OpenLLM Leaderboard

Chocolatine is the best-performing 14B model on the [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) (as of 2024/09/01),  
and ranks first among models with fewer than 22B parameters.
![image/png](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Assets/chocolatine_14B_leaderboard_20240901.png?raw=true)  


|      Metric       |Value|
|-------------------|----:|
|**Avg.**               |**33.3**|
|IFEval     |68.52|
|BBH        |49.85|
|MATH Lvl 5 |17.98|
|GPQA       |10.07|
|MuSR       |12.35|
|MMLU-PRO   |41.07|

### MT-Bench-French

Chocolatine-14B-Instruct-DPO-v1.2 outperforms its previous versions and its base model Phi-3-medium-4k-instruct on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), evaluated with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench) and GPT-4-Turbo as the LLM judge.

```
########## First turn ##########
                                             score
model                                 turn        
gpt-4o-mini                           1     9.2875
Chocolatine-14B-Instruct-4k-DPO       1     8.6375
Chocolatine-14B-Instruct-DPO-v1.2     1     8.6125
Phi-3.5-mini-instruct                 1     8.5250
Chocolatine-3B-Instruct-DPO-v1.2      1     8.3750
Phi-3-medium-4k-instruct              1     8.2250
gpt-3.5-turbo                         1     8.1375
Chocolatine-3B-Instruct-DPO-Revised   1     7.9875
Daredevil-8B                          1     7.8875
Meta-Llama-3.1-8B-Instruct            1     7.0500
vigostral-7b-chat                     1     6.7875
Mistral-7B-Instruct-v0.3              1     6.7500
gemma-2-2b-it                         1     6.4500
French-Alpaca-7B-Instruct_beta        1     5.6875
vigogne-2-7b-chat                     1     5.6625

########## Second turn ##########
                                               score
model                                 turn          
gpt-4o-mini                           2     8.912500
Chocolatine-14B-Instruct-DPO-v1.2     2     8.337500
Chocolatine-3B-Instruct-DPO-Revised   2     7.937500
Chocolatine-3B-Instruct-DPO-v1.2      2     7.862500
Phi-3-medium-4k-instruct              2     7.750000
Chocolatine-14B-Instruct-4k-DPO       2     7.737500
gpt-3.5-turbo                         2     7.679167
Phi-3.5-mini-instruct                 2     7.575000
Daredevil-8B                          2     7.087500
Meta-Llama-3.1-8B-Instruct            2     6.787500
Mistral-7B-Instruct-v0.3              2     6.500000
vigostral-7b-chat                     2     6.162500
gemma-2-2b-it                         2     6.100000
French-Alpaca-7B-Instruct_beta        2     5.487395
vigogne-2-7b-chat                     2     2.775000

########## Average ##########
                                          score
model                                          
gpt-4o-mini                            9.100000
Chocolatine-14B-Instruct-DPO-v1.2      8.475000
Chocolatine-14B-Instruct-4k-DPO        8.187500
Chocolatine-3B-Instruct-DPO-v1.2       8.118750
Phi-3.5-mini-instruct                  8.050000
Phi-3-medium-4k-instruct               7.987500
Chocolatine-3B-Instruct-DPO-Revised    7.962500
gpt-3.5-turbo                          7.908333
Daredevil-8B                           7.487500
Meta-Llama-3.1-8B-Instruct             6.918750
Mistral-7B-Instruct-v0.3               6.625000
vigostral-7b-chat                      6.475000
gemma-2-2b-it                          6.275000
French-Alpaca-7B-Instruct_beta         5.587866
vigogne-2-7b-chat                      4.218750
```

### Usage

You can run this model using my [Colab notebook](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Chocolatine_14B_inference_test_colab.ipynb).

You can also run Chocolatine using the following code:

```python
import transformers
from transformers import AutoTokenizer

model_name = "jpacifico/Chocolatine-14B-Instruct-DPO-v1.2"

# Format the prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create the text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_new_tokens=200,  # cap on newly generated tokens (max_length would also count the prompt)
)
print(sequences[0]['generated_text'])
```
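
If GPU memory is limited, the 14B weights can also be loaded in 4-bit via bitsandbytes before building the pipeline; the sketch below uses quantization settings that are assumptions, not a tested configuration.

```python
# Optional: 4-bit loading with bitsandbytes (requires the bitsandbytes
# and accelerate packages; settings below are illustrative)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "jpacifico/Chocolatine-14B-Instruct-DPO-v1.2",
    quantization_config=bnb_config,
    device_map="auto",
)
# Pass this model object to transformers.pipeline(...) in place of model_name
```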

### Limitations

The Chocolatine model is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.  
It does not have any moderation mechanism.  

- **Developed by:** Jonathan Pacifico, 2024
- **Model type:** LLM 
- **Language(s) (NLP):** French, English
- **License:** MIT