---
license: mit
base_model: BramVanroy/fietje-2-instruct
tags:
- trl
- fietje
- alignment-handbook
- dpo
datasets:
- BramVanroy/ultra_feedback_dutch_cleaned
- BramVanroy/orca_dpo_pairs_dutch_cleaned
model-index:
- name: fietje-2-chat
  results: []
pipeline_tag: text-generation
inference: false
language:
- nl
---

<p align="center" style="margin:0;padding:0">
  <img src="https://huggingface.co/BramVanroy/fietje-2-chat/resolve/main/img/fietje-2b-banner-rounded.png" alt="Fietje banner" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
</p>

<div style="margin:auto; text-align:center">
  <h1 style="margin-bottom: 0">Fietje 2 Chat</h1>
  <em>An open and efficient LLM for Dutch</em>
</div>

<blockquote class="tip" style="padding: 1.5em; border: 0">
  <p align="center" style="text-align: center; margin: 0">
    <a href="https://huggingface.co/BramVanroy/fietje-2">👱‍♀️ Base version</a> -
    <a href="https://huggingface.co/BramVanroy/fietje-2-instruct">🤖 Instruct version</a> -
    <a href="https://huggingface.co/BramVanroy/fietje-2-chat">💬 Chat version</a> (this one) -
    <a href="https://huggingface.co/BramVanroy/fietje-2-chat-GGUF">🚀 GGUF of Chat</a>
  </p>
  <p align="center" style="text-align: center; margin: 0">
    <a href="https://huggingface.co/spaces/BramVanroy/fietje-2b"><strong>Chat with Fietje here!</strong></a>
  </p>
</blockquote>

This is the chat version of Fietje, a DPO-tuned (aligned) continuation of [the instruct version](https://huggingface.co/BramVanroy/fietje-2-instruct). Fietje is an adapted version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), tailored to Dutch text generation by training on 28B tokens. It is small and efficient, with 2.7 billion parameters, while performing almost on par with more powerful Dutch LLMs twice its size, such as [GEITje 7B Ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra).

A thorough description of how Fietje was created and evaluated, as well as usage examples, is available in [this GitHub repository](https://github.com/BramVanroy/fietje).

## Intended uses & limitations

The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!

## Training and evaluation data

Fietje 2 Chat was finetuned from [the instruct model](https://huggingface.co/BramVanroy/fietje-2-instruct) on the following datasets. The number of training samples per dataset is given in parentheses, totalling 18,653 samples.

- [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) subset `dpo_hq`: a cleaned version of [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) (9186)
- [BramVanroy/orca_dpo_pairs_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch_cleaned) subset `dpo_all`: a cleaned version of [BramVanroy/orca_dpo_pairs_dutch](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch) (9467)

Many different learning rates, beta values, and batch sizes were investigated in search of a converging combination. You can find them all in [the W&B runs](https://wandb.ai/bramvanroy/dpo-fietje-2).
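To make the role of `beta` concrete, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair. The implicit reward of a response is `beta` times the difference between its log-probability under the policy being trained and under the frozen reference model (here, the instruct model). This is plain Python for illustration, not the TRL code used for training; the function names and log-probability values are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.2):
    """Hypothetical helper: (loss, reward_chosen, reward_rejected) for one pair.

    Inputs are summed log-probabilities of each full response under the
    policy and under the frozen reference model.
    """
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    # The loss shrinks as the chosen reward pulls ahead of the rejected one
    loss = -log_sigmoid(reward_chosen - reward_rejected)
    return loss, reward_chosen, reward_rejected


# Made-up log-probabilities, for illustration only
loss, rc, rr = dpo_pair(-450.0, -655.0, -445.0, -640.0, beta=0.2)
print(f"loss={loss:.4f} chosen={rc:.2f} rejected={rr:.2f}")
# loss=0.1269 chosen=-1.00 rejected=-3.00
```

With a larger `beta`, a smaller shift away from the reference model is enough to drive the loss down, which tends to keep the policy closer to the reference; a smaller `beta` lets it drift further.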

## Training procedure

I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power to accomplish this project. Accounting for time spent waiting for jobs, a single training run took around nine hours on one A100 80GB.

Training was done with the wonderful [alignment-handbook](https://github.com/huggingface/alignment-handbook), using DeepSpeed as a back-end. The exact training recipes and the SLURM script are given in the [GitHub repository](https://github.com/BramVanroy/fietje).


### Training hyperparameters

The following hyperparameters were used during training:
- beta: 0.2
- learning_rate: 2e-06
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
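Together with the 18,653 training pairs, these hyperparameters determine the number of optimizer steps and the learning-rate schedule. The sketch below assumes a single device (consistent with the one A100 mentioned above and with total_train_batch_size = 8 × 2) and re-implements, as I understand it, the linear-warmup plus half-cosine schedule of `transformers`' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. `lr_at` is a hypothetical helper, not a library function.

```python
import math

# Values taken from this card
num_samples = 9186 + 9467  # ultra_feedback_dutch_cleaned + orca_dpo_pairs_dutch_cleaned
per_device_batch = 8
grad_accum = 2
num_devices = 1  # ASSUMPTION: one GPU, inferred from total_train_batch_size
warmup_ratio = 0.1
peak_lr = 2e-6

total_batch = per_device_batch * grad_accum * num_devices
steps = math.ceil(num_samples / total_batch)    # optimizer steps for one epoch
warmup_steps = math.ceil(warmup_ratio * steps)  # warmup rounded up


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_batch, steps, warmup_steps)  # 16 1166 117
```

The computed 1,166 steps match the final step reported in the training results.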

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2515        | 1.0   | 1166 | 0.2842          | -1.1549        | -3.6363          | 0.8867             | 2.4815          | -657.6813      | -451.3364    | -1.2868         | -1.3528       |
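The reward columns are averages of per-pair implicit rewards. Assuming TRL's definitions (reward = beta × the policy/reference log-probability ratio; accuracy = share of pairs where the chosen reward exceeds the rejected one), they can be computed as in this sketch. The helper name and reward values are hypothetical. Because averaging is linear, Rewards/margins equals Rewards/chosen minus Rewards/rejected, which the table confirms up to rounding (-1.1549 - (-3.6363) ≈ 2.4814).

```python
from statistics import mean


def preference_metrics(chosen_rewards, rejected_rewards):
    """Hypothetical helper: DPO reward metrics from per-pair implicit rewards."""
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        # Fraction of pairs where the chosen response gets the higher reward
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": mean(margins),
    }


# Made-up rewards for four pairs, for illustration only
metrics = preference_metrics([-1.0, -0.5, -2.0, -1.5], [-3.0, -4.0, -1.0, -4.5])
print(metrics)
```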


### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2