File size: 5,455 Bytes
394b5d5 0c8f0de 394b5d5 e563b5a 77012a9 e563b5a 0c8f0de e563b5a 0c8f0de 394b5d5 146a18e 394b5d5 77012a9 146a18e 77012a9 146a18e 77012a9 f5da111 146a18e f5da111 364e785 77012a9 394b5d5 146a18e ae75069 394b5d5 b2afc5b 394b5d5 146a18e b2afc5b 146a18e 394b5d5 b2afc5b 394b5d5 08b3ae1 394b5d5 0c8f0de |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 |
---
language:
- nl
license: mit
tags:
- trl
- fietje
- alignment-handbook
- dpo
base_model: BramVanroy/fietje-2-instruct
datasets:
- BramVanroy/ultra_feedback_dutch_cleaned
- BramVanroy/orca_dpo_pairs_dutch_cleaned
pipeline_tag: text-generation
inference: false
model-index:
- name: fietje-2-chat
results: []
---
<p align="center" style="margin:0;padding:0">
<img src="https://huggingface.co/BramVanroy/fietje-2-chat/resolve/main/img/fietje-2b-banner-rounded.png" alt="Fietje banner" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
</p>
<div style="margin:auto; text-align:center">
<h1 style="margin-bottom: 0">Fietje 2 Chat</h1>
<em>An open and efficient LLM for Dutch</em>
</div>
<blockquote class="tip" style="padding: 1.5em; border: 0">
<p align="center" style="text-align: center; margin: 0">
<a href="https://huggingface.co/BramVanroy/fietje-2">π±ββοΈ Base version</a> -
<a href="https://huggingface.co/BramVanroy/fietje-2-instruct">π€ Instruct version</a> -
<a href="https://huggingface.co/BramVanroy/fietje-2-chat">π¬ Chat version</a> (this one) -
<a href="https://huggingface.co/BramVanroy/fietje-2-chat-GGUF">π GGUF of Chat</a>
</p>
<p align="center" style="text-align: center; margin: 0">
<a href="https://huggingface.co/spaces/BramVanroy/fietje-2b"><strong>Chat with Fietje here!</strong></a>
</p>
</blockquote>
This is the chat version of Fietje, a DPO-tuned (aligned) continuation on [the instruct version](https://huggingface.co/BramVanroy/fietje-2-instruct). Fietje is an adapated version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2), tailored to Dutch text generation by training on 28B tokens. It is small and efficient with a size of 2.7 billion parameters while performing almost on par with more powerful Dutch LLMs of twice its size like [GEITje 7B Ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra).
A thorough description of the creation and evaluation of Fietje as well as usage examples are available in [this Github repository](https://github.com/BramVanroy/fietje).
## Intended uses & limitations
The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!
## Training and evaluation data
Fietje 2 Chat was finetuned from [the instruct model](https://huggingface.co/BramVanroy/fietje-2-instruct) on the following datasets. Number of training samples per dataset given in brackets, totalling 18,653 samples.
- [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) subset `dpo_hq`: a cleaned version of [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) (9186)
- [BramVanroy/orca_dpo_pairs_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch_cleaned) subset `dpo_all`: a cleaned version of [BramVanroy/orca_dpo_pairs_dutch](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch) (9467)
A lot of different learning rates, beta, en batch sizes were investigated in search of a converging combination. You can find them all in [the W&B runs](https://wandb.ai/bramvanroy/dpo-fietje-2).
## Training procedure
I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power to accomplish this project. Accounting for waiting for jobs, training a single run took around nine hours on one A100 80GB.
Training was done with the wonderful [alignment-handbook](https://github.com/huggingface/alignment-handbook), using DeepSpeed as a back-end. Exact training recipes and SLURM script are given in the [Github repository](https://github.com/BramVanroy/fietje).
### Training hyperparameters
The following hyperparameters were used during training:
- beta: 0.2
- learning_rate: 2e-06
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2515 | 1.0 | 1166 | 0.2842 | -1.1549 | -3.6363 | 0.8867 | 2.4815 | -657.6813 | -451.3364 | -1.2868 | -1.3528 |
### Framework versions
- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BramVanroy__fietje-2-chat)
| Metric |Value|
|-------------------|----:|
|Avg. |10.39|
|IFEval (0-Shot) |29.17|
|BBH (3-Shot) |17.72|
|MATH Lvl 5 (4-Shot)| 0.53|
|GPQA (0-shot) | 0.00|
|MuSR (0-shot) | 3.20|
|MMLU-PRO (5-shot) |11.72|
|