A thorough description of the creation and evaluation of Fietje, along with usage examples, is available in [this Github repository](https://github.com/BramVanroy/fietje).
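As a quick, illustrative example (not taken from that repository), the sketch below shows how the chat model could be loaded and prompted with `transformers`; the model id `BramVanroy/fietje-2b-chat` and the generation settings are assumptions, so refer to the repository for the author's own examples.

```python
# Minimal, illustrative sketch -- the model id and generation settings are
# assumptions, not taken from the Fietje repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BramVanroy/fietje-2b-chat"  # assumed id of this chat model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Waarom is de lucht blauw?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```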
## Intended uses & limitations

The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!

## Training and evaluation data
Fietje 2B chat was finetuned from [the instruct model](https://huggingface.co/BramVanroy/fietje-2b-instruct) on the following datasets. The number of training samples per dataset is given in brackets, totalling 18,653 samples; a short loading sketch is included below.

- [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) subset `dpo_hq`: a cleaned version of [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) (9186)
- [BramVanroy/orca_dpo_pairs_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch_cleaned) subset `dpo_all`: a cleaned version of [BramVanroy/orca_dpo_pairs_dutch](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch) (9467)

A lot of different learning rates, betas (the DPO regularisation strength), and batch sizes were investigated in search of a converging combination. You can find them all in [the W&B runs](https://wandb.ai/bramvanroy/dpo-fietje-2b).
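For a quick look at the data, the sketch below loads both subsets with the `datasets` library; it assumes the subset names above are the datasets' config names, and it prints the available splits and columns rather than assuming them.

```python
# Illustrative sketch: inspect the two DPO subsets listed above.
# The subset names come from this card; split names and columns do not,
# so they are printed rather than assumed.
from datasets import load_dataset

subsets = [
    ("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq"),
    ("BramVanroy/orca_dpo_pairs_dutch_cleaned", "dpo_all"),
]

for repo_id, config in subsets:
    dataset_dict = load_dataset(repo_id, config)
    for split_name, split in dataset_dict.items():
        print(repo_id, config, split_name, len(split), split.column_names)
```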
## Training procedure
I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power to accomplish this project. Including time spent waiting for jobs in the queue, a single training run took around nine hours on one A100 80GB.

Training was done with the wonderful [alignment-handbook](https://github.com/huggingface/alignment-handbook), using DeepSpeed as a back-end. The exact training recipes and SLURM script are given in the [Github repository](https://github.com/BramVanroy/fietje).
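Purely as a schematic illustration of what such a preference-tuning run involves (this is not the Fietje recipe), the sketch below uses TRL's `DPOTrainer`, the trainer the alignment-handbook builds on; every value in it is a placeholder.

```python
# Schematic DPO sketch with TRL's DPOTrainer -- NOT the Fietje recipe.
# All hyperparameter values, the split name, and the expected
# prompt/chosen/rejected column layout are placeholders/assumptions;
# the real configs and SLURM script live in the linked repository.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "BramVanroy/fietje-2b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed split name; check the dataset card for the actual splits.
dataset = load_dataset("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq", split="train")

args = DPOConfig(
    output_dir="fietje-2b-dpo-sketch",
    beta=0.1,                       # placeholder regularisation strength
    learning_rate=5e-7,             # placeholder
    per_device_train_batch_size=4,  # placeholder
    bf16=True,
)
# In older TRL versions, pass `tokenizer=` instead of `processing_class=`.
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```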
### Training hyperparameters
The following hyperparameters were used during training: