A thorough description of the creation and evaluation of Fietje, along with usage examples, is available in [this Github repository](https://github.com/BramVanroy/fietje).
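As a quick, illustrative example (not taken from that repository), the sketch below shows how the chat model could be loaded and prompted with `transformers`; the model id `BramVanroy/fietje-2b-chat` and the generation settings are assumptions, so refer to the repository for the author's own examples.

```python
# Minimal, illustrative sketch -- the model id and generation settings are
# assumptions, not taken from the Fietje repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BramVanroy/fietje-2b-chat"  # assumed id of this chat model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Waarom is de lucht blauw?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```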
## Intended uses & limitations

The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!

## Training and evaluation data
Fietje 2B chat was finetuned from [the instruct model](https://huggingface.co/BramVanroy/fietje-2b-instruct) on the following datasets. The number of training samples per dataset is given in brackets, totalling 18,653 samples; a short loading sketch is included below.

- [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned) subset `dpo_hq`: a cleaned version of [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) (9186)
- [BramVanroy/orca_dpo_pairs_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch_cleaned) subset `dpo_all`: a cleaned version of [BramVanroy/orca_dpo_pairs_dutch](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch) (9467)

A lot of different learning rates, betas (the DPO regularisation strength), and batch sizes were investigated in search of a converging combination. You can find them all in [the W&B runs](https://wandb.ai/bramvanroy/dpo-fietje-2b).
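For a quick look at the data, the sketch below loads both subsets with the `datasets` library; it assumes the subset names above are the datasets' config names, and it prints the available splits and columns rather than assuming them.

```python
# Illustrative sketch: inspect the two DPO subsets listed above.
# The subset names come from this card; split names and columns do not,
# so they are printed rather than assumed.
from datasets import load_dataset

subsets = [
    ("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq"),
    ("BramVanroy/orca_dpo_pairs_dutch_cleaned", "dpo_all"),
]

for repo_id, config in subsets:
    dataset_dict = load_dataset(repo_id, config)
    for split_name, split in dataset_dict.items():
        print(repo_id, config, split_name, len(split), split.column_names)
```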
## Training procedure
I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power to accomplish this project. Including time spent waiting for jobs in the queue, a single training run took around nine hours on one A100 80GB.

Training was done with the wonderful [alignment-handbook](https://github.com/huggingface/alignment-handbook), using DeepSpeed as a back-end. The exact training recipes and SLURM script are given in the [Github repository](https://github.com/BramVanroy/fietje).
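Purely as a schematic illustration of what such a preference-tuning run involves (this is not the Fietje recipe), the sketch below uses TRL's `DPOTrainer`, the trainer the alignment-handbook builds on; every value in it is a placeholder.

```python
# Schematic DPO sketch with TRL's DPOTrainer -- NOT the Fietje recipe.
# All hyperparameter values, the split name, and the expected
# prompt/chosen/rejected column layout are placeholders/assumptions;
# the real configs and SLURM script live in the linked repository.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "BramVanroy/fietje-2b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed split name; check the dataset card for the actual splits.
dataset = load_dataset("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq", split="train")

args = DPOConfig(
    output_dir="fietje-2b-dpo-sketch",
    beta=0.1,                       # placeholder regularisation strength
    learning_rate=5e-7,             # placeholder
    per_device_train_batch_size=4,  # placeholder
    bf16=True,
)
# In older TRL versions, pass `tokenizer=` instead of `processing_class=`.
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```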
### Training hyperparameters
The following hyperparameters were used during training: