BramVanroy committed (verified)
Commit b2afc5b · Parent(s): 54a8919

Update README.md

Files changed (1):
  1. README.md (+12 -7)

README.md CHANGED
@@ -40,21 +40,26 @@ This is the chat version of Fietje, a DPO-tuned (aligned) continuation on [the i

A thorough description of the creation and evaluation of Fietje, as well as usage examples, can be found in [this Github repository](https://github.com/BramVanroy/fietje).
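
For quick illustration (the canonical usage examples live in the repository above), a minimal generation sketch with 🤗 transformers is given below. The model id `BramVanroy/fietje-2b-chat` is an assumption based on this being the chat version of Fietje, and the sketch assumes the repository bundles a chat template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this chat model; adjust if the actual id differs.
model_id = "BramVanroy/fietje-2b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build the prompt via the bundled chat template (assumed to be present).
messages = [{"role": "user", "content": "Wat is de hoofdstad van Vlaanderen?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```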

-
- ## Model description
-
- More information needed
-
## Intended uses & limitations

- More information needed
+ The same limitations as [phi-2](https://huggingface.co/microsoft/phi-2#limitations-of-phi-2), and LLMs in general, apply here. LLMs hallucinate, make mistakes, and should not be trusted. Use at your own risk!

## Training and evaluation data

- More information needed
+ Fietje 2B chat was finetuned (DPO) from [the instruct model](https://huggingface.co/BramVanroy/fietje-2b-instruct) on the following datasets (loading sketch below). The number of training samples per dataset is given in brackets, totalling 18,653 samples.
+
+ - [BramVanroy/ultra_feedback_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch_cleaned), subset `dpo_hq`: a cleaned version of [BramVanroy/ultra_feedback_dutch](https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch) (9,186)
+ - [BramVanroy/orca_dpo_pairs_dutch_cleaned](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch_cleaned), subset `dpo_all`: a cleaned version of [BramVanroy/orca_dpo_pairs_dutch](https://huggingface.co/datasets/BramVanroy/orca_dpo_pairs_dutch) (9,467)
+
+ Many different learning rates, betas, and batch sizes were investigated in search of a converging combination. You can find them all in [the W&B runs](https://wandb.ai/bramvanroy/dpo-fietje-2b).
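
For illustration, the two preference subsets above can be loaded with the 🤗 `datasets` library, where the subset names act as dataset configurations. Split names are not assumed here; inspect the returned `DatasetDict` objects.

```python
from datasets import load_dataset

# The subset names (`dpo_hq`, `dpo_all`) are passed as dataset configurations.
ultra_feedback = load_dataset("BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq")
orca_pairs = load_dataset("BramVanroy/orca_dpo_pairs_dutch_cleaned", "dpo_all")

# Inspect the available splits and columns before use.
print(ultra_feedback)
print(orca_pairs)
```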

## Training procedure

+ I am thankful to the [Flemish Supercomputer Center](https://www.vscentrum.be/) (VSC) for providing the computational power to accomplish this project. Accounting for time spent waiting for jobs, a single training run took around nine hours on one A100 80GB.
+
+ Training was done with the wonderful [alignment-handbook](https://github.com/huggingface/alignment-handbook), using DeepSpeed as a back-end. The exact training recipes and SLURM script are given in the [Github repository](https://github.com/BramVanroy/fietje); an illustrative sketch follows below.
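
Since the actual recipe lives in the repository above, the following is purely an illustrative stand-in: a comparable DPO run sketched with `trl`'s `DPOTrainer`, which the alignment-handbook builds on. It assumes a recent `trl` release (with `DPOConfig` and the `processing_class` argument), a `train` split, the prompt/chosen/rejected columns that `DPOTrainer` expects, and placeholder hyperparameters rather than the tuned values from the W&B runs.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# DPO starts from the instruct model.
base_model = "BramVanroy/fietje-2b-instruct"
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# One of the preference datasets listed above; assumes a `train` split.
train_dataset = load_dataset(
    "BramVanroy/ultra_feedback_dutch_cleaned", "dpo_hq", split="train"
)

# Placeholder hyperparameters; the tuned values live in the W&B runs.
training_args = DPOConfig(
    output_dir="fietje-2b-chat-dpo",
    beta=0.1,                       # DPO temperature (placeholder)
    learning_rate=5e-7,             # placeholder
    per_device_train_batch_size=4,  # placeholder
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,  # a reference model is created internally when not given
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```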
+
+
### Training hyperparameters

The following hyperparameters were used during training: