Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


GEITje-7B - bnb 8bits
- Model creator: https://huggingface.co/Rijgersberg/
- Original model: https://huggingface.co/Rijgersberg/GEITje-7B/
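
A minimal loading sketch for this 8-bit quantization, assuming `transformers`, `accelerate` and `bitsandbytes` are installed. The repository id in the snippet is a placeholder (it is not stated in this card), so substitute the actual name of this quantized repo.

```python
# Minimal sketch: load the 8-bit bitsandbytes quantization with transformers.
# The repo id is a placeholder; replace it with the actual name of this quantized repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RichardErkhov/GEITje-7B-8bits"  # hypothetical id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",  # requires accelerate; bitsandbytes handles the 8-bit weights
)

prompt = "Nederland is een land in"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```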

Original model description:
---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- generated_from_trainer
- GEITje
datasets:
- Rijgersberg/GEITje-pretrain-10b
model-index:
- name: GEITje-v1-7B
  results: []
language:
- nl
---

# GEITje-7B

GEITje is a large open Dutch language model with 7 billion parameters, based on Mistral 7B.
It has been further trained on 10 billion tokens of Dutch text.
This has improved its Dutch language skills and increased its knowledge of Dutch topics.


## Model description

### _Mistral_ – Base Model
GEITje is based on [Mistral 7B](https://mistral.ai/news/announcing-mistral-7b/).
It's a large open language model with 7 billion parameters,
trained by [Mistral AI](https://mistral.ai).
According to Mistral AI, the 7B model performs better than [Llama 2](https://ai.meta.com/llama/) 13B on all (English-language) benchmarks they tested it on.
Mistral 7B has been released under the Apache 2.0 open source license.


### _GEITje_ – Trained Further on Dutch Texts
GEITje was created by further training Mistral 7B on no less than 10 billion tokens of Dutch text from the [Dutch Gigacorpus](http://gigacorpus.nl) and the [MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) web crawling corpus.
It is a so-called _full-parameter finetune_:
performed on all parameters.
It is not a [PEFT](https://huggingface.co/blog/peft) or [LoRA](https://huggingface.co/docs/peft/conceptual_guides/lora) finetune.
Like Mistral, GEITje has a _context length_ of 8,192 tokens.
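
For orientation, a completion-style generation sketch against the original full-precision model. It assumes a GPU with enough memory for bf16 weights and treats GEITje-7B as a base (completion) model rather than a chat model; nothing here is taken from the author's own examples.

```python
# Sketch: completion-style generation with the original, unquantized model.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Rijgersberg/GEITje-7B",
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

# The base model continues Dutch text; it is not instruction-tuned.
print(generator("Het klimaat van Nederland is", max_new_tokens=30)[0]["generated_text"])
```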

## More info
Read more about GEITje in the [📄 README](https://github.com/Rijgersberg/GEITje/blob/main/README-en.md) on GitHub.

## Checkpoints
Intermediate checkpoints are available in the `checkpoints` branch.
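
If an intermediate checkpoint is wanted, something along the lines of the sketch below may work; it assumes the branch can be addressed with the standard `revision` argument of `from_pretrained`. The exact layout of the `checkpoints` branch is not described here, so treat this as a starting point only.

```python
# Sketch (assumption): pull weights from the `checkpoints` branch of the original repo.
# If a checkpoint lives in a subfolder of that branch, pass subfolder="..." as well.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Rijgersberg/GEITje-7B",
    revision="checkpoints",  # branch name mentioned above
)
```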

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an approximate `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 953
- training_steps: 9536
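
For readers who want a comparable setup, the list above roughly maps onto a `transformers` `TrainingArguments` configuration as sketched below. This is a reconstruction from the reported values, not the author's training script; the output path is hypothetical and anything not listed (e.g. precision settings) is left at its default.

```python
# Approximate reconstruction of the reported hyperparameters as TrainingArguments.
# Not the original training script; output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="geitje-7b-pretrain",   # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=2,     # train_batch_size: 2
    per_device_eval_batch_size=2,      # eval_batch_size: 2
    seed=42,
    gradient_accumulation_steps=8,     # 2 per device x 8 GPUs x 8 steps = 128 total
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=953,
    max_steps=9536,
)
```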

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6995 | 0.02 | 199 | 1.7673 |
| 1.6949 | 0.04 | 398 | 1.6880 |
| 1.6377 | 0.06 | 597 | 1.6429 |
| 1.6011 | 0.08 | 796 | 1.6384 |
| 1.5196 | 0.1 | 995 | 1.6060 |
| 1.5158 | 0.13 | 1194 | 1.5832 |
| 1.5181 | 0.15 | 1393 | 1.5541 |
| 1.4931 | 0.17 | 1592 | 1.5493 |
| 1.4972 | 0.19 | 1791 | 1.5407 |
| 1.5349 | 0.21 | 1990 | 1.5305 |
| 1.5025 | 0.23 | 2189 | 1.5263 |
| 1.396 | 0.25 | 2388 | 1.5140 |
| 1.4353 | 0.27 | 2587 | 1.5104 |
| 1.4307 | 0.29 | 2786 | 1.5003 |
| 1.3974 | 0.31 | 2985 | 1.4849 |
| 1.404 | 0.33 | 3184 | 1.4771 |
| 1.4299 | 0.35 | 3383 | 1.4825 |
| 1.4342 | 0.38 | 3582 | 1.4705 |
| 1.4341 | 0.4 | 3781 | 1.4643 |
| 1.4535 | 0.42 | 3980 | 1.4580 |
| 1.4799 | 0.44 | 4179 | 1.4521 |
| 1.35 | 0.46 | 4378 | 1.4478 |
| 1.4586 | 0.48 | 4577 | 1.4425 |
| 1.3685 | 0.5 | 4776 | 1.4368 |
| 1.4572 | 0.52 | 4975 | 1.4313 |
| 1.3293 | 0.54 | 5174 | 1.4265 |
| 1.403 | 0.56 | 5373 | 1.4241 |
| 1.3057 | 0.58 | 5572 | 1.4188 |
| 1.244 | 0.61 | 5771 | 1.4178 |
| 1.3224 | 0.63 | 5970 | 1.4110 |
| 1.3238 | 0.65 | 6169 | 1.4083 |
| 1.3262 | 0.67 | 6368 | 1.4050 |
| 1.3237 | 0.69 | 6567 | 1.4027 |
| 1.0453 | 0.71 | 6766 | 1.4005 |
| 1.3136 | 0.73 | 6965 | 1.3992 |
| 1.3137 | 0.75 | 7164 | 1.3975 |
| 1.1587 | 0.77 | 7363 | 1.3964 |
| 1.316 | 0.79 | 7562 | 1.3957 |
| 1.2738 | 0.81 | 7761 | 1.3951 |
| 1.308 | 0.83 | 7960 | 1.3949 |
| 1.4049 | 0.86 | 8159 | 1.3946 |
| 1.3324 | 0.88 | 8358 | 1.3944 |
| 1.3446 | 0.9 | 8557 | 1.3944 |
| 1.2489 | 0.92 | 8756 | 1.3943 |
| 1.2687 | 0.94 | 8955 | 1.3943 |
| 1.3293 | 0.96 | 9154 | 1.3943 |
| 1.3045 | 0.98 | 9353 | 1.3943 |


### Framework versions

- Transformers 4.36.0.dev0
- Pytorch 2.1.1+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0