Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

GEITje-7B - GGUF
- Model creator: https://huggingface.co/Rijgersberg/
- Original model: https://huggingface.co/Rijgersberg/GEITje-7B/

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [GEITje-7B.Q2_K.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q2_K.gguf) | Q2_K | 2.53GB |
| [GEITje-7B.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.IQ3_XS.gguf) | IQ3_XS | 2.81GB |
| [GEITje-7B.IQ3_S.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.IQ3_S.gguf) | IQ3_S | 2.96GB |
| [GEITje-7B.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q3_K_S.gguf) | Q3_K_S | 2.95GB |
| [GEITje-7B.IQ3_M.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.IQ3_M.gguf) | IQ3_M | 3.06GB |
| [GEITje-7B.Q3_K.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q3_K.gguf) | Q3_K | 3.28GB |
| [GEITje-7B.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q3_K_M.gguf) | Q3_K_M | 3.28GB |
| [GEITje-7B.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q3_K_L.gguf) | Q3_K_L | 3.56GB |
| [GEITje-7B.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.IQ4_XS.gguf) | IQ4_XS | 3.67GB |
| [GEITje-7B.Q4_0.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q4_0.gguf) | Q4_0 | 3.83GB |
| [GEITje-7B.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.IQ4_NL.gguf) | IQ4_NL | 3.87GB |
| [GEITje-7B.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q4_K_S.gguf) | Q4_K_S | 3.86GB |
| [GEITje-7B.Q4_K.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q4_K.gguf) | Q4_K | 4.07GB |
| [GEITje-7B.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q4_K_M.gguf) | Q4_K_M | 4.07GB |
| [GEITje-7B.Q4_1.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q4_1.gguf) | Q4_1 | 4.24GB |
| [GEITje-7B.Q5_0.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q5_0.gguf) | Q5_0 | 4.65GB |
| [GEITje-7B.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q5_K_S.gguf) | Q5_K_S | 4.65GB |
| [GEITje-7B.Q5_K.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q5_K.gguf) | Q5_K | 4.78GB |
| [GEITje-7B.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q5_K_M.gguf) | Q5_K_M | 4.78GB |
| [GEITje-7B.Q5_1.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q5_1.gguf) | Q5_1 | 5.07GB |
| [GEITje-7B.Q6_K.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q6_K.gguf) | Q6_K | 5.53GB |
| [GEITje-7B.Q8_0.gguf](https://huggingface.co/RichardErkhov/Rijgersberg_-_GEITje-7B-gguf/blob/main/GEITje-7B.Q8_0.gguf) | Q8_0 | 7.17GB |
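
To try one of these files locally, here is a minimal sketch using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python); any llama.cpp-compatible runtime works, and the choice of the Q4_K_M file and the prompt are just illustrative:

```python
# Minimal sketch: download one quantized file from this repo and run it.
# Requires: pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Q4_K_M is a common size/quality trade-off; any file in the table works.
model_path = hf_hub_download(
    repo_id="RichardErkhov/Rijgersberg_-_GEITje-7B-gguf",
    filename="GEITje-7B.Q4_K_M.gguf",
)

# n_ctx matches GEITje's 8,192-token context length.
llm = Llama(model_path=model_path, n_ctx=8192)

result = llm("Nederland is", max_tokens=64)  # arbitrary example prompt
print(result["choices"][0]["text"])
```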


Original model description:
---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- generated_from_trainer
- GEITje
datasets:
- Rijgersberg/GEITje-pretrain-10b
model-index:
- name: GEITje-v1-7B
  results: []
language:
- nl
---

# GEITje-7B

GEITje is a large open Dutch language model with 7 billion parameters, based on Mistral 7B.
It has been further trained on 10 billion tokens of Dutch text.
This has improved its Dutch language skills and increased its knowledge of Dutch topics.
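
Since this is a base model rather than a chat-tuned one, it continues text rather than following instructions. As a quick orientation, a minimal usage sketch with the Hugging Face Transformers `pipeline` API follows; the prompt and generation length are arbitrary choices, and running a 7B model in full precision assumes a suitably large GPU:

```python
# Minimal sketch: generate Dutch text with the original full-precision model.
# Requires: pip install transformers torch accelerate
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Rijgersberg/GEITje-7B",
    device_map="auto",  # place the model on the available GPU(s)
)

prompt = "Het hoogste gebouw van Nederland is"  # arbitrary example prompt
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```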


## Model description

### _Mistral_ – Base Model
GEITje is based on [Mistral 7B](https://mistral.ai/news/announcing-mistral-7b/).
It is a large open language model with 7 billion parameters,
trained by [Mistral AI](https://mistral.ai).
According to Mistral AI, the 7B model performs better than [Llama 2](https://ai.meta.com/llama/) 13B on all (English-language) benchmarks they tested it on.
Mistral 7B has been released under the Apache 2.0 open source license.


### _GEITje_ – Trained Further on Dutch Texts
GEITje was created by further training Mistral 7B on no less than 10 billion tokens of Dutch text from the [Dutch Gigacorpus](http://gigacorpus.nl) and the [MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) web crawling corpus.
It is a so-called _full-parameter finetune_: all of the model's parameters were updated, rather than a [PEFT](https://huggingface.co/blog/peft) or [LoRA](https://huggingface.co/docs/peft/conceptual_guides/lora) finetune that trains only a small adapter.
Like Mistral, GEITje has a _context length_ of 8,192 tokens.

## More info
Read more about GEITje in the [📄 README](https://github.com/Rijgersberg/GEITje/blob/main/README-en.md) on GitHub.

## Checkpoints
Intermediate checkpoints are available in the `checkpoints` branch.
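
To inspect training progress, a hedged sketch of loading from that branch via the `revision` argument of `from_pretrained` follows; the exact layout of the branch (for example, whether checkpoints sit in subfolders) is an assumption here, so inspect it on the Hub first:

```python
# Illustrative sketch: point from_pretrained at the `checkpoints` branch.
# If a given checkpoint lives in a subfolder, pass subfolder="..." as well.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Rijgersberg/GEITje-7B",
    revision="checkpoints",  # branch name from this section
)
```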

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 953
- training_steps: 9536
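
For readers who prefer code, these settings map roughly onto the following Hugging Face `TrainingArguments`; this is a reconstruction from the list above, not the original training script, so treat it as an illustrative sketch:

```python
# Illustrative reconstruction of the hyperparameters above. The effective
# batch size is 2 per device x 8 GPUs x 8 accumulation steps = 128.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="geitje-7b-pretrain",  # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    warmup_steps=953,
    max_steps=9536,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```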

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6995 | 0.02 | 199 | 1.7673 |
| 1.6949 | 0.04 | 398 | 1.6880 |
| 1.6377 | 0.06 | 597 | 1.6429 |
| 1.6011 | 0.08 | 796 | 1.6384 |
| 1.5196 | 0.1 | 995 | 1.6060 |
| 1.5158 | 0.13 | 1194 | 1.5832 |
| 1.5181 | 0.15 | 1393 | 1.5541 |
| 1.4931 | 0.17 | 1592 | 1.5493 |
| 1.4972 | 0.19 | 1791 | 1.5407 |
| 1.5349 | 0.21 | 1990 | 1.5305 |
| 1.5025 | 0.23 | 2189 | 1.5263 |
| 1.396 | 0.25 | 2388 | 1.5140 |
| 1.4353 | 0.27 | 2587 | 1.5104 |
| 1.4307 | 0.29 | 2786 | 1.5003 |
| 1.3974 | 0.31 | 2985 | 1.4849 |
| 1.404 | 0.33 | 3184 | 1.4771 |
| 1.4299 | 0.35 | 3383 | 1.4825 |
| 1.4342 | 0.38 | 3582 | 1.4705 |
| 1.4341 | 0.4 | 3781 | 1.4643 |
| 1.4535 | 0.42 | 3980 | 1.4580 |
| 1.4799 | 0.44 | 4179 | 1.4521 |
| 1.35 | 0.46 | 4378 | 1.4478 |
| 1.4586 | 0.48 | 4577 | 1.4425 |
| 1.3685 | 0.5 | 4776 | 1.4368 |
| 1.4572 | 0.52 | 4975 | 1.4313 |
| 1.3293 | 0.54 | 5174 | 1.4265 |
| 1.403 | 0.56 | 5373 | 1.4241 |
| 1.3057 | 0.58 | 5572 | 1.4188 |
| 1.244 | 0.61 | 5771 | 1.4178 |
| 1.3224 | 0.63 | 5970 | 1.4110 |
| 1.3238 | 0.65 | 6169 | 1.4083 |
| 1.3262 | 0.67 | 6368 | 1.4050 |
| 1.3237 | 0.69 | 6567 | 1.4027 |
| 1.0453 | 0.71 | 6766 | 1.4005 |
| 1.3136 | 0.73 | 6965 | 1.3992 |
| 1.3137 | 0.75 | 7164 | 1.3975 |
| 1.1587 | 0.77 | 7363 | 1.3964 |
| 1.316 | 0.79 | 7562 | 1.3957 |
| 1.2738 | 0.81 | 7761 | 1.3951 |
| 1.308 | 0.83 | 7960 | 1.3949 |
| 1.4049 | 0.86 | 8159 | 1.3946 |
| 1.3324 | 0.88 | 8358 | 1.3944 |
| 1.3446 | 0.9 | 8557 | 1.3944 |
| 1.2489 | 0.92 | 8756 | 1.3943 |
| 1.2687 | 0.94 | 8955 | 1.3943 |
| 1.3293 | 0.96 | 9154 | 1.3943 |
| 1.3045 | 0.98 | 9353 | 1.3943 |

### Framework versions

- Transformers 4.36.0.dev0
- Pytorch 2.1.1+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0