Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


GEITje-7B - bnb 8bits
- Model creator: https://huggingface.co/Rijgersberg/
- Original model: https://huggingface.co/Rijgersberg/GEITje-7B/
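
A minimal loading sketch for this 8-bit quantization, assuming `transformers`, `accelerate` and `bitsandbytes` are installed. The repository id in the snippet is a placeholder (it is not stated in this card), so substitute the actual name of this quantized repo.

```python
# Minimal sketch: load the 8-bit bitsandbytes quantization with transformers.
# The repo id is a placeholder; replace it with the actual name of this quantized repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RichardErkhov/GEITje-7B-8bits"  # hypothetical id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",  # requires accelerate; bitsandbytes handles the 8-bit weights
)

prompt = "Nederland is een land in"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```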

Original model description:
---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- generated_from_trainer
- GEITje
datasets:
- Rijgersberg/GEITje-pretrain-10b
model-index:
- name: GEITje-v1-7B
  results: []
language:
- nl
---

# GEITje-7B

GEITje is a large open Dutch language model with 7 billion parameters, based on Mistral 7B.
It has been further trained on 10 billion tokens of Dutch text.
This has improved its Dutch language skills and increased its knowledge of Dutch topics.


## Model description

### _Mistral_ – Base Model
GEITje is based on [Mistral 7B](https://mistral.ai/news/announcing-mistral-7b/).
It's a large open language model with 7 billion parameters,
trained by [Mistral AI](https://mistral.ai).
According to Mistral AI, the 7B model performs better than [Llama 2](https://ai.meta.com/llama/) 13B on all (English-language) benchmarks they tested it on.
Mistral 7B has been released under the Apache 2.0 open source license.


### _GEITje_ – Trained Further on Dutch Texts
GEITje was created by further training Mistral 7B on no less than 10 billion tokens of Dutch text from the [Dutch Gigacorpus](http://gigacorpus.nl) and the [MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) web crawling corpus.
It is a so-called _full-parameter finetune_:
performed on all parameters.
It is not a [PEFT](https://huggingface.co/blog/peft) or [LoRA](https://huggingface.co/docs/peft/conceptual_guides/lora) finetune.
Like Mistral, GEITje has a _context length_ of 8,192 tokens.
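
For orientation, a completion-style generation sketch against the original full-precision model. It assumes a GPU with enough memory for bf16 weights and treats GEITje-7B as a base (completion) model rather than a chat model; nothing here is taken from the author's own examples.

```python
# Sketch: completion-style generation with the original, unquantized model.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Rijgersberg/GEITje-7B",
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",
)

# The base model continues Dutch text; it is not instruction-tuned.
print(generator("Het klimaat van Nederland is", max_new_tokens=30)[0]["generated_text"])
```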

## More info
Read more about GEITje in the [📄 README](https://github.com/Rijgersberg/GEITje/blob/main/README-en.md) on GitHub.

## Checkpoints
Intermediate checkpoints are available in the `checkpoints` branch.
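
If an intermediate checkpoint is wanted, something along the lines of the sketch below may work; it assumes the branch can be addressed with the standard `revision` argument of `from_pretrained`. The exact layout of the `checkpoints` branch is not described here, so treat this as a starting point only.

```python
# Sketch (assumption): pull weights from the `checkpoints` branch of the original repo.
# If a checkpoint lives in a subfolder of that branch, pass subfolder="..." as well.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Rijgersberg/GEITje-7B",
    revision="checkpoints",  # branch name mentioned above
)
```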

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an approximate `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 953
- training_steps: 9536
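
For readers who want a comparable setup, the list above roughly maps onto a `transformers` `TrainingArguments` configuration as sketched below. This is a reconstruction from the reported values, not the author's training script; the output path is hypothetical and anything not listed (e.g. precision settings) is left at its default.

```python
# Approximate reconstruction of the reported hyperparameters as TrainingArguments.
# Not the original training script; output_dir is a hypothetical placeholder.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="geitje-7b-pretrain",   # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=2,     # train_batch_size: 2
    per_device_eval_batch_size=2,      # eval_batch_size: 2
    seed=42,
    gradient_accumulation_steps=8,     # 2 per device x 8 GPUs x 8 steps = 128 total
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=953,
    max_steps=9536,
)
```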

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6995 | 0.02 | 199 | 1.7673 |
| 1.6949 | 0.04 | 398 | 1.6880 |
| 1.6377 | 0.06 | 597 | 1.6429 |
| 1.6011 | 0.08 | 796 | 1.6384 |
| 1.5196 | 0.1 | 995 | 1.6060 |
| 1.5158 | 0.13 | 1194 | 1.5832 |
| 1.5181 | 0.15 | 1393 | 1.5541 |
| 1.4931 | 0.17 | 1592 | 1.5493 |
| 1.4972 | 0.19 | 1791 | 1.5407 |
| 1.5349 | 0.21 | 1990 | 1.5305 |
| 1.5025 | 0.23 | 2189 | 1.5263 |
| 1.396 | 0.25 | 2388 | 1.5140 |
| 1.4353 | 0.27 | 2587 | 1.5104 |
| 1.4307 | 0.29 | 2786 | 1.5003 |
| 1.3974 | 0.31 | 2985 | 1.4849 |
| 1.404 | 0.33 | 3184 | 1.4771 |
| 1.4299 | 0.35 | 3383 | 1.4825 |
| 1.4342 | 0.38 | 3582 | 1.4705 |
| 1.4341 | 0.4 | 3781 | 1.4643 |
| 1.4535 | 0.42 | 3980 | 1.4580 |
| 1.4799 | 0.44 | 4179 | 1.4521 |
| 1.35 | 0.46 | 4378 | 1.4478 |
| 1.4586 | 0.48 | 4577 | 1.4425 |
| 1.3685 | 0.5 | 4776 | 1.4368 |
| 1.4572 | 0.52 | 4975 | 1.4313 |
| 1.3293 | 0.54 | 5174 | 1.4265 |
| 1.403 | 0.56 | 5373 | 1.4241 |
| 1.3057 | 0.58 | 5572 | 1.4188 |
| 1.244 | 0.61 | 5771 | 1.4178 |
| 1.3224 | 0.63 | 5970 | 1.4110 |
| 1.3238 | 0.65 | 6169 | 1.4083 |
| 1.3262 | 0.67 | 6368 | 1.4050 |
| 1.3237 | 0.69 | 6567 | 1.4027 |
| 1.0453 | 0.71 | 6766 | 1.4005 |
| 1.3136 | 0.73 | 6965 | 1.3992 |
| 1.3137 | 0.75 | 7164 | 1.3975 |
| 1.1587 | 0.77 | 7363 | 1.3964 |
| 1.316 | 0.79 | 7562 | 1.3957 |
| 1.2738 | 0.81 | 7761 | 1.3951 |
| 1.308 | 0.83 | 7960 | 1.3949 |
| 1.4049 | 0.86 | 8159 | 1.3946 |
| 1.3324 | 0.88 | 8358 | 1.3944 |
| 1.3446 | 0.9 | 8557 | 1.3944 |
| 1.2489 | 0.92 | 8756 | 1.3943 |
| 1.2687 | 0.94 | 8955 | 1.3943 |
| 1.3293 | 0.96 | 9154 | 1.3943 |
| 1.3045 | 0.98 | 9353 | 1.3943 |


### Framework versions

- Transformers 4.36.0.dev0
- Pytorch 2.1.1+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0