metadata
license: mit
base_model: microsoft/phi-2
tags:
  - trl
  - fietje
  - alignment-handbook
datasets:
  - uonlp/CulturaX
  - wikimedia/wikipedia
model-index:
  - name: fietje-2b
    results: []
language:
  - nl
pipeline_tag: text-generation
inference: false

Fietje banner

Fietje 2B

An open and efficient LLM for Dutch

👱‍♀️ Base version (this one) - 🤖 Instruct version - 💬 Chat version - 🚀 GGUF of base model

This model is an adapted version of microsoft/phi-2, finetuned for Dutch text generation. It underwent continued pretraining on 28B Dutch tokens, which include the full Dutch component of Wikipedia (accounting for around 15%), supplemented with Dutch tokens from CulturaX. A newer version of this dataset, along with a description of the filtering that was applied, can be found here.
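Since this is a base model for text generation, a minimal usage sketch with the Transformers library might look as follows. The repository ID `BramVanroy/fietje-2`, the helper name `generate_dutch`, and the default of 50 new tokens are assumptions made for illustration, not details confirmed by this card. The base model is not instruction-tuned, so the prompt is simply continued as plain text.

```python
# Assumed Hub repository ID; adjust it if the model lives elsewhere.
MODEL_ID = "BramVanroy/fietje-2"


def generate_dutch(prompt: str, max_new_tokens: int = 50) -> str:
    """Continue a Dutch prompt with the base model (plain completion, no chat template)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into PyTorch tensors and let the model continue it.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The decoded string contains the prompt followed by the generated continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_dutch("Het weer in Nederland is")` would download the weights on first use and return the prompt followed by a continuation. Loading the model on every call is wasteful; in real use the tokenizer and model would be created once and reused.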

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 9e-05
  • train_batch_size: 40
  • eval_batch_size: 40
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 16
  • gradient_accumulation_steps: 3
  • total_train_batch_size: 1920
  • total_eval_batch_size: 640
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
  • lr_scheduler_type: linear
  • num_epochs: 1.0
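
The two total batch sizes follow from the other values in the list. The short check below reproduces them, assuming that `train_batch_size` and `eval_batch_size` are per-device values and that gradient accumulation applies to training only (both are assumptions, though they are consistent with the reported totals).

```python
# Values copied from the hyperparameter list above.
train_batch_size = 40             # assumed to be per device
eval_batch_size = 40              # assumed to be per device
num_devices = 16
gradient_accumulation_steps = 3

# Sequences contributing to a single optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies in.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 1920
print(total_eval_batch_size)   # 640
```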

Training results

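| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6334        | 0.13  | 900  | 1.5937          |
| 1.5469        | 0.26  | 1800 | 1.5051          |
| 1.4937        | 0.4   | 2700 | 1.4628          |
| 1.4633        | 0.53  | 3600 | 1.4375          |
| 1.4485        | 0.66  | 4500 | 1.4203          |
| 1.4374        | 0.79  | 5400 | 1.4085          |
| 1.4278        | 0.92  | 6300 | 1.4013          |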
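
The validation losses above can be read as perplexities, which some readers find easier to interpret. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, as is typical for causal language model training but not stated in this card. The helper name `perplexity` is introduced here for illustration.

```python
import math

# (step, validation loss) pairs copied from the training results table.
validation_loss = {
    900: 1.5937,
    1800: 1.5051,
    2700: 1.4628,
    3600: 1.4375,
    4500: 1.4203,
    5400: 1.4085,
    6300: 1.4013,
}


def perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


for step, loss in validation_loss.items():
    print(f"step {step:>4}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, validation perplexity falls from roughly 4.92 at step 900 to roughly 4.06 at step 6300.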

Framework versions

  • Transformers 4.39.1
  • PyTorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2