---
license: apache-2.0
base_model: pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: griffin-1024-llama3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN
    results: []
datasets:
  - BEE-spoke-data/fineweb-1M_en-med
language:
  - en
---

# griffin-llama3t-8L-v0.02-fineweb

Pretraining experiment with the griffin / recurrent_gemma architecture. This variant uses the Llama-3 tokenizer.

## Model description

Further training of pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu on the BEE-spoke-data/fineweb-1M_en-med dataset. It achieves the following results on the evaluation set:

- Loss: 5.6538
- Accuracy: 0.1881
- Num input tokens seen: 766509056
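
For reference, a minimal usage sketch is below. It assumes the checkpoint loads through the standard `transformers` auto classes and that the repo id matches the card title; the card itself does not include usage code, so treat this as illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# repo id assumed from the card title; adjust if the checkpoint lives elsewhere
model_id = "pszemraj/griffin-llama3t-8L-v0.02-fineweb"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Griffin is a hybrid recurrent architecture that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```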

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough `TrainingArguments` sketch follows the list):

- learning_rate: 0.0003
- train_batch_size: 2
- eval_batch_size: 2
- seed: 80085
- gradient_accumulation_steps: 32
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-07
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
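
Expressed as `transformers.TrainingArguments`, the settings above map roughly to the sketch below. The actual training script is not part of this card, so the output path is a placeholder and only the listed fields are shown.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="griffin-llama3t-8L-fineweb",  # placeholder path
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=32,  # effective batch: 2 x 32 = 64 sequences per optimizer step
    seed=80085,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-7,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
)
```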

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
| 6.4019        | 0.0684 | 400  | 6.7690          | 0.1278   | 52428800          |
| 6.0547        | 0.1368 | 800  | 6.4214          | 0.1460   | 104857600         |
| 5.8133        | 0.2052 | 1200 | 6.2566          | 0.1550   | 157286400         |
| 5.7212        | 0.2736 | 1600 | 6.1411          | 0.1620   | 209715200         |
| 5.6175        | 0.3420 | 2000 | 6.0502          | 0.1669   | 262144000         |
| 5.5014        | 0.4104 | 2400 | 5.9827          | 0.1687   | 314572800         |
| 5.4882        | 0.4788 | 2800 | 5.9203          | 0.1731   | 367001600         |
| 5.3972        | 0.5472 | 3200 | 5.8614          | 0.1782   | 419430400         |
| 5.3983        | 0.6156 | 3600 | 5.8340          | 0.1773   | 471859200         |
| 5.3175        | 0.6840 | 4000 | 5.7916          | 0.1814   | 524288000         |
| 5.3014        | 0.7524 | 4400 | 5.7565          | 0.1814   | 576716800         |
| 5.2749        | 0.8208 | 4800 | 5.7303          | 0.1849   | 629145600         |
| 5.2264        | 0.8892 | 5200 | 5.6993          | 0.1850   | 681574400         |
| 5.2107        | 0.9576 | 5600 | 5.6745          | 0.1884   | 734003200         |
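
A quick consistency check on the "Input Tokens Seen" column: it advances by 52,428,800 tokens every 400 optimizer steps, which, at the effective batch size of 64, implies packed sequences of 2048 tokens per example. The sequence length is inferred from these numbers rather than stated in the card.

```python
# Sanity check on the "Input Tokens Seen" column.
tokens_per_interval = 52_428_800  # increase between consecutive logged rows
steps_per_interval = 400
effective_batch = 64              # 2 per device * 32 gradient accumulation steps

tokens_per_step = tokens_per_interval // steps_per_interval  # 131072
implied_seq_len = tokens_per_step // effective_batch         # 2048 (inferred, not stated in the card)
print(tokens_per_step, implied_seq_len)
```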

### Framework versions

- Transformers 4.40.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
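
To reproduce the setup, installed versions can be compared against the ones listed above; the snippet below is illustrative only and assumes all four packages are importable.

```python
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.40.1",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card used {want}")
```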