---
license: apache-2.0
base_model: pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: griffin-1024-llama3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN
    results: []
---

# griffin-1024-llama3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN

This model is a fine-tuned version of [pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu](https://huggingface.co/pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu) on an unspecified dataset (the model name suggests a ~1M-document English FineWeb subset). It achieves the following results on the evaluation set:

- Loss: 5.6745 (a perplexity of roughly exp(5.6745) ≈ 291)
- Accuracy: 0.1884
- Num Input Tokens Seen: 734,003,200
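As a quick usage sketch (not part of the original card), the checkpoint should load through the standard `transformers` causal-LM API. This assumes the hub repo id matches the model name above and that the custom Griffin architecture resolves via `trust_remote_code=True`:

```python
# Minimal generation sketch. Assumptions: the hub repo id below is correct,
# and the custom Griffin architecture loads with trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "pszemraj/griffin-1024-llama3t-8layer-simplewiki-silu-fineweb-1M_en-med-vN"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "The history of the internet begins with"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```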

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

- learning_rate: 0.0003
- train_batch_size: 2
- eval_batch_size: 2
- seed: 80085
- gradient_accumulation_steps: 32
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-07
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
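A minimal sketch of how these values map onto `transformers`' `TrainingArguments`; this is not the author's actual training script, and model/dataset wiring is omitted. Note that `total_train_batch_size` is derived (2 per device × 32 accumulation steps), not a `TrainingArguments` field:

```python
# Hedged sketch: mapping the listed hyperparameters onto TrainingArguments.
# Not the original training script; dataset and model setup are omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="griffin-fineweb-ft",    # hypothetical output path
    learning_rate=3e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=32,     # 2 x 32 = 64 effective batch size
    seed=80085,
    adam_beta1=0.9,                     # Adam betas=(0.9, 0.99)
    adam_beta2=0.99,
    adam_epsilon=1e-7,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
)
```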

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
| 6.4019        | 0.0684 | 400  | 6.7690          | 0.1278   | 52428800          |
| 6.0547        | 0.1368 | 800  | 6.4214          | 0.1460   | 104857600         |
| 5.8133        | 0.2052 | 1200 | 6.2566          | 0.1550   | 157286400         |
| 5.7212        | 0.2736 | 1600 | 6.1411          | 0.1620   | 209715200         |
| 5.6175        | 0.3420 | 2000 | 6.0502          | 0.1669   | 262144000         |
| 5.5014        | 0.4104 | 2400 | 5.9827          | 0.1687   | 314572800         |
| 5.4882        | 0.4788 | 2800 | 5.9203          | 0.1731   | 367001600         |
| 5.3972        | 0.5472 | 3200 | 5.8614          | 0.1782   | 419430400         |
| 5.3983        | 0.6156 | 3600 | 5.8340          | 0.1773   | 471859200         |
| 5.3175        | 0.6840 | 4000 | 5.7916          | 0.1814   | 524288000         |
| 5.3014        | 0.7524 | 4400 | 5.7565          | 0.1814   | 576716800         |
| 5.2749        | 0.8208 | 4800 | 5.7303          | 0.1849   | 629145600         |
| 5.2264        | 0.8892 | 5200 | 5.6993          | 0.1850   | 681574400         |
| 5.2107        | 0.9576 | 5600 | 5.6745          | 0.1884   | 734003200         |
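To put the validation losses on a more interpretable scale, converting them to perplexity (the exponential of the cross-entropy loss) shows the model improving from roughly 870 to 291 over the run; a short sketch:

```python
import math

# Validation losses from the table above (first, middle, and final evals).
val_losses = {400: 6.7690, 2800: 5.9203, 5600: 5.6745}
for step, loss in val_losses.items():
    print(f"step {step}: perplexity ≈ {math.exp(loss):.1f}")
# prints perplexities of roughly 870, 372, and 291
```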

### Framework versions

- Transformers 4.40.1
- PyTorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
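To reproduce results against this environment, the pinned versions above can be verified at runtime; a minimal sketch:

```python
import importlib

# Pinned versions from the "Framework versions" list above.
expected = {
    "transformers": "4.40.1",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}
for name, want in expected.items():
    have = importlib.import_module(name).__version__
    status = "OK" if have == want else f"mismatch (found {have})"
    print(f"{name}: {status}")
```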