---
license: apache-2.0
base_model: pszemraj/mega-ar-350m-L3t-v0.07-cosmo_webmath_py
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: mega-ar-350m-L3t-v0.07-cosmo_webmath_py-UltraTextbooks-2.1-fw_mix-vN
    results: []
---

mega-ar-350m-L3t-v0.07-cosmo_webmath_py-UltraTextbooks-2.1-fw_mix-vN

This model is a fine-tuned version of pszemraj/mega-ar-350m-L3t-v0.07-cosmo_webmath_py on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0787
  • Accuracy: 0.5746
  • Num Input Tokens Seen: 3492282368
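Since the evaluation loss is the mean token-level cross-entropy (in nats, the convention the transformers Trainer reports), it converts directly to perplexity; a minimal sanity-check sketch:

```python
import math

eval_loss = 2.0787  # evaluation loss reported above, in nats per token
perplexity = math.exp(eval_loss)  # perplexity = exp(cross-entropy)
print(f"perplexity: {perplexity:.2f}")  # ~7.99
```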

Model description

Per the model name, this is an autoregressive MEGA language model of roughly 350M parameters, continued from the base checkpoint pszemraj/mega-ar-350m-L3t-v0.07-cosmo_webmath_py listed above.

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset; the metrics above were computed on its evaluation split.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 4e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 80085
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
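The launch script itself is not part of the card; the sketch below shows how these hyperparameters would map onto transformers TrainingArguments, with output_dir as a hypothetical placeholder and model/dataset loading elided:

```python
from transformers import TrainingArguments

# Sketch only: the card's hyperparameters expressed through the standard
# transformers Trainer API (4 GPUs assumed via a torchrun/accelerate launch).
training_args = TrainingArguments(
    output_dir="mega-ar-350m-ultratextbooks",  # hypothetical
    learning_rate=4e-5,
    per_device_train_batch_size=1,   # 1 per GPU x 4 GPUs x 32 accum = 128 effective
    per_device_eval_batch_size=1,    # 1 per GPU x 4 GPUs = 4 effective
    gradient_accumulation_steps=32,
    seed=80085,
    adam_beta1=0.9,                  # Adam betas/epsilon as listed above
    adam_beta2=0.98,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
)
```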

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|---------------|--------|------|-----------------|----------|-------------------|
| 2.2572        | 0.0600 | 400  | 2.2462          | 0.5491   | 209715200         |
| 2.2173        | 0.1201 | 800  | 2.1939          | 0.5564   | 419430400         |
| 2.1992        | 0.1801 | 1200 | 2.1689          | 0.5604   | 629145600         |
| 2.1543        | 0.2402 | 1600 | 2.1521          | 0.5632   | 838860800         |
| 2.1532        | 0.3002 | 2000 | 2.1401          | 0.5650   | 1048576000        |
| 2.1688        | 0.3603 | 2400 | 2.1307          | 0.5663   | 1258291200        |
| 2.1443        | 0.4203 | 2800 | 2.1227          | 0.5676   | 1468006400        |
| 2.1105        | 0.4804 | 3200 | 2.1158          | 0.5689   | 1677721600        |
| 2.1045        | 0.5404 | 3600 | 2.1090          | 0.5700   | 1887436800        |
| 2.1181        | 0.6004 | 4000 | 2.1045          | 0.5708   | 2097152000        |
| 2.1270        | 0.6605 | 4400 | 2.0994          | 0.5716   | 2306867200        |
| 2.1265        | 0.7205 | 4800 | 2.0958          | 0.5719   | 2516582400        |
| 2.0951        | 0.7806 | 5200 | 2.0909          | 0.5728   | 2726297600        |
| 2.0951        | 0.8406 | 5600 | 2.0876          | 0.5733   | 2936012800        |
| 2.1335        | 0.9007 | 6000 | 2.0838          | 0.5739   | 3145728000        |
| 2.0731        | 0.9607 | 6400 | 2.0802          | 0.5744   | 3355443200        |
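The Input Tokens Seen column advances by 209,715,200 tokens every 400 steps, i.e. 524,288 tokens per optimizer step; at the effective batch size of 128 this works out to 4,096 tokens per sequence (assuming fixed-length packed sequences).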

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
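The card stops short of an inference example; a minimal sketch using the pinned Transformers version, assuming the checkpoint is published under the model-index name above (the real repo id may differ) and that MEGA-based checkpoints may need trust_remote_code=True:

```python
from transformers import pipeline

# Hypothetical repo id, copied from the model-index name in this card.
model_id = "pszemraj/mega-ar-350m-L3t-v0.07-cosmo_webmath_py-UltraTextbooks-2.1-fw_mix-vN"

generator = pipeline(
    "text-generation",
    model=model_id,
    trust_remote_code=True,  # assumption: custom MEGA modeling code may be required
)
print(generator("The quadratic formula states that", max_new_tokens=32)[0]["generated_text"])
```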