---
license: apache-2.0
base_model: pszemraj/mega-ar-350m-v0.12-napierone_epub
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: mega-ar-350m-v0.12-napierone_epub-UltraTextbooks-2.1-fw_mix-vN
    results: []
---

# mega-ar-350m-v0.12-napierone_epub-UltraTextbooks-2.1-fw_mix-vN

This model is a fine-tuned version of pszemraj/mega-ar-350m-v0.12-napierone_epub on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:

- Loss: 1.9926
- Accuracy: 0.5885
- Num Input Tokens Seen: 3468165120
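
For a quick smoke test, the checkpoint can be loaded with the standard Transformers text-generation API. This is a minimal sketch, assuming the hub id `pszemraj/mega-ar-350m-v0.13` and that the checkpoint loads through `AutoModelForCausalLM`; since mega-ar is a custom architecture, `trust_remote_code=True` is passed in case the repo ships its own modeling code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this checkpoint; adjust if the hub name differs.
model_id = "pszemraj/mega-ar-350m-v0.13"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# mega-ar may require remote code to instantiate the architecture.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The history of the printing press", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```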

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch reconstructing them follows the list):

- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 80085
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 32
- total_train_batch_size: 96
- total_eval_batch_size: 3
- optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
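
For reproducibility, the list above maps onto `transformers.TrainingArguments` roughly as follows. This is a hedged sketch, not the actual training script: the run was launched across 3 GPUs (e.g. via `torchrun` or `accelerate`), the output directory name is taken from the model-index entry, and the eval/logging cadence of 400 steps is inferred from the results table below; everything not listed above is left at library defaults.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the run configuration listed above.
# With 3 devices, the effective batch is 1 x 3 x 32 = 96 sequences per step.
args = TrainingArguments(
    output_dir="mega-ar-350m-v0.12-napierone_epub-UltraTextbooks-2.1-fw_mix-vN",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.985,
    adam_epsilon=1e-8,
    seed=80085,
    evaluation_strategy="steps",  # evals logged every 400 steps in the table below
    eval_steps=400,
    logging_steps=400,
)
```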

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
| 2.2374        | 0.0454 | 400  | 2.1871          | 0.5588   | 157286400         |
| 2.143         | 0.0907 | 800  | 2.1336          | 0.5665   | 314572800         |
| 2.1272        | 0.1361 | 1200 | 2.1092          | 0.5698   | 471859200         |
| 2.1243        | 0.1814 | 1600 | 2.0929          | 0.5725   | 629145600         |
| 2.1021        | 0.2268 | 2000 | 2.0794          | 0.5747   | 786432000         |
| 2.0794        | 0.2721 | 2400 | 2.0687          | 0.5762   | 943718400         |
| 2.0843        | 0.3175 | 2800 | 2.0592          | 0.5776   | 1101004800        |
| 2.0571        | 0.3628 | 3200 | 2.0507          | 0.5793   | 1258291200        |
| 2.0841        | 0.4082 | 3600 | 2.0435          | 0.5802   | 1415577600        |
| 2.0484        | 0.4535 | 4000 | 2.0363          | 0.5813   | 1572864000        |
| 2.0199        | 0.4989 | 4400 | 2.0315          | 0.5820   | 1730150400        |
| 2.0361        | 0.5442 | 4800 | 2.0261          | 0.5829   | 1887436800        |
| 2.057         | 0.5896 | 5200 | 2.0207          | 0.5838   | 2044723200        |
| 2.0234        | 0.6349 | 5600 | 2.0163          | 0.5845   | 2202009600        |
| 2.073         | 0.6803 | 6000 | 2.0120          | 0.5850   | 2359296000        |
| 2.058         | 0.7256 | 6400 | 2.0074          | 0.5862   | 2516582400        |
| 2.0253        | 0.7710 | 6800 | 2.0041          | 0.5866   | 2673868800        |
| 1.995         | 0.8163 | 7200 | 2.0010          | 0.5872   | 2831155200        |
| 1.9735        | 0.8617 | 7600 | 1.9987          | 0.5875   | 2988441600        |
| 1.9799        | 0.9070 | 8000 | 1.9960          | 0.5880   | 3145728000        |
| 2.0056        | 0.9524 | 8400 | 1.9942          | 0.5882   | 3303014400        |
| 1.9961        | 0.9977 | 8800 | 1.9926          | 0.5884   | 3460300800        |
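
The "Input Tokens Seen" column advances by 157,286,400 tokens per 400-step eval interval, i.e. 393,216 tokens per optimizer step. With an effective batch of 96 sequences, that is consistent with a 4096-token context length; the quick check below makes the arithmetic explicit (the 4096 figure is inferred from the log, not stated in the card).

```python
# Tokens per optimizer step, derived from the eval log above.
tokens_per_interval = 157_286_400           # step 0 -> step 400
tokens_per_step = tokens_per_interval // 400
assert tokens_per_step == 393_216
# Matches total_train_batch_size=96 times an assumed 4096-token context.
assert tokens_per_step == 96 * 4096
```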

### Framework versions

- Transformers 4.40.2
- PyTorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1