---
license: apache-2.0
base_model: pszemraj/mega-ar-350m-v0.12-napierone_epub
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: mega-ar-350m-v0.12-napierone_epub-UltraTextbooks-2.1-fw_mix-vN
    results: []
---

# mega-ar-350m-v0.12-napierone_epub-UltraTextbooks-2.1-fw_mix-vN

This model is a fine-tuned version of pszemraj/mega-ar-350m-v0.12-napierone_epub on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:

- Loss: 1.9926
- Accuracy: 0.5885
- Num Input Tokens Seen: 3468165120
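
For a quick smoke test, the checkpoint can be loaded with the standard Transformers text-generation API. This is a minimal sketch, assuming the hub id `pszemraj/mega-ar-350m-v0.13` and that the checkpoint loads through `AutoModelForCausalLM`; since mega-ar is a custom architecture, `trust_remote_code=True` is passed in case the repo ships its own modeling code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for this checkpoint; adjust if the hub name differs.
model_id = "pszemraj/mega-ar-350m-v0.13"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# mega-ar may require remote code to instantiate the architecture.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The history of the printing press", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```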

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch reconstructing them follows the list):

- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 80085
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 32
- total_train_batch_size: 96
- total_eval_batch_size: 3
- optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
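
For reproducibility, the list above maps onto `transformers.TrainingArguments` roughly as follows. This is a hedged sketch, not the actual training script: the run was launched across 3 GPUs (e.g. via `torchrun` or `accelerate`), the output directory name is taken from the model-index entry, and the eval/logging cadence of 400 steps is inferred from the results table below; everything not listed above is left at library defaults.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the run configuration listed above.
# With 3 devices, the effective batch is 1 x 3 x 32 = 96 sequences per step.
args = TrainingArguments(
    output_dir="mega-ar-350m-v0.12-napierone_epub-UltraTextbooks-2.1-fw_mix-vN",
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.985,
    adam_epsilon=1e-8,
    seed=80085,
    evaluation_strategy="steps",  # evals logged every 400 steps in the table below
    eval_steps=400,
    logging_steps=400,
)
```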

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
| 2.2374        | 0.0454 | 400  | 2.1871          | 0.5588   | 157286400         |
| 2.143         | 0.0907 | 800  | 2.1336          | 0.5665   | 314572800         |
| 2.1272        | 0.1361 | 1200 | 2.1092          | 0.5698   | 471859200         |
| 2.1243        | 0.1814 | 1600 | 2.0929          | 0.5725   | 629145600         |
| 2.1021        | 0.2268 | 2000 | 2.0794          | 0.5747   | 786432000         |
| 2.0794        | 0.2721 | 2400 | 2.0687          | 0.5762   | 943718400         |
| 2.0843        | 0.3175 | 2800 | 2.0592          | 0.5776   | 1101004800        |
| 2.0571        | 0.3628 | 3200 | 2.0507          | 0.5793   | 1258291200        |
| 2.0841        | 0.4082 | 3600 | 2.0435          | 0.5802   | 1415577600        |
| 2.0484        | 0.4535 | 4000 | 2.0363          | 0.5813   | 1572864000        |
| 2.0199        | 0.4989 | 4400 | 2.0315          | 0.5820   | 1730150400        |
| 2.0361        | 0.5442 | 4800 | 2.0261          | 0.5829   | 1887436800        |
| 2.057         | 0.5896 | 5200 | 2.0207          | 0.5838   | 2044723200        |
| 2.0234        | 0.6349 | 5600 | 2.0163          | 0.5845   | 2202009600        |
| 2.073         | 0.6803 | 6000 | 2.0120          | 0.5850   | 2359296000        |
| 2.058         | 0.7256 | 6400 | 2.0074          | 0.5862   | 2516582400        |
| 2.0253        | 0.7710 | 6800 | 2.0041          | 0.5866   | 2673868800        |
| 1.995         | 0.8163 | 7200 | 2.0010          | 0.5872   | 2831155200        |
| 1.9735        | 0.8617 | 7600 | 1.9987          | 0.5875   | 2988441600        |
| 1.9799        | 0.9070 | 8000 | 1.9960          | 0.5880   | 3145728000        |
| 2.0056        | 0.9524 | 8400 | 1.9942          | 0.5882   | 3303014400        |
| 1.9961        | 0.9977 | 8800 | 1.9926          | 0.5884   | 3460300800        |
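
The "Input Tokens Seen" column advances by 157,286,400 tokens per 400-step eval interval, i.e. 393,216 tokens per optimizer step. With an effective batch of 96 sequences, that is consistent with a 4096-token context length; the quick check below makes the arithmetic explicit (the 4096 figure is inferred from the log, not stated in the card).

```python
# Tokens per optimizer step, derived from the eval log above.
tokens_per_interval = 157_286_400           # step 0 -> step 400
tokens_per_step = tokens_per_interval // 400
assert tokens_per_step == 393_216
# Matches total_train_batch_size=96 times an assumed 4096-token context.
assert tokens_per_step == 96 * 4096
```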

### Framework versions

- Transformers 4.40.2
- PyTorch 2.2.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1