
mega-ar-350m-v0.13

Model description

This model is a continued training of BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw on a few additional datasets.

It achieves the following results on the evaluation set (BEE-spoke-data/UltraTextbooks-2.1-fw_mix):

  • Loss: 1.9926
  • Accuracy: 0.5885
  • Num input tokens seen: 3,468,165,120
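
A minimal inference sketch with transformers is shown below. The model id comes from this card; trust_remote_code=True mirrors the quick-eval invocation further down, and the prompt and generation settings are illustrative only, not recommendations from the author.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/mega-ar-350m-v0.13"

# trust_remote_code is assumed to be required here, matching the eval command below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "The history of the printing press begins"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,   # illustrative values, not tuned for this model
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```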

Quick eval

Quick eval for: pszemraj/mega-ar-350m-v0.13

hf (pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks          | Version | Filter | n-shot | Metric     |   Value | Stderr   |
|----------------|--------:|--------|-------:|------------|--------:|----------|
| arc_easy       |       1 | none   |      0 | acc        |  0.4491 | ± 0.0102 |
|                |         | none   |      0 | acc_norm   |  0.4061 | ± 0.0101 |
| boolq          |       2 | none   |      0 | acc        |  0.5367 | ± 0.0087 |
| lambada_openai |       1 | none   |      0 | perplexity | 55.3308 | ± 2.3100 |
|                |         | none   |      0 | acc        |  0.3113 | ± 0.0065 |
| openbookqa     |       1 | none   |      0 | acc        |  0.1760 | ± 0.0170 |
|                |         | none   |      0 | acc_norm   |  0.2680 | ± 0.0198 |
| piqa           |       1 | none   |      0 | acc        |  0.6366 | ± 0.0112 |
|                |         | none   |      0 | acc_norm   |  0.6213 | ± 0.0113 |
| winogrande     |       1 | none   |      0 | acc        |  0.5036 | ± 0.0141 |
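
The header line above matches the output format of EleutherAI's lm-evaluation-harness. A hedged sketch of reproducing the run through the harness's Python API follows: the task list, zero-shot setting, and batch size are taken from the table and header, while the use of simple_evaluate as the entry point is an assumption about the library rather than something stated in this card.

```python
# Hedged reproduction sketch using lm-evaluation-harness (pip install lm_eval).
# simple_evaluate and its arguments are assumed from the harness's Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float",
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```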

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 80085
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 96
  • total_eval_batch_size: 3
  • optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
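
For readers who want to adapt this setup, here is a hedged sketch of how the values above might map onto transformers.TrainingArguments. The original training script is not part of this card, so the output directory is a placeholder and the multi-GPU launch details (3 devices via torchrun or Accelerate) are omitted.

```python
from transformers import TrainingArguments

# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
# Effective batch size: 1 per device x 32 accumulation steps x 3 GPUs = 96.
training_args = TrainingArguments(
    output_dir="./mega-ar-350m-v0.13",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    num_train_epochs=1.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.985,
    adam_epsilon=1e-8,
    seed=80085,
)
```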