Tags: Text Generation · Transformers · Safetensors · English · mega · Generated from Trainer · Inference Endpoints

mega-ar-350m-v0.13

Model description

Continued training of BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw on a few additional datasets.

It achieves the following results on the evaluation set (BEE-spoke-data/UltraTextbooks-2.1-fw_mix):

  • Loss: 1.9926
  • Accuracy: 0.5885
  • Num Input Tokens Seen: 3,468,165,120
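
A minimal generation sketch follows; the prompt and generation settings are illustrative, and trust_remote_code=True mirrors the eval configuration below:

```python
# Minimal text-generation sketch; prompt and max_new_tokens are illustrative.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="pszemraj/mega-ar-350m-v0.13",
    trust_remote_code=True,  # matches the eval configuration in "Quick eval"
)

out = pipe("The printing press changed Europe because", max_new_tokens=64)
print(out[0]["generated_text"])
```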

Quick eval

Quick eval for: pszemraj/mega-ar-350m-v0.13

hf (pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks          | Version | Filter | n-shot | Metric     |   Value | Stderr   |
|----------------|---------|--------|--------|------------|--------:|----------|
| arc_easy       | 1       | none   | 0      | acc        |  0.4491 | ± 0.0102 |
|                |         | none   | 0      | acc_norm   |  0.4061 | ± 0.0101 |
| boolq          | 2       | none   | 0      | acc        |  0.5367 | ± 0.0087 |
| lambada_openai | 1       | none   | 0      | perplexity | 55.3308 | ± 2.3100 |
|                |         | none   | 0      | acc        |  0.3113 | ± 0.0065 |
| openbookqa     | 1       | none   | 0      | acc        |  0.1760 | ± 0.0170 |
|                |         | none   | 0      | acc_norm   |  0.2680 | ± 0.0198 |
| piqa           | 1       | none   | 0      | acc        |  0.6366 | ± 0.0112 |
|                |         | none   | 0      | acc_norm   |  0.6213 | ± 0.0113 |
| winogrande     | 1       | none   | 0      | acc        |  0.5036 | ± 0.0141 |
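
The run above can be approximately reproduced with lm-evaluation-harness; the tasks and settings below are inferred from the header line, so treat this as a sketch rather than the exact original command:

```python
# Approximate lm-evaluation-harness reproduction of the "Quick eval" above;
# tasks and settings are inferred from the header line, not the original script.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float",
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```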

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 80085
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 96
  • total_eval_batch_size: 3
  • optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
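
For reference, these settings map roughly onto transformers.TrainingArguments as shown below; the output directory and the 3-GPU launch (which yields the total train batch size of 96) are assumptions, not recorded values:

```python
# Rough mapping of the hyperparameters above onto transformers.TrainingArguments.
# output_dir is a placeholder; the 3-GPU launch is handled by the launcher
# (e.g. accelerate/torchrun), giving 1 x 32 grad-accum x 3 GPUs = 96 total batch.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mega-ar-350m-v0.13",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    seed=80085,
    adam_beta1=0.9,
    adam_beta2=0.985,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
)
```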
