mega-ar-525m-v0.07-ultraTBfw

pretraining experiment:

  • 525M params: hidden size 1536, feed-forward dim 3× hidden, 8 layers
  • context length 4096, MEGA EMA dim 32
  • llama-3 tokenizer (a minimal loading sketch follows this list)
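
The checkpoint implements a custom MEGA autoregressive architecture, so it has to be loaded with `trust_remote_code=True` (the same flag used in the eval run below). A minimal loading sketch, assuming a recent transformers install; device and dtype handling are left at their defaults:

```python
# Minimal loading sketch; only the repo id comes from this card,
# everything else is a plain transformers default.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/mega-ar-525m-v0.07-ultraTBfw"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # llama-3 tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # expect ~525M
```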

Model description

This model is a fine-tuned version of pszemraj/mega-ar-525m-v0.06-fw_longish on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9824
  • Accuracy: 0.5874
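
The loss and accuracy above are standard causal-LM eval metrics: mean next-token cross-entropy and the fraction of positions where the argmax prediction matches the true next token. A rough sketch of that computation on an arbitrary snippet (the snippet is illustrative only; the reported numbers come from the dataset's eval split):

```python
# Sketch of next-token loss/accuracy; the sample text is made up, not from the eval set.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/mega-ar-525m-v0.07-ultraTBfw"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval()

text = "Photosynthesis converts light energy into chemical energy stored in glucose."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits

# position t predicts token t+1
shift_logits, shift_labels = logits[:, :-1], ids[:, 1:]
loss = torch.nn.functional.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)), shift_labels.reshape(-1)
)
acc = (shift_logits.argmax(dim=-1) == shift_labels).float().mean()
print(f"loss {loss.item():.4f} | next-token accuracy {acc.item():.4f}")
```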

Quick eval

Quick eval for: pszemraj/mega-ar-525m-v0.07-ultraTBfw

hf (pretrained=pszemraj/mega-ar-525m-v0.07-ultraTBfw,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks          | Version | Filter | n-shot | Metric     |   Value |   | Stderr |
|----------------|--------:|--------|-------:|------------|--------:|---|-------:|
| arc_easy       |       1 | none   |      0 | acc        |  0.4912 | ± | 0.0103 |
|                |         | none   |      0 | acc_norm   |  0.4356 | ± | 0.0102 |
| boolq          |       2 | none   |      0 | acc        |  0.6092 | ± | 0.0085 |
| lambada_openai |       1 | none   |      0 | perplexity | 49.3787 | ± | 2.0179 |
|                |         | none   |      0 | acc        |  0.3078 | ± | 0.0064 |
| openbookqa     |       1 | none   |      0 | acc        |  0.1900 | ± | 0.0176 |
|                |         | none   |      0 | acc_norm   |  0.3060 | ± | 0.0206 |
| piqa           |       1 | none   |      0 | acc        |  0.6480 | ± | 0.0111 |
|                |         | none   |      0 | acc_norm   |  0.6480 | ± | 0.0111 |
| winogrande     |       1 | none   |      0 | acc        |  0.5209 | ± | 0.0140 |
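
The table can be reproduced with EleutherAI's lm-evaluation-harness using the settings from the header line above. A hedged sketch via the Python API (the `simple_evaluate` call is an assumption about your harness version; the equivalent `lm_eval` CLI works the same way):

```python
# Reproduction sketch for the quick eval; task list, dtype, and batch size are
# taken from the header line above, the rest depends on your local setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=pszemraj/mega-ar-525m-v0.07-ultraTBfw,"
        "trust_remote_code=True,dtype=float"
    ),
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```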
