mega-ar-525m-v0.07-ultraTBfw

Pretraining experiment (the config can be inspected as sketched below):

  • 525M params: hidden size 1536, FF dim 3× hidden size, 8 layers
  • context length 4096, MEGA EMA dim 32
  • Llama-3 tokenizer
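
A minimal sketch for verifying these settings from the published checkpoint; trust_remote_code=True is assumed to be required since MEGA-AR ships custom modeling code on the Hub:

```python
from transformers import AutoConfig

# Hypothetical sketch: inspect the architecture settings listed above.
# trust_remote_code=True is assumed because the repo ships custom modeling code.
config = AutoConfig.from_pretrained(
    "pszemraj/mega-ar-525m-v0.07-ultraTBfw",
    trust_remote_code=True,
)
print(config)  # hidden size, layer count, EMA dimension, context length, etc.
```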

Model description

This model is a fine-tuned version of pszemraj/mega-ar-525m-v0.06-fw_longish on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9824
  • Accuracy: 0.5874
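
A minimal usage sketch (not part of the original card) for loading the checkpoint with 🤗 Transformers; again, trust_remote_code=True is assumed for the custom MEGA-AR architecture:

```python
from transformers import pipeline

# Minimal text-generation sketch; trust_remote_code=True is assumed to be
# needed for the custom MEGA-AR modeling code.
pipe = pipeline(
    "text-generation",
    model="pszemraj/mega-ar-525m-v0.07-ultraTBfw",
    trust_remote_code=True,
)
out = pipe("The mitochondria is the powerhouse of", max_new_tokens=48)
print(out[0]["generated_text"])
```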

Quick eval

Quick eval for: pszemraj/mega-ar-525m-v0.07-ultraTBfw

hf (pretrained=pszemraj/mega-ar-525m-v0.07-ultraTBfw,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

| Tasks          | Version | Filter | n-shot | Metric     |   Value |   | Stderr |
|----------------|--------:|--------|-------:|------------|--------:|---|-------:|
| arc_easy       |       1 | none   |      0 | acc        |  0.4912 | ± | 0.0103 |
|                |         | none   |      0 | acc_norm   |  0.4356 | ± | 0.0102 |
| boolq          |       2 | none   |      0 | acc        |  0.6092 | ± | 0.0085 |
| lambada_openai |       1 | none   |      0 | perplexity | 49.3787 | ± | 2.0179 |
|                |         | none   |      0 | acc        |  0.3078 | ± | 0.0064 |
| openbookqa     |       1 | none   |      0 | acc        |  0.1900 | ± | 0.0176 |
|                |         | none   |      0 | acc_norm   |  0.3060 | ± | 0.0206 |
| piqa           |       1 | none   |      0 | acc        |  0.6480 | ± | 0.0111 |
|                |         | none   |      0 | acc_norm   |  0.6480 | ± | 0.0111 |
| winogrande     |       1 | none   |      0 | acc        |  0.5209 | ± | 0.0140 |
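
A hedged sketch of reproducing the run above via the lm-evaluation-harness Python API (assumes a recent lm_eval release that exposes simple_evaluate):

```python
import lm_eval

# Sketch of the quick eval above; task list and batch size are taken from the
# reported command, num_fewshot=0 matches the n-shot column in the table.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=pszemraj/mega-ar-525m-v0.07-ultraTBfw,"
        "trust_remote_code=True,dtype=float"
    ),
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```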
