# mega-ar-525m-v0.07-ultraTBfw
Pretraining experiment:
- 525M params: hidden size 1536, 3× hidden:FF ratio, 8 layers
- context length 4096, MEGA EMA dim 32
- Llama-3 tokenizer
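The checkpoint ships custom MEGA-AR modeling code, so loading it through Hugging Face `transformers` requires `trust_remote_code=True` (the same flag used in the eval config below). A minimal loading/generation sketch, assuming the standard `AutoModelForCausalLM` / `AutoTokenizer` interfaces work for this checkpoint:

```python
# Minimal usage sketch -- assumes the checkpoint loads via the standard Auto* classes;
# trust_remote_code=True is needed because the MEGA-AR architecture ships custom code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/mega-ar-525m-v0.07-ultraTBfw"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Photosynthesis is the process by which", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```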
## Model description
This model is a fine-tuned version of pszemraj/mega-ar-525m-v0.06-fw_longish on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:
- Loss: 1.9824
- Accuracy: 0.5874
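Assuming the reported loss is the usual mean token-level cross-entropy in nats (the default `Trainer` eval loss), it maps to perplexity as `exp(loss)`; a quick check:

```python
# Sanity check: if the eval loss is mean cross-entropy in nats,
# perplexity = exp(loss) -> exp(1.9824) on the evaluation set.
import math

print(math.exp(1.9824))  # ~7.26
```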
## Quick eval
Quick eval for `pszemraj/mega-ar-525m-v0.07-ultraTBfw`; a reproduction sketch follows the results table.

hf (pretrained=pszemraj/mega-ar-525m-v0.07-ultraTBfw,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc | 0.4912 | ± | 0.0103 |
| | | none | 0 | acc_norm | 0.4356 | ± | 0.0102 |
| boolq | 2 | none | 0 | acc | 0.6092 | ± | 0.0085 |
| lambada_openai | 1 | none | 0 | perplexity | 49.3787 | ± | 2.0179 |
| | | none | 0 | acc | 0.3078 | ± | 0.0064 |
| openbookqa | 1 | none | 0 | acc | 0.1900 | ± | 0.0176 |
| | | none | 0 | acc_norm | 0.3060 | ± | 0.0206 |
| piqa | 1 | none | 0 | acc | 0.6480 | ± | 0.0111 |
| | | none | 0 | acc_norm | 0.6480 | ± | 0.0111 |
| winogrande | 1 | none | 0 | acc | 0.5209 | ± | 0.0140 |
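The table above comes from the lm-evaluation-harness run configured earlier. A reproduction sketch using the harness's Python API, assuming a recent `lm-eval` (0.4+) that exposes `lm_eval.simple_evaluate`:

```python
# Reproduction sketch for the quick eval above -- assumes lm-evaluation-harness >= 0.4,
# which exposes simple_evaluate at the package level.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pszemraj/mega-ar-525m-v0.07-ultraTBfw,trust_remote_code=True,dtype=float",
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```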