mega-ar-small-4096-NC-minipile-v1
A 65M-parameter MEGA autoregressive model, initialized from scratch and trained on:
- pszemraj/simple_wikipedia_LM
- JeanKaddour/minipile
It achieves the following results on the evaluation set:
- Loss: 3.7502
- Accuracy: 0.3650
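The loss above can be read as a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal LM training with Transformers but is not stated explicitly in this card.

```python
import math

# Evaluation loss reported above.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 3.7502

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"eval perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at roughly 42.5. This is measured on the training-distribution evaluation split, so it is not comparable to the lambada_openai perplexity reported in the eval section.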
eval
Initial 'get the feet wet' evaluation:
hf-causal-experimental (pretrained=pszemraj/mega-ar-small-4096-sw_minipile,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_easy | 0 | acc | 0.3173 | ± | 0.0096 |
| | | acc_norm | 0.3022 | ± | 0.0094 |
| boolq | 1 | acc | 0.4107 | ± | 0.0086 |
| lambada_openai | 0 | ppl | 6843.1824 | ± | 295.0792 |
| | | acc | 0.0155 | ± | 0.0017 |
| openbookqa | 0 | acc | 0.1220 | ± | 0.0147 |
| | | acc_norm | 0.2480 | ± | 0.0193 |
| piqa | 0 | acc | 0.5609 | ± | 0.0116 |
| | | acc_norm | 0.5566 | ± | 0.0116 |
| winogrande | 0 | acc | 0.5059 | ± | 0.0141 |
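Still some way to go: winogrande accuracy (0.5059) sits at the 0.5 chance level for a two-option task, and lambada_openai perplexity is in the thousands.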
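To judge how far each score is from a reference level, the reported standard errors can be turned into approximate 95% confidence intervals. This is a minimal sketch using a normal approximation (value ± 1.96 × stderr). The helper name `confidence_interval` is illustrative and not part of the evaluation harness, and the numbers are copied from the `acc` rows of the table above.

```python
# (task, metric, value, stderr) copied from the results table.
results = [
    ("arc_easy", "acc", 0.3173, 0.0096),
    ("boolq", "acc", 0.4107, 0.0086),
    ("openbookqa", "acc", 0.1220, 0.0147),
    ("piqa", "acc", 0.5609, 0.0116),
    ("winogrande", "acc", 0.5059, 0.0141),
]

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value, stderr, z=Z_95):
    """Return (low, high) for a normal-approximation confidence interval."""
    return value - z * stderr, value + z * stderr


for task, metric, value, stderr in results:
    low, high = confidence_interval(value, stderr)
    print(f"{task:12s} {metric}: {value:.4f}  95% CI [{low:.4f}, {high:.4f}]")
```

Under this approximation the winogrande interval spans 0.5, so that score cannot be distinguished from guessing between two options, while the piqa interval lies above 0.5.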
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 1
- seed: 80085
- gradient_accumulation_steps: 64
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
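The relationship between these settings can be sketched in plain Python. The block below is an illustration, not the training code: it assumes a single device (consistent with 1 × 64 = 64), assumes the warmup step count is the ceiling of `warmup_ratio × total_steps`, and uses a hypothetical `total_steps = 1000` because the actual number of optimizer steps is not given here. The function names are illustrative.

```python
import math


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Sequences contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_schedule_lr(step, total_steps, peak_lr=3e-4, warmup_ratio=0.05):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: the rate grows proportionally with the step index.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: the rate shrinks linearly and reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# train_batch_size=1 and gradient_accumulation_steps=64, as listed above.
print(effective_batch_size(per_device=1, accumulation_steps=64))

# Hypothetical step count, chosen only to show the shape of the schedule.
total_steps = 1000
for step in (0, 25, 50, 525, 1000):
    print(step, linear_schedule_lr(step, total_steps))
```

With a per-device batch of 1, gradient accumulation is what produces the total batch size of 64. In this sketch the learning rate peaks at 0.0003 once the first 5% of steps have passed and then falls linearly to zero.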
Framework versions
- Transformers 4.33.1
- Pytorch 2.2.0.dev20230907+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3
Model tree for pszemraj/mega-ar-small-4096-sw_minipile
Base model
pszemraj/mega-ar-small-4096-NC-simplewiki-v1