---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
language:
- en
datasets:
- BEE-spoke-data/UltraTextbooks-2.1-fw_mix
- BEE-spoke-data/napierone-epub-raw
- BEE-spoke-data/knowledge-inoc-concat-v1
---

# mega-ar-350m-v0.13

## Model description

Continued training of [BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw](https://hf.co/BEE-spoke-data/mega-ar-350m-L3t-v0.08-ultraTBfw) on a few additional datasets.

It achieves the following results on the evaluation set (`BEE-spoke-data/UltraTextbooks-2.1-fw_mix`):
- Loss: 1.9926
- Accuracy: 0.5885
- Num Input Tokens Seen: 3468165120

## Quick eval

Quick eval for `pszemraj/mega-ar-350m-v0.13`:

hf (pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

|     Tasks      | Version | Filter | n-shot |   Metric   |  Value  |   | Stderr |
|----------------|--------:|--------|-------:|------------|--------:|---|-------:|
| arc_easy       |       1 | none   |      0 | acc        |  0.4491 | ± | 0.0102 |
|                |         | none   |      0 | acc_norm   |  0.4061 | ± | 0.0101 |
| boolq          |       2 | none   |      0 | acc        |  0.5367 | ± | 0.0087 |
| lambada_openai |       1 | none   |      0 | perplexity | 55.3308 | ± | 2.3100 |
|                |         | none   |      0 | acc        |  0.3113 | ± | 0.0065 |
| openbookqa     |       1 | none   |      0 | acc        |  0.1760 | ± | 0.0170 |
|                |         | none   |      0 | acc_norm   |  0.2680 | ± | 0.0198 |
| piqa           |       1 | none   |      0 | acc        |  0.6366 | ± | 0.0112 |
|                |         | none   |      0 | acc_norm   |  0.6213 | ± | 0.0113 |
| winogrande     |       1 | none   |      0 | acc        |  0.5036 | ± | 0.0141 |

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 80085
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 32
- total_train_batch_size: 96
- total_eval_batch_size: 3
- optimizer: Adam with betas=(0.9,0.985) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
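
For orientation, the hyperparameters above map roughly onto 🤗 `TrainingArguments` as sketched below. This is an approximation rather than the exact training script: the dataset/model wiring, the multi-GPU launch across 3 devices, and the `output_dir` name are assumptions.

```python
# Approximate mapping of the listed hyperparameters to TrainingArguments.
# Sketch only; the actual training script may differ.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mega-ar-350m-v0.13",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=1,    # x 3 GPUs x 32 accumulation steps = 96 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    seed=80085,
    adam_beta1=0.9,
    adam_beta2=0.985,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
)
```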
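
The quick-eval table above was produced with EleutherAI's lm-evaluation-harness (the `hf (pretrained=..., ...)` line is its standard run header). A rough way to reproduce it through the harness's Python API, assuming lm-eval v0.4+ and the task list taken from the table:

```python
# Sketch of re-running the quick eval with lm-evaluation-harness (pip install lm-eval).
# Assumes the v0.4+ Python API; tasks and batch size mirror the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pszemraj/mega-ar-350m-v0.13,trust_remote_code=True,dtype=float",
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task acc / acc_norm / perplexity
```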
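
Finally, a minimal inference sketch with 🤗 Transformers, assuming the same `trust_remote_code=True` / float32 settings used for the eval; the prompt is only an illustrative placeholder.

```python
# Minimal text-generation sketch; mirrors the eval settings (trust_remote_code, fp32).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/mega-ar-350m-v0.13"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The history of the printing press begins", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```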