---
license: apache-2.0
base_model: pszemraj/random-mega-ar-small-4096
tags:
- generated_from_trainer
metrics:
- accuracy
datasets:
- EleutherAI/wikitext_document_level
language:
- en
pipeline_tag: text-generation
inference:
  parameters:
    max_new_tokens: 96
    do_sample: true
    repetition_penalty: 1.05
    guidance_scale: 1.02
    eta_cutoff: 0.001
---

# mega-ar-small-4096: MWE

## mega-ar on wikitext-103-raw-v1 (document level)

This model is a fine-tuned version of [pszemraj/random-mega-ar-small-4096](https://huggingface.co/pszemraj/random-mega-ar-small-4096) on the `EleutherAI/wikitext_document_level` dataset (`wikitext-103-raw-v1`). The model has roughly 65M parameters.

It achieves the following results on the evaluation set:
- Loss: 4.0338
- Accuracy: 0.3243

## Training procedure

This was tuned in `bf16`, while the authors recommend tuning in `fp32`; an `fp32` run will be tried later.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 80085
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 7.3662        | 0.11  | 100  | 7.2782          | 0.0935   |
| 6.3064        | 0.22  | 200  | 6.2066          | 0.1634   |
| 5.8203        | 0.33  | 300  | 5.7299          | 0.1931   |
| 5.55          | 0.44  | 400  | 5.4173          | 0.2117   |
| 5.3194        | 0.55  | 500  | 5.1937          | 0.2278   |
| 5.1678        | 0.66  | 600  | 5.0206          | 0.2406   |
| 5.0375        | 0.77  | 700  | 4.8891          | 0.2508   |
| 4.9194        | 0.88  | 800  | 4.7592          | 0.2605   |
| 4.8272        | 0.99  | 900  | 4.6653          | 0.2681   |
| 4.7571        | 1.1   | 1000 | 4.5817          | 0.2754   |
| 4.6345        | 1.21  | 1100 | 4.5066          | 0.2820   |
| 4.6218        | 1.32  | 1200 | 4.4472          | 0.2867   |
| 4.5585        | 1.43  | 1300 | 4.3827          | 0.2923   |
| 4.5047        | 1.54  | 1400 | 4.3328          | 0.2963   |
| 4.4726        | 1.65  | 1500 | 4.2860          | 0.3002   |
| 4.4094        | 1.76  | 1600 | 4.2452          | 0.3038   |
| 4.3705        | 1.87  | 1700 | 4.2168          | 0.3062   |
| 4.3739        | 1.98  | 1800 | 4.1852          | 0.3095   |
| 4.2836        | 2.09  | 1900 | 4.1599          | 0.3112   |
| 4.302         | 2.2   | 2000 | 4.1307          | 0.3149   |
| 4.2847        | 2.31  | 2100 | 4.1113          | 0.3165   |
| 4.2348        | 2.42  | 2200 | 4.0925          | 0.3184   |
| 4.2837        | 2.53  | 2300 | 4.0743          | 0.3207   |
| 4.2058        | 2.64  | 2400 | 4.0612          | 0.3217   |
| 4.22          | 2.75  | 2500 | 4.0494          | 0.3224   |
| 4.1827        | 2.86  | 2600 | 4.0397          | 0.3237   |
| 4.1967        | 2.97  | 2700 | 4.0338          | 0.3243   |

### Framework versions

- Transformers 4.32.1
- Pytorch 2.1.0.dev20230727+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3
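
### Training configuration sketch

As a rough, hypothetical reconstruction: the hyperparameters listed above map onto Hugging Face `TrainingArguments` roughly as below. The output directory and the surrounding `Trainer` script are assumptions, not the actual training code.

```python
# Hedged sketch of the training configuration; only the values listed in the
# card are grounded, everything else (output dir, single-GPU setup) is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mega-ar-small-4096-wikitext103",  # hypothetical
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,  # 4 x 32 = effective batch size 128 on one GPU
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-7,
    seed=80085,
    bf16=True,  # the run used bf16; the MEGA authors recommend fp32
)
```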
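
## Usage

A minimal generation sketch using the same parameters as the inference widget in the card header. The repository id below is an assumption based on this card's title (substitute the actual checkpoint path); the snippet is untested against this exact checkpoint. MEGA is part of Transformers (v4.31+), so `trust_remote_code` should not be needed.

```python
# Minimal sketch, assuming the repo id matches this card's title.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/mega-ar-small-4096"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The history of the region begins"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=96,    # values mirror the inference widget above
    do_sample=True,
    repetition_penalty=1.05,
    guidance_scale=1.02,  # classifier-free guidance (Transformers >= 4.30)
    eta_cutoff=0.001,     # eta sampling cutoff
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```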