Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


pythia-31m-goodwiki-deduped-2048-scratch - bnb 8bits

- Model creator: https://huggingface.co/pszemraj/
- Original model: https://huggingface.co/pszemraj/pythia-31m-goodwiki-deduped-2048-scratch/


Original model description:
---
tags:
- generated_from_trainer
metrics:
- accuracy
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    repetition_penalty: 1.1
    no_repeat_ngram_size: 5
    guidance_scale: 1.01
    eta_cutoff: 0.001
widget:
- text: My name is El Microondas the Wise and
  example_title: El Microondas
- text: A meme is
  example_title: meme
- text: >-
    Barack Obama nominated Hilary Clinton as his secretary of state on Monday.
    He chose her because she had
  example_title: Coreference resolution
- text: >-
    On a shelf, there are five books: a gray book, a red book, a purple book,
    a blue book, and a black book
  example_title: Logic puzzles
- text: >-
    The two men running to become New York City's next mayor will face off in
    their first debate Wednesday night
  example_title: Reading comprehension
pipeline_tag: text-generation
license: apache-2.0
datasets:
- euirim/goodwiki
language:
- en
---

# pythia-31m-goodwiki-deduped-2048-scratch

Trained from scratch for 3 epochs, using the config of [EleutherAI/pythia-31m](https://huggingface.co/EleutherAI/pythia-31m). It achieves the following results on the evaluation set:

- Loss: 4.5181
- Accuracy: 0.2680

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

```
***** eval metrics *****
  epoch                   =        3.0
  eval_accuracy           =     0.2694
  eval_loss               =     4.4986
  eval_runtime            = 0:00:14.62
  eval_samples            =        500
  eval_samples_per_second =     34.187
  eval_steps_per_second   =     17.093
  perplexity              =    89.8934
```

The reported perplexity is the exponential of the evaluation loss: exp(4.4986) ≈ 89.89.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 2
- seed: 80085
- gradient_accumulation_steps: 64
- total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps)
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-07
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 6.8347        | 0.16  | 100  | 6.7683          | 0.1380   |
| 6.0732        | 0.32  | 200  | 6.0489          | 0.1712   |
| 5.6949        | 0.48  | 300  | 5.6941          | 0.1935   |
| 5.4723        | 0.64  | 400  | 5.4411          | 0.2066   |
| 5.2672        | 0.8   | 500  | 5.2621          | 0.2162   |
| 5.165         | 0.96  | 600  | 5.1339          | 0.2241   |
| 5.0693        | 1.12  | 700  | 5.0290          | 0.2304   |
| 4.9234        | 1.28  | 800  | 4.9430          | 0.2369   |
| 4.886         | 1.44  | 900  | 4.8702          | 0.2413   |
| 4.8422        | 1.6   | 1000 | 4.8086          | 0.2458   |
| 4.7688        | 1.76  | 1100 | 4.7593          | 0.2488   |
| 4.734         | 1.93  | 1200 | 4.7118          | 0.2527   |
| 4.6877        | 2.09  | 1300 | 4.6721          | 0.2556   |
| 4.6135        | 2.25  | 1400 | 4.6350          | 0.2583   |
| 4.6117        | 2.41  | 1500 | 4.6013          | 0.2606   |
| 4.5424        | 2.57  | 1600 | 4.5707          | 0.2635   |
| 4.5535        | 2.73  | 1700 | 4.5447          | 0.2658   |
| 4.4823        | 2.89  | 1800 | 4.5181          | 0.2680   |

### Framework versions

- Transformers 4.33.1
- Pytorch 2.2.0.dev20230907+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3
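For reference, here is a minimal sketch of what "trained from scratch on the pythia-31m config" means in `transformers`: the architecture definition and tokenizer are reused, but the weights are randomly initialized rather than loaded. This is an illustration under that assumption, not the original training script, which is not included in this card.

```python
# Illustrative sketch only; the actual training code is not part of this card.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Reuse the pythia-31m architecture definition...
config = AutoConfig.from_pretrained("EleutherAI/pythia-31m")
# ...but initialize the weights randomly instead of loading the checkpoint.
model = AutoModelForCausalLM.from_config(config)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-31m")

print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```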
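A minimal usage sketch for the 8-bit build: the snippet below loads the original checkpoint and quantizes it to 8-bit on the fly with bitsandbytes, then generates with the sampling parameters from the widget configuration above. The model id and prompt here are taken from the card; swap in this quantized repo's id to use the pre-quantized weights directly. Requires `accelerate`, `bitsandbytes`, and a CUDA device.

```python
# Hedged sketch: 8-bit loading via bitsandbytes, not an official example.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "pszemraj/pythia-31m-goodwiki-deduped-2048-scratch"  # original model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Sampling settings mirror the widget parameters in the card header.
inputs = tokenizer("A meme is", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    repetition_penalty=1.1,
    no_repeat_ngram_size=5,
    guidance_scale=1.01,
    eta_cutoff=0.001,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```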
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pszemraj__pythia-31m-goodwiki-deduped-2048-scratch).

| Metric              | Value |
|---------------------|-------|
| Avg.                | 24.85 |
| ARC (25-shot)       | 23.12 |
| HellaSwag (10-shot) | 25.66 |
| MMLU (5-shot)       | 23.11 |
| TruthfulQA (0-shot) | 51.32 |
| Winogrande (5-shot) | 49.88 |
| GSM8K (5-shot)      | 0.0   |
| DROP (3-shot)       | 0.86  |
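As a quick sanity check (ours, not part of the original evaluation), the Avg. row is the arithmetic mean of the seven benchmark scores:

```python
scores = [23.12, 25.66, 23.11, 51.32, 49.88, 0.0, 0.86]
print(round(sum(scores) / len(scores), 2))  # -> 24.85, matching the Avg. row
```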