Quantization made by Richard Erkhov.
pythia-31m-simplepile-lite-2048-scratch-2e - GGUF
- Model creator: https://huggingface.co/pszemraj/
- Original model: https://huggingface.co/pszemraj/pythia-31m-simplepile-lite-2048-scratch-2e/
Original model description:
tags:
- generated_from_trainer
metrics:
- accuracy
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    repetition_penalty: 1.1
    no_repeat_ngram_size: 5
    guidance_scale: 1.01
    eta_cutoff: 0.001
widget:
- text: My name is El Microondas the Wise and
  example_title: El Microondas
- text: A meme is
  example_title: meme
- text: >-
    Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had
  example_title: Coreference resolution
- text: >-
    On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book
  example_title: Logic puzzles
- text: >-
    The two men running to become New York City's next mayor will face off in their first debate Wednesday night
  example_title: Reading comprehension
pipeline_tag: text-generation
license: apache-2.0
datasets:
- pszemraj/simplepile-lite
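The `inference: parameters:` metadata configures generation for the hosted inference widget. As an illustration only, the sketch below re-implements two of those settings in plain Python on a list of logits: `repetition_penalty: 1.1` (a CTRL-style penalty that lowers the score of every token already generated) and `no_repeat_ngram_size: 5` (which forbids any token that would recreate a 5-gram already present in the output). It is a simplified model of how the Transformers logits processors are understood to treat these settings, not the library's own code; `eta_cutoff` and `guidance_scale` are not covered.

```python
from typing import List, Sequence, Set

REPETITION_PENALTY = 1.1  # repetition_penalty from the card's inference parameters
NO_REPEAT_NGRAM_SIZE = 5  # no_repeat_ngram_size from the card's inference parameters


def apply_repetition_penalty(logits: List[float], generated: Sequence[int],
                             penalty: float = REPETITION_PENALTY) -> List[float]:
    """CTRL-style penalty: make every already-generated token less likely.

    Positive logits are divided by the penalty and negative ones multiplied,
    so the adjustment always lowers the score. Each token is penalized once,
    however many times it has appeared.
    """
    out = list(logits)
    for tok in set(generated):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_ngram_tokens(generated: Sequence[int],
                        n: int = NO_REPEAT_NGRAM_SIZE) -> Set[int]:
    """Return tokens that would complete an n-gram already seen in `generated`."""
    if len(generated) < n:
        return set()  # no complete n-gram exists yet, so nothing can repeat
    prefix = tuple(generated[len(generated) - (n - 1):])  # the last n-1 tokens
    banned = set()
    for i in range(len(generated) - n + 1):
        # If an earlier window starts with the same n-1 tokens, its last token is banned.
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


history = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
print(banned_ngram_tokens(history))  # {5}: emitting 5 would repeat the 5-gram 1 2 3 4 5
print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1]))
```

In a sampler, banned tokens would have their logits set to negative infinity before the softmax, and the penalized logits would be sampled from as usual.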
BL-pythia-31m-simplepile-lite-2048-scratch
This model was trained from scratch, using the configuration of EleutherAI/pythia-31m, on the pszemraj/simplepile-lite dataset. It achieves the following results on the evaluation set:
- Loss: 3.9891
- Accuracy: 0.3498
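Assuming the reported loss is the mean per-token cross-entropy in nats (the usual quantity the Transformers Trainer reports for causal language models), it converts to perplexity by exponentiation:

$$
\mathrm{PPL} = e^{\mathcal{L}} = e^{3.9891} \approx 54.0
$$

That is, on the evaluation set the model is on average roughly as uncertain as a uniform choice among about 54 tokens.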
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 1
- seed: 80085
- gradient_accumulation_steps: 64
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-07
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 2.0
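The `inverse_sqrt` scheduler with `lr_scheduler_warmup_ratio: 0.05` warms the learning rate up linearly and then decays it in proportion to 1/sqrt(step). The sketch below assumes it behaves like the Transformers `inverse_sqrt` schedule with its default timescale (equal to the number of warmup steps), and infers roughly 3000 total optimizer steps from the results table, since step 1500 is logged at epoch 1.0. It also checks that 2 × 64 gradient-accumulation steps gives the reported total batch size of 128, which is consistent with a single device; the device count is an assumption, as the card does not state it.

```python
import math

PEAK_LR = 5e-4        # learning_rate: 0.0005
WARMUP_RATIO = 0.05   # lr_scheduler_warmup_ratio
TRAIN_BATCH_SIZE = 2  # train_batch_size (per device)
GRAD_ACCUM = 64       # gradient_accumulation_steps
NUM_DEVICES = 1       # assumption: not stated in the card

# total_train_batch_size = per-device batch * accumulation steps * devices
EFFECTIVE_BATCH = TRAIN_BATCH_SIZE * GRAD_ACCUM * NUM_DEVICES

# Assumption: ~1500 optimizer steps per epoch (inferred from the results table),
# so about 3000 steps for num_epochs = 2.0.
TOTAL_STEPS = 3000
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def inverse_sqrt_lr(step: int, warmup: int = WARMUP_STEPS, peak: float = PEAK_LR) -> float:
    """Linear warmup to `peak`, then decay proportional to 1/sqrt(step).

    Assumes the timescale equals the warmup length, so the curve is continuous
    at step == warmup, where the learning rate is exactly `peak`.
    """
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak / math.sqrt(step / warmup)


print("effective batch:", EFFECTIVE_BATCH, "warmup steps:", WARMUP_STEPS)
for s in (0, 75, 150, 600, 2900):
    print(s, f"{inverse_sqrt_lr(s):.2e}")
```

Under these assumptions the learning rate peaks at 5e-4 after about 150 steps and halves every time the step count quadruples.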
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
7.4089 | 0.07 | 100 | 7.3885 | 0.1133 |
6.2774 | 0.13 | 200 | 6.2091 | 0.1621 |
5.7019 | 0.2 | 300 | 5.7450 | 0.1890 |
5.4922 | 0.27 | 400 | 5.4697 | 0.2080 |
5.233 | 0.33 | 500 | 5.2846 | 0.2195 |
5.0523 | 0.4 | 600 | 5.1479 | 0.2296 |
4.9396 | 0.47 | 700 | 5.0391 | 0.2376 |
4.7633 | 0.53 | 800 | 4.9366 | 0.2458 |
4.7516 | 0.6 | 900 | 4.8339 | 0.2559 |
4.5937 | 0.67 | 1000 | 4.7286 | 0.2676 |
4.5079 | 0.73 | 1100 | 4.6293 | 0.2798 |
4.4608 | 0.8 | 1200 | 4.5433 | 0.2903 |
4.3426 | 0.87 | 1300 | 4.4719 | 0.2988 |
4.1722 | 0.93 | 1400 | 4.4089 | 0.3057 |
4.1655 | 1.0 | 1500 | 4.3585 | 0.3107 |
4.0927 | 1.07 | 1600 | 4.3101 | 0.3161 |
4.1439 | 1.13 | 1700 | 4.2714 | 0.3206 |
4.0064 | 1.2 | 1800 | 4.2330 | 0.3249 |
4.0633 | 1.27 | 1900 | 4.2015 | 0.3281 |
3.9948 | 1.33 | 2000 | 4.1702 | 0.3311 |
3.9389 | 1.4 | 2100 | 4.1439 | 0.3338 |
3.8833 | 1.47 | 2200 | 4.1200 | 0.3367 |
3.8411 | 1.53 | 2300 | 4.0949 | 0.3395 |
3.8481 | 1.6 | 2400 | 4.0764 | 0.3408 |
3.8397 | 1.67 | 2500 | 4.0578 | 0.3420 |
3.8897 | 1.73 | 2600 | 4.0383 | 0.3440 |
3.8785 | 1.8 | 2700 | 4.0206 | 0.3459 |
3.8126 | 1.87 | 2800 | 4.0044 | 0.3478 |
3.783 | 1.93 | 2900 | 3.9891 | 0.3498 |
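To read the trend at the end of training, the sketch below parses a handful of rows copied verbatim from the table (a subset, for brevity) and computes the validation-loss drop over the last logged interval, the final gap between validation and training loss, and the change in validation perplexity. The perplexity line again assumes the losses are mean per-token cross-entropy in nats.

```python
import math
from typing import List, NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# A subset of rows copied from the training-results table.
TABLE = """\
7.4089 | 0.07 | 100 | 7.3885 | 0.1133 |
4.1655 | 1.0 | 1500 | 4.3585 | 0.3107 |
3.8785 | 1.8 | 2700 | 4.0206 | 0.3459 |
3.8126 | 1.87 | 2800 | 4.0044 | 0.3478 |
3.783 | 1.93 | 2900 | 3.9891 | 0.3498 |
"""


def parse(table: str) -> List[Row]:
    """Split each pipe-delimited line into typed fields, skipping empty cells."""
    rows = []
    for line in table.strip().splitlines():
        tl, ep, st, vl, acc = [c.strip() for c in line.split("|") if c.strip()]
        rows.append(Row(float(tl), float(ep), int(st), float(vl), float(acc)))
    return rows


rows = parse(TABLE)
prev, last = rows[-2], rows[-1]
print(f"val-loss drop over the last 100 steps: {prev.val_loss - last.val_loss:.4f}")
print(f"final val - train gap: {last.val_loss - last.train_loss:.4f}")
print(f"val perplexity: {math.exp(rows[0].val_loss):.0f} -> {math.exp(last.val_loss):.1f}")
```

On these rows the validation loss is still falling by about 0.015 per 100 steps at step 2900, which suggests, though does not establish, that the model had not fully converged after two epochs.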
Framework versions
- Transformers 4.33.1
- Pytorch 2.2.0.dev20230907+cu118
- Datasets 2.14.5
- Tokenizers 0.13.3
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 24.7 |
ARC (25-shot) | 21.59 |
HellaSwag (10-shot) | 25.79 |
MMLU (5-shot) | 24.99 |
TruthfulQA (0-shot) | 50.62 |
Winogrande (5-shot) | 48.62 |
GSM8K (5-shot) | 0.0 |
DROP (3-shot) | 1.32 |
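The reported average appears to be the unweighted mean of the seven benchmark scores; the snippet below checks that assumption against the table.

```python
from statistics import mean

# Scores copied from the Open LLM Leaderboard table.
SCORES = {
    "ARC (25-shot)": 21.59,
    "HellaSwag (10-shot)": 25.79,
    "MMLU (5-shot)": 24.99,
    "TruthfulQA (0-shot)": 50.62,
    "Winogrande (5-shot)": 48.62,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 1.32,
}

avg = mean(SCORES.values())
print(f"Avg. = {avg:.2f}")
```

The mean is about 24.70, matching the reported 24.7.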