llama8b-gsm-real-sftsd1
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0750
- Num Input Tokens Seen: 1235796
Model description
More information needed
Intended uses & limitations
More information needed
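As a usage sketch only (the card does not ship an official example), the checkpoint should load like any other causal LM with transformers. The hub id jkazdan/llama8b-gsm-real-sftsd1 is taken from this card, and the chat-template usage is an assumption based on the base model being an Instruct variant; the example prompt is a grade-school math word problem chosen only because of the "gsm" tag in the model name.

```python
# Usage sketch, not an official example from the model authors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama8b-gsm-real-sftsd1"  # hub id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is an Instruct variant, so the chat template is assumed to apply.
messages = [
    {"role": "user", "content": "Natalia sold 48 clips in April and half as many in May. How many clips did she sell in total?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```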
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after the list):
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
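For reference, a minimal sketch of how these settings map onto transformers TrainingArguments. The exact training script is not part of this card, so treat this as an illustrative reconstruction rather than the configuration actually used.

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama8b-gsm-real-sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=1,
    gradient_accumulation_steps=16,   # 2 x 16 = total train batch size 32
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    logging_steps=5,                  # assumption: matches the evaluation cadence below
    eval_strategy="steps",
    eval_steps=5,
)
```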
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.8595 | 0 |
1.7608 | 0.0214 | 5 | 1.6700 | 25930 |
1.3248 | 0.0428 | 10 | 1.3475 | 52270 |
1.2071 | 0.0642 | 15 | 1.2084 | 79554 |
1.1995 | 0.0856 | 20 | 1.1763 | 105102 |
1.0962 | 0.1070 | 25 | 1.1607 | 131956 |
1.1212 | 0.1284 | 30 | 1.1494 | 158684 |
1.1985 | 0.1499 | 35 | 1.1423 | 184480 |
1.0998 | 0.1713 | 40 | 1.1370 | 211054 |
1.1959 | 0.1927 | 45 | 1.1324 | 236974 |
1.1464 | 0.2141 | 50 | 1.1279 | 262912 |
1.2088 | 0.2355 | 55 | 1.1243 | 289396 |
1.0862 | 0.2569 | 60 | 1.1215 | 316814 |
1.17 | 0.2783 | 65 | 1.1191 | 342274 |
1.079 | 0.2997 | 70 | 1.1173 | 369198 |
1.155 | 0.3211 | 75 | 1.1141 | 396132 |
1.122 | 0.3425 | 80 | 1.1118 | 421548 |
1.0646 | 0.3639 | 85 | 1.1104 | 449306 |
1.1247 | 0.3853 | 90 | 1.1071 | 473942 |
1.0455 | 0.4067 | 95 | 1.1065 | 500546 |
1.1771 | 0.4282 | 100 | 1.1047 | 525364 |
1.0121 | 0.4496 | 105 | 1.1031 | 552868 |
1.0939 | 0.4710 | 110 | 1.1028 | 579098 |
1.133 | 0.4924 | 115 | 1.1005 | 604876 |
1.0363 | 0.5138 | 120 | 1.0987 | 629760 |
0.9986 | 0.5352 | 125 | 1.0972 | 657158 |
1.0632 | 0.5566 | 130 | 1.0968 | 683064 |
1.0441 | 0.5780 | 135 | 1.0940 | 710802 |
1.0112 | 0.5994 | 140 | 1.0930 | 737182 |
1.0467 | 0.6208 | 145 | 1.0914 | 763298 |
1.0917 | 0.6422 | 150 | 1.0897 | 790790 |
1.0613 | 0.6636 | 155 | 1.0891 | 818288 |
0.9827 | 0.6850 | 160 | 1.0883 | 845282 |
1.1266 | 0.7064 | 165 | 1.0874 | 870452 |
1.0661 | 0.7279 | 170 | 1.0859 | 896976 |
1.1039 | 0.7493 | 175 | 1.0852 | 923846 |
1.0813 | 0.7707 | 180 | 1.0842 | 949236 |
1.0729 | 0.7921 | 185 | 1.0835 | 977230 |
1.0617 | 0.8135 | 190 | 1.0838 | 1003880 |
1.1071 | 0.8349 | 195 | 1.0825 | 1029762 |
1.0408 | 0.8563 | 200 | 1.0810 | 1057616 |
1.0801 | 0.8777 | 205 | 1.0799 | 1084200 |
1.0656 | 0.8991 | 210 | 1.0786 | 1110340 |
1.1181 | 0.9205 | 215 | 1.0787 | 1136600 |
0.9485 | 0.9419 | 220 | 1.0782 | 1164358 |
1.0608 | 0.9633 | 225 | 1.0772 | 1192626 |
1.1137 | 0.9847 | 230 | 1.0755 | 1219714 |
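The validation loss drops quickly over the first ~20 steps and then decreases slowly toward the final 1.0755. A small sketch for plotting the curve from the table above, assuming this card is saved locally as README.md; the parsing is naive and only picks up the five-column rows of the training-results table.

```python
# Sketch: plot step vs. validation loss from the table in this card.
import matplotlib.pyplot as plt

steps, val_losses = [], []
with open("README.md") as f:
    for line in f:
        cells = [c.strip() for c in line.split("|") if c.strip()]
        # Data rows have 5 cells and an integer step in the third column.
        if len(cells) == 5 and cells[2].isdigit():
            steps.append(int(cells[2]))
            val_losses.append(float(cells[3]))

plt.plot(steps, val_losses, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("llama8b-gsm-real-sftsd1 validation loss")
plt.show()
```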
Framework versions
- Transformers 4.46.0
- Pytorch 2.4.1.post300
- Datasets 2.20.0
- Tokenizers 0.20.1
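A quick environment check against the versions above; nearby patch releases will likely also work, but these are the versions reported by the trainer.

```python
# Environment check sketch for the framework versions listed in this card.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # 4.46.0 on the training machine
print(torch.__version__)         # 2.4.1.post300
print(datasets.__version__)      # 2.20.0
print(tokenizers.__version__)    # 0.20.1
```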