workingtrain_AX_cluster_more_tokens
This model is a fine-tuned version of google/gemma-2b on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 3.2403
- Rouge1: 0.3763
- Rouge2: 0.1192
- RougeL: 0.3528
- RougeLsum: 0.3528
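The card does not say which implementation produced these scores. Hugging Face training scripts commonly use the `rouge` metric from the `evaluate` library, which wraps Google's `rouge_score` package and aggregates over the evaluation set. To show what the numbers measure, here is a simplified sketch: ROUGE-1 and ROUGE-2 are F1 scores over clipped unigram and bigram overlap, and ROUGE-L is an F1 score based on the longest common subsequence (LCS) of tokens. RougeLsum is a summary-level LCS variant that splits on sentence boundaries and is not covered here. This sketch assumes lowercased whitespace tokenization with no stemming, so it will not reproduce the reported values exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between prediction and reference."""
    pred = ngrams(prediction.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # '&' keeps the minimum count per n-gram
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (DP, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction, reference):
    """ROUGE-L F1: precision and recall from the LCS length."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(pred, ref), len(pred), len(ref))


if __name__ == "__main__":
    pred = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n(pred, ref, 1), rouge_n(pred, ref, 2), rouge_l(pred, ref))
```

In this example, five of six unigrams match and three of five bigrams match, so ROUGE-2 is noticeably lower than ROUGE-1. The reported scores show the same pattern (0.1192 vs 0.3763).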
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.00021
- train_batch_size: 20
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 2
- num_epochs: 2
- mixed_precision_training: Native AMP
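The batch-size and scheduler settings above can be combined as follows. With gradient accumulation, each optimizer step consumes train_batch_size × gradient_accumulation_steps × number of devices examples. Here that is 20 × 2 = 40, which matches total_train_batch_size and is consistent with a single device. The `cosine` scheduler type in Transformers corresponds to `get_cosine_schedule_with_warmup`: a linear warmup over 2 steps, then a half-cosine decay towards 0. The function below is a plain-Python re-implementation for illustration, not the library code. `NUM_DEVICES = 1` and `TOTAL_STEPS = 3000` are assumptions. The step count is about 1500 steps per epoch, inferred from the Step and Epoch columns of the results table, multiplied by 2 epochs.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 0.00021
TRAIN_BATCH_SIZE = 20      # per device
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 2
NUM_DEVICES = 1            # assumption: single device (20 x 2 x 1 = 40)
TOTAL_STEPS = 3000         # assumption: ~1500 optimizer steps/epoch x 2 epochs


def effective_batch_size(per_device, accum, devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return per_device * accum * devices


def cosine_lr(step, total_steps, peak=PEAK_LR, warmup=WARMUP_STEPS):
    """Linear warmup to `peak`, then half-cosine decay to 0 at `total_steps`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 1, 2, 1000, 2000, 3000):
        print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

With only 2 warmup steps, the learning rate reaches its peak almost immediately. For the rest of training the schedule is effectively pure cosine decay.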
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
---|---|---|---|---|---|---|---|
3.3875 | 0.1333 | 200 | 3.3124 | 0.3546 | 0.1068 | 0.3356 | 0.3359 |
3.3444 | 0.2667 | 400 | 3.2866 | 0.3706 | 0.1103 | 0.3459 | 0.3461 |
3.1973 | 0.4 | 600 | 3.2682 | 0.3717 | 0.1138 | 0.3475 | 0.3477 |
3.2284 | 0.5333 | 800 | 3.2623 | 0.3740 | 0.1160 | 0.3505 | 0.3507 |
3.3128 | 0.6667 | 1000 | 3.2489 | 0.3732 | 0.1127 | 0.3488 | 0.3489 |
3.2104 | 0.8 | 1200 | 3.2412 | 0.3754 | 0.1144 | 0.3514 | 0.3514 |
3.0321 | 0.9333 | 1400 | 3.2356 | 0.3739 | 0.1150 | 0.3495 | 0.3497 |
3.0487 | 1.0667 | 1600 | 3.2392 | 0.3754 | 0.1173 | 0.3525 | 0.3526 |
3.1087 | 1.2 | 1800 | 3.2406 | 0.3732 | 0.1152 | 0.3498 | 0.3496 |
3.2041 | 1.3333 | 2000 | 3.2403 | 0.3763 | 0.1192 | 0.3528 | 0.3528 |
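The evaluation results at the top of this card match the step-2000 row. That row has the highest ROUGE-1, but validation loss is lowest at step 1400. The sketch below reads a few columns copied from the table and picks the best checkpoint under each criterion. It also estimates optimizer steps per epoch from the Step and Epoch columns. The helper name `best_row` is hypothetical and serves only as an illustration.

```python
# (step, epoch, validation_loss, rouge1), copied from the table above
LOG = [
    (200, 0.1333, 3.3124, 0.3546),
    (400, 0.2667, 3.2866, 0.3706),
    (600, 0.4, 3.2682, 0.3717),
    (800, 0.5333, 3.2623, 0.3740),
    (1000, 0.6667, 3.2489, 0.3732),
    (1200, 0.8, 3.2412, 0.3754),
    (1400, 0.9333, 3.2356, 0.3739),
    (1600, 1.0667, 3.2392, 0.3754),
    (1800, 1.2, 3.2406, 0.3732),
    (2000, 1.3333, 3.2403, 0.3763),
]
STEP, EPOCH, VAL_LOSS, ROUGE1 = range(4)  # column indices into each row


def best_row(log, column, higher_is_better):
    """Return the logged row with the best value in `column`."""
    pick = max if higher_is_better else min
    return pick(log, key=lambda row: row[column])


if __name__ == "__main__":
    print("lowest validation loss at step", best_row(LOG, VAL_LOSS, False)[STEP])
    print("highest Rouge1 at step", best_row(LOG, ROUGE1, True)[STEP])
    last = LOG[-1]
    print("approx. steps per epoch:", round(last[STEP] / last[EPOCH]))
```

Which row counts as "best" depends on the metric you select on. In Transformers this choice is typically controlled by `metric_for_best_model` together with `load_best_model_at_end`.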
Framework versions
- PEFT 0.11.1
- Transformers 4.42.4
- PyTorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
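To check whether a local environment matches the versions listed above, the Python standard library's `importlib.metadata.version` can be compared against them. This is a minimal sketch. The mapping from the names above to PyPI distribution names (`torch` for PyTorch, and so on) is an assumption. The `+cu121` local label is ignored, because the same PyTorch release can report its version with or without it depending on how it was installed. The function `version_mismatches` is a hypothetical helper, not part of any listed library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.11.1",
    "transformers": "4.42.4",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def public_version(v):
    """Drop a PEP 440 local label such as '+cu121'."""
    return v.split("+", 1)[0]


def version_mismatches(expected=EXPECTED, get_version=version):
    """Map package -> (expected, installed or None) for each mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have is None or public_version(have) != public_version(want):
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches().items():
        print(f"{name}: card lists {want}, installed {have or 'not installed'}")
```

The `get_version` parameter exists so that the lookup can be swapped out, for example to test the comparison logic without installing anything.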
Model tree for ErikBode/workingtrain_AX_cluster_more_tokens
- Base model: google/gemma-2b