---
license: apache-2.0
base_model: google/long-t5-tglobal-base
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: long_t5
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# long_t5

This model is a fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on an unknown dataset.
It achieves the following results on the evaluation set (final checkpoint, epoch 20):
- Loss: 1.5158
- ROUGE-1: 0.5214
- ROUGE-2: 0.3347
- ROUGE-L: 0.4751
- ROUGE-Lsum: 0.4746
- Gen Len (mean generated length): 25.9513
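
The ROUGE scores measure n-gram overlap between generated and reference texts. To illustrate what ROUGE-1 captures, here is a simplified unigram F1 in plain Python. It is not the scorer behind the numbers above: those were most likely produced by the `rouge_score` package (typically through the `evaluate` library), which applies its own tokenization and, depending on configuration, stemming, so results will differ from this sketch. The helpers `rouge1_f1` and `corpus_rouge1` are hypothetical names used only for illustration, and averaging per-example F1 is an assumption about how a single eval-set figure is aggregated.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 (hypothetical helper).

    Assumption: lowercase whitespace tokenization only, no stemming.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a unigram counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_rouge1(predictions: list[str], references: list[str]) -> float:
    """Mean per-example ROUGE-1 F1 over an evaluation set (assumed aggregation)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    scores = [rouge1_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # 5 of 6 unigrams match in each direction, so precision = recall = F1 = 5/6.
    print(round(rouge1_f1("the cat sat on the mat", "the cat is on the mat"), 4))  # 0.8333
```

ROUGE-2 follows the same pattern with bigrams instead of unigrams, while ROUGE-L and ROUGE-Lsum are based on longest common subsequences rather than fixed n-grams.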

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
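
Combined with the step counts in the training-results table (1,600 steps per epoch over 20 epochs, 32,000 steps in total), these hyperparameters determine the learning-rate curve. The sketch below reproduces the shape of a linear schedule: an optional linear warmup followed by linear decay from `learning_rate` to zero at the final step. Assumptions: no warmup (the card lists none), a single device, and no gradient accumulation. The names `linear_lr` and the constants are illustrative, not taken from the original training script.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 32_000,
              warmup_steps: int = 0) -> float:
    """Learning rate at an optimizer step under a linear warmup-then-decay schedule.

    Assumption: warmup_steps=0, since no warmup is listed in the hyperparameters.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * (step / max(1, warmup_steps))
    # Linear decay from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Derived from the training-results table: 1,600 steps per epoch for 20 epochs.
STEPS_PER_EPOCH = 1_600
NUM_EPOCHS = 20
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 32,000

# Assumption: one device, no gradient accumulation, so each step consumes
# train_batch_size examples (the last batch of an epoch may be partial).
TRAIN_BATCH_SIZE = 4
EXAMPLES_PER_EPOCH_UPPER_BOUND = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE  # 6,400

if __name__ == "__main__":
    for epoch in (0, 5, 10, 15, 20):
        step = epoch * STEPS_PER_EPOCH
        print(f"end of epoch {epoch:2d} (step {step:6d}): lr = {linear_lr(step, total_steps=TOTAL_STEPS):.2e}")
```

Under these assumptions the learning rate is halved to 1e-05 at step 16,000 (end of epoch 10) and reaches zero at step 32,000, and the training set holds at most about 6,400 examples.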

### Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
| 2.232 | 1.0 | 1600 | 1.6810 | 0.4704 | 0.2861 | 0.4256 | 0.4251 | 26.6112 |
| 2.0229 | 2.0 | 3200 | 1.6167 | 0.4859 | 0.2991 | 0.4412 | 0.4407 | 26.1006 |
| 1.9239 | 3.0 | 4800 | 1.5805 | 0.4924 | 0.3049 | 0.4475 | 0.4468 | 26.8169 |
| 1.8454 | 4.0 | 6400 | 1.5669 | 0.4968 | 0.3093 | 0.4517 | 0.4511 | 25.925 |
| 1.7626 | 5.0 | 8000 | 1.5432 | 0.4973 | 0.3132 | 0.453 | 0.4525 | 26.4362 |
| 1.6995 | 6.0 | 9600 | 1.5352 | 0.5045 | 0.3188 | 0.4596 | 0.459 | 26.1219 |
| 1.682 | 7.0 | 11200 | 1.5255 | 0.5066 | 0.3198 | 0.4613 | 0.4609 | 26.1581 |
| 1.6286 | 8.0 | 12800 | 1.5210 | 0.5113 | 0.3245 | 0.4663 | 0.466 | 26.1725 |
| 1.593 | 9.0 | 14400 | 1.5195 | 0.5102 | 0.3235 | 0.464 | 0.4638 | 25.8944 |
| 1.5784 | 10.0 | 16000 | 1.5166 | 0.5133 | 0.3265 | 0.4665 | 0.4661 | 25.685 |
| 1.5615 | 11.0 | 17600 | 1.5135 | 0.5161 | 0.3284 | 0.47 | 0.4695 | 25.8681 |
| 1.5391 | 12.0 | 19200 | 1.5106 | 0.5156 | 0.3303 | 0.4703 | 0.4701 | 26.1781 |
| 1.5077 | 13.0 | 20800 | 1.5095 | 0.5177 | 0.3317 | 0.4724 | 0.4721 | 26.0456 |
| 1.4923 | 14.0 | 22400 | 1.5163 | 0.5185 | 0.3321 | 0.4728 | 0.4723 | 26.17 |
| 1.4545 | 15.0 | 24000 | 1.5128 | 0.5181 | 0.3337 | 0.4727 | 0.4724 | 25.8219 |
| 1.4489 | 16.0 | 25600 | 1.5135 | 0.5209 | 0.3349 | 0.4744 | 0.4743 | 26.0369 |
| 1.4481 | 17.0 | 27200 | 1.5153 | 0.5218 | 0.3349 | 0.4751 | 0.4748 | 26.1744 |
| 1.4287 | 18.0 | 28800 | 1.5134 | 0.521 | 0.335 | 0.4752 | 0.4747 | 25.9525 |
| 1.389 | 19.0 | 30400 | 1.5155 | 0.5212 | 0.3348 | 0.4756 | 0.4751 | 26.0369 |
| 1.4215 | 20.0 | 32000 | 1.5158 | 0.5214 | 0.3347 | 0.4751 | 0.4746 | 25.9513 |
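
The headline evaluation figures at the top of this card match the epoch-20 row, i.e. the final checkpoint. A short script over values copied from the table shows that validation loss was lowest at epoch 13 (1.5095) and ROUGE-1 peaked at epoch 17 (0.5218); after epoch 13, further epochs brought small ROUGE gains while validation loss rose slightly. The helper `best_epoch` and the list names are illustrative only.

```python
# Per-epoch validation metrics copied from the training-results table.
EPOCHS = list(range(1, 21))
VAL_LOSS = [1.6810, 1.6167, 1.5805, 1.5669, 1.5432, 1.5352, 1.5255, 1.5210, 1.5195, 1.5166,
            1.5135, 1.5106, 1.5095, 1.5163, 1.5128, 1.5135, 1.5153, 1.5134, 1.5155, 1.5158]
ROUGE1 = [0.4704, 0.4859, 0.4924, 0.4968, 0.4973, 0.5045, 0.5066, 0.5113, 0.5102, 0.5133,
          0.5161, 0.5156, 0.5177, 0.5185, 0.5181, 0.5209, 0.5218, 0.5210, 0.5212, 0.5214]


def best_epoch(values: list[float], higher_is_better: bool) -> tuple[int, float]:
    """Return (epoch, value) of the best entry; ties resolve to the earliest epoch."""
    pick = max if higher_is_better else min
    idx = pick(range(len(values)), key=values.__getitem__)
    return EPOCHS[idx], values[idx]


if __name__ == "__main__":
    loss_epoch, loss_value = best_epoch(VAL_LOSS, higher_is_better=False)
    r1_epoch, r1_value = best_epoch(ROUGE1, higher_is_better=True)
    print(f"Lowest validation loss: {loss_value:.4f} at epoch {loss_epoch}")
    print(f"Highest ROUGE-1: {r1_value:.4f} at epoch {r1_epoch}")
    print(f"Final checkpoint (epoch {EPOCHS[-1]}): loss {VAL_LOSS[-1]:.4f}, ROUGE-1 {ROUGE1[-1]:.4f}")
```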


### Framework versions

- Transformers 4.41.2
- PyTorch 2.3.1+cu118
- Datasets 2.20.0
- Tokenizers 0.19.1