|
---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mt5-teste-full-length
  results: []
---
|
|
|
|
|
|
# mt5-teste-full-length |
|
|
|
This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unspecified dataset.
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.5996 |
|
- Rouge1: 0.5083 |
|
- Rouge2: 0.2820 |
|
- RougeL: 0.4095

- RougeLsum: 0.4108
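
The checkpoint can be loaded like any other mT5 sequence-to-sequence model. Below is a minimal inference sketch; the checkpoint path is a placeholder, and the assumption that the task is summarization is inferred from the ROUGE evaluation, not documented by the author:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder path: replace with the actual Hub repo id or local directory.
checkpoint = "mt5-teste-full-length"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

text = "Texto de entrada a ser processado."  # example input
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Greedy decoding; tune max_new_tokens / num_beams for your task.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```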
|
|
|
## Model description |
|
|
|
More information needed. The checkpoint is a fine-tune of mT5-base; evaluation with ROUGE suggests a summarization-style sequence-to-sequence task, but the exact task and target language have not been documented.
|
|
|
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data |
|
|
|
More information needed |
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a code sketch reproducing them follows the list):
|
- learning_rate: 0.0005 |
|
- train_batch_size: 2 |
|
- eval_batch_size: 2 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 16 |
|
- total_train_batch_size: 32 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 90 |
|
- num_epochs: 4 |
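
A minimal sketch of how these settings map onto `Seq2SeqTrainingArguments`; anything not listed above (output directory, evaluation cadence, `predict_with_generate`) is an assumption:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-teste-full-length",  # assumed output directory
    learning_rate=5e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=16,  # 2 x 16 = total train batch size 32
    lr_scheduler_type="linear",
    warmup_steps=90,
    num_train_epochs=4,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the optimizer defaults.
    evaluation_strategy="steps",     # assumed: the results table logs eval every 100 steps
    eval_steps=100,
    logging_steps=100,
    predict_with_generate=True,      # assumed: required to compute ROUGE during eval
)
```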
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
|
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:| |
|
| 9.1442 | 0.16 | 100 | 9.7852 | 0.0531 | 0.0 | 0.0524 | 0.0 | |
|
| 1.0643 | 0.33 | 200 | 0.9089 | 0.3623 | 0.1853 | 0.3252 | 0.3261 | |
|
| 0.8283 | 0.49 | 300 | 0.8361 | 0.4184 | 0.2112 | 0.3535 | 0.3548 | |
|
| 0.7754 | 0.65 | 400 | 0.7522 | 0.4407 | 0.2575 | 0.3802 | 0.3828 | |
|
| 0.8012 | 0.82 | 500 | 0.7226 | 0.4643 | 0.2638 | 0.3866 | 0.3866 | |
|
| 0.7758 | 0.98 | 600 | 0.7265 | 0.4624 | 0.2458 | 0.3840 | 0.3847 | |
|
| 0.6744 | 1.15 | 700 | 0.7018 | 0.4477 | 0.2469 | 0.3732 | 0.3741 | |
|
| 0.6636 | 1.31 | 800 | 0.6955 | 0.4786 | 0.2632 | 0.4027 | 0.4038 | |
|
| 0.6839 | 1.47 | 900 | 0.6737 | 0.4773 | 0.2689 | 0.3909 | 0.3898 | |
|
| 0.6264 | 1.64 | 1000 | 0.6504 | 0.4457 | 0.2533 | 0.3747 | 0.3767 | |
|
| 0.6641 | 1.8 | 1100 | 0.6442 | 0.4582 | 0.2428 | 0.3661 | 0.3659 | |
|
| 0.6492 | 1.96 | 1200 | 0.6500 | 0.5004 | 0.2751 | 0.3984 | 0.3993 | |
|
| 0.5823 | 2.13 | 1300 | 0.6344 | 0.4917 | 0.2743 | 0.4000 | 0.4016 | |
|
| 0.5585 | 2.29 | 1400 | 0.6373 | 0.4749 | 0.2490 | 0.3834 | 0.3849 | |
|
| 0.5748 | 2.45 | 1500 | 0.6168 | 0.5036 | 0.2915 | 0.4128 | 0.4145 | |
|
| 0.5452 | 2.62 | 1600 | 0.6135 | 0.5004 | 0.2864 | 0.4038 | 0.4044 | |
|
| 0.5735 | 2.78 | 1700 | 0.6164 | 0.4904 | 0.2689 | 0.4001 | 0.3993 | |
|
| 0.5394 | 2.95 | 1800 | 0.6153 | 0.4864 | 0.2884 | 0.4091 | 0.4089 | |
|
| 0.4816 | 3.11 | 1900 | 0.6070 | 0.5027 | 0.2765 | 0.4042 | 0.4031 | |
|
| 0.5328 | 3.27 | 2000 | 0.6095 | 0.4896 | 0.2783 | 0.4026 | 0.4031 | |
|
| 0.5157 | 3.44 | 2100 | 0.6021 | 0.5165 | 0.2853 | 0.4137 | 0.4145 | |
|
| 0.5295 | 3.6 | 2200 | 0.6063 | 0.4926 | 0.2721 | 0.3965 | 0.3980 | |
|
| 0.5027 | 3.76 | 2300 | 0.6004 | 0.5120 | 0.2885 | 0.4092 | 0.4103 | |
|
| 0.4943 | 3.93 | 2400 | 0.5996 | 0.5083 | 0.2820 | 0.4095 | 0.4108 | |
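
For reference, ROUGE scores like those above are typically computed with the `evaluate` library; a minimal sketch, where the prediction and reference strings are placeholders standing in for `model.generate()` output and the evaluation split's targets:

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholder texts; in practice, predictions come from the model and
# references from the evaluation dataset.
predictions = ["o gato sentou no tapete"]
references = ["o gato estava sentado no tapete"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```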
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.27.4 |
|
- PyTorch 1.13.0
|
- Datasets 2.1.0 |
|
- Tokenizers 0.13.2 |
|
|