raquelclemente/mt5-teste-full-length
	---
	license: apache-2.0
	tags:
	- generated_from_trainer
	metrics:
	- rouge
	model-index:
	- name: mt5-teste-full-length
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# mt5-teste-full-length

	This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unspecified dataset (the Trainer recorded no dataset name).
	At the final evaluation (step 2400, epoch 3.93), it achieves the following results on the evaluation set:
	- Loss: 0.5750
	- Rouge1: 0.4784
	- Rouge2: 0.3008
	- RougeL: 0.4185
	- RougeLsum: 0.4212
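
The card does not include a usage snippet. The following is a minimal loading sketch, assuming the checkpoint is published on the Hub under the repository path `raquelclemente/mt5-teste-full-length`, that `transformers`, `torch`, and `sentencepiece` are installed, and that the model takes raw text as input. The helper name `generate_text`, the input length limit, and the generation settings (`max_new_tokens`, `num_beams`) are illustrative assumptions, not values documented by the author. The ROUGE metrics suggest a summarization-style task, but the card does not state this.

```python
# Hypothetical usage sketch: the repository id comes from the repository path,
# every length and generation setting below is an assumption.
MODEL_ID = "raquelclemente/mt5-teste-full-length"


def generate_text(text, max_input_tokens=512, max_new_tokens=128, num_beams=4):
    """Run one text-to-text generation pass with the fine-tuned checkpoint."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Tokenize and truncate the input to the assumed maximum length.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search decoding; returns a batch of token id sequences.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("...")` downloads the weights on first use. For repeated calls, loading the tokenizer and model once outside the function would avoid the reload cost.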

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0005
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 32
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 90
	- num_epochs: 4
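
Two of the values above follow from the others: the total train batch size is the per-device batch size times the gradient accumulation steps, and the linear scheduler ramps the learning rate up over the warmup steps and then decays it to zero. The sketch below illustrates both under stated assumptions: a single device, and a total of about 2444 optimizer steps, estimated from step 2400 being logged at epoch 3.93 rather than taken from the card. The function names are hypothetical, and the schedule formula is the standard linear warmup and decay rule, not code extracted from this training run.

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


def linear_schedule_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the hyperparameter list.
PEAK_LR = 0.0005
WARMUP_STEPS = 90
# Assumption: roughly 611 optimizer steps per epoch (2400 / 3.93), so about 2444 for 4 epochs.
TOTAL_STEPS = 2444

print(effective_batch_size(2, 16))  # 32
for step in (0, 45, 90, 1200, 2400):
    print(step, linear_schedule_lr(step, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS))
```

With only 90 warmup steps, the peak rate of 0.0005 is reached before the first logged evaluation at step 100.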

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
	|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
	| 2.9357 | 0.16 | 100 | 2.5583 | 0.2654 | 0.0431 | 0.1946 | 0.1951 |
	| 1.9974 | 0.33 | 200 | 1.7104 | 0.1803 | 0.0817 | 0.1712 | 0.1726 |
	| 1.4803 | 0.49 | 300 | 1.4404 | 0.1770 | 0.0695 | 0.1707 | 0.1727 |
	| 1.2432 | 0.65 | 400 | 1.0519 | 0.2809 | 0.1314 | 0.2509 | 0.2511 |
	| 0.8186 | 0.82 | 500 | 0.7386 | 0.3487 | 0.1767 | 0.2894 | 0.2903 |
	| 0.7910 | 0.98 | 600 | 0.7135 | 0.3634 | 0.1912 | 0.3108 | 0.3108 |
	| 0.6697 | 1.15 | 700 | 0.6835 | 0.3874 | 0.1900 | 0.3123 | 0.3131 |
	| 0.7146 | 1.31 | 800 | 0.6657 | 0.3816 | 0.2209 | 0.3414 | 0.3428 |
	| 0.6957 | 1.47 | 900 | 0.6498 | 0.3878 | 0.2045 | 0.3336 | 0.3339 |
	| 0.6737 | 1.64 | 1000 | 0.6332 | 0.4094 | 0.2219 | 0.3524 | 0.3535 |
	| 0.6537 | 1.80 | 1100 | 0.6369 | 0.4401 | 0.2621 | 0.3629 | 0.3630 |
	| 0.6746 | 1.96 | 1200 | 0.6169 | 0.4369 | 0.2326 | 0.3566 | 0.3574 |
	| 0.5961 | 2.13 | 1300 | 0.6171 | 0.4364 | 0.2464 | 0.3666 | 0.3670 |
	| 0.5829 | 2.29 | 1400 | 0.6122 | 0.4539 | 0.2683 | 0.3813 | 0.3825 |
	| 0.6336 | 2.45 | 1500 | 0.5993 | 0.4347 | 0.2548 | 0.3660 | 0.3689 |
	| 0.5754 | 2.62 | 1600 | 0.5905 | 0.4575 | 0.2789 | 0.3856 | 0.3857 |
	| 0.5984 | 2.78 | 1700 | 0.5872 | 0.4630 | 0.2768 | 0.3915 | 0.3929 |
	| 0.5966 | 2.95 | 1800 | 0.5944 | 0.4605 | 0.2753 | 0.3822 | 0.3828 |
	| 0.5288 | 3.11 | 1900 | 0.5955 | 0.4520 | 0.2651 | 0.3874 | 0.3887 |
	| 0.5316 | 3.27 | 2000 | 0.5841 | 0.4649 | 0.2820 | 0.4052 | 0.4056 |
	| 0.5332 | 3.44 | 2100 | 0.5765 | 0.4861 | 0.3046 | 0.4021 | 0.4050 |
	| 0.5296 | 3.60 | 2200 | 0.5812 | 0.4610 | 0.2815 | 0.3976 | 0.4021 |
	| 0.5215 | 3.76 | 2300 | 0.5757 | 0.4724 | 0.2947 | 0.4122 | 0.4164 |
	| 0.5399 | 3.93 | 2400 | 0.5750 | 0.4784 | 0.3008 | 0.4185 | 0.4212 |
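
The results reported at the top of the card match the final row (step 2400), which also has the lowest validation loss. The highest Rouge1 and Rouge2 values in the table occur earlier, at step 2100. The card does not say how the published checkpoint was chosen, so the sketch below only illustrates how the choice depends on the selection metric. It uses the last six rows of the table; the helper `best_row` and the column labels are hypothetical names introduced for this example.

```python
# (step, validation_loss, rouge1, rouge2, rougeL, rougeLsum)
# Copied from the last six rows of the training results table.
ROWS = [
    (1900, 0.5955, 0.4520, 0.2651, 0.3874, 0.3887),
    (2000, 0.5841, 0.4649, 0.2820, 0.4052, 0.4056),
    (2100, 0.5765, 0.4861, 0.3046, 0.4021, 0.4050),
    (2200, 0.5812, 0.4610, 0.2815, 0.3976, 0.4021),
    (2300, 0.5757, 0.4724, 0.2947, 0.4122, 0.4164),
    (2400, 0.5750, 0.4784, 0.3008, 0.4185, 0.4212),
]
COLUMNS = ("step", "validation_loss", "rouge1", "rouge2", "rougeL", "rougeLsum")


def best_row(rows, column, higher_is_better=True):
    """Return the row with the best value in `column`."""
    idx = COLUMNS.index(column)
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[idx])


best_by_loss = best_row(ROWS, "validation_loss", higher_is_better=False)
best_by_rouge1 = best_row(ROWS, "rouge1")
print(best_by_loss[0])    # 2400
print(best_by_rouge1[0])  # 2100
```

The gaps are small (0.4861 versus 0.4784 for Rouge1), so either checkpoint may be a reasonable choice depending on which metric matters for the downstream task.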


	### Framework versions

	- Transformers 4.27.4
	- PyTorch 1.13.0
	- Datasets 2.1.0
	- Tokenizers 0.13.2
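
To compare a local environment against the versions listed above, a small check along the following lines may help. It is a sketch that assumes the usual PyPI distribution names (`torch` for PyTorch) and uses only the standard library. The function name `version_mismatches` is hypothetical, and newer versions may well work; the card only records what was used for training.

```python
from importlib import metadata

# Versions listed in the card, keyed by the assumed PyPI distribution name.
PINNED = {
    "transformers": "4.27.4",
    "torch": "1.13.0",
    "datasets": "2.1.0",
    "tokenizers": "0.13.2",
}


def version_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every package that differs."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Compare only the public version, ignoring local tags such as "+cu117".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[package] = (expected, installed)
    return mismatches


for package, (expected, installed) in version_mismatches(PINNED).items():
    print(f"{package}: card lists {expected}, found {installed}")
```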