|
--- |
|
license: apache-2.0 |
|
tags: |
|
- generated_from_trainer |
|
metrics: |
|
- rouge |
|
model-index: |
|
- name: mt5-large-gramatika161k-b16-e10-lr5 |
|
  results: []
|
--- |
|
|
|
|
|
|
# mt5-large-gramatika161k-b16-e10-lr5 |
|
|
|
This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on the Gramatika 161k dataset.
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.0909 |
|
- Rouge1: 72.6295 |
|
- Rouge2: 67.8521 |
|
- RougeL: 72.5471

- RougeLsum: 72.5591
|
- Gen Len: 18.3276 |
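
These ROUGE values correspond to the 🤗 Evaluate `rouge` metric, reported on a 0–100 scale as in the standard Transformers seq2seq example scripts. A minimal sketch of how such scores are computed; the prediction and reference strings are placeholders:

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholder strings; in practice these are the model's generated
# corrections and the reference corrections from the evaluation set.
predictions = ["She goes to school every day."]
references = ["She goes to school every day."]

# evaluate returns fractions in [0, 1]; the card reports them scaled by 100.
scores = rouge.compute(predictions=predictions, references=references)
print({key: round(value * 100, 4) for key, value in scores.items()})
```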
|
|
|
## Model description |
|
|
|
[google/mt5-large](https://huggingface.co/google/mt5-large) is a multilingual text-to-text Transformer pretrained by Google on the mC4 corpus. This checkpoint fine-tunes it on Gramatika 161k for grammatical error correction: the model takes a (possibly ungrammatical) sentence as input and generates a corrected version as output.
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for grammatical error correction as a sequence-to-sequence task; a usage sketch follows below. As with any generative corrector, it can miss errors, introduce new ones, or change the meaning of the input, so outputs should be reviewed before downstream use. Note also that ROUGE measures n-gram overlap with reference corrections and is only a proxy for correction quality.
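
A minimal usage sketch with 🤗 Transformers; the repository id below is a placeholder for wherever this checkpoint is hosted, and the input sentence is illustrative:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder repo id; substitute the actual Hub path of this checkpoint.
model_id = "your-username/mt5-large-gramatika161k-b16-e10-lr5"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative input; the expected format (e.g. any task prefix) depends
# on how the training data was preprocessed, which is not documented here.
inputs = tokenizer("your possibly ungrammatical sentence here", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```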
|
|
|
## Training and evaluation data |
|
|
|
Not documented in the Trainer metadata. The model name indicates the Gramatika 161k dataset; the step counts in the results table below (about 7,880 optimizer steps per epoch at batch size 16) imply a training split of roughly 126,000 examples. The composition of the evaluation set is unknown.
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-05 |
|
- train_batch_size: 16 |
|
- eval_batch_size: 16 |
|
- seed: 42 |
|
- optimizer: Adafactor |
|
- lr_scheduler_type: linear |
|
- num_epochs: 10 |
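
A sketch of how these settings map onto `Seq2SeqTrainingArguments` in Transformers 4.30; `output_dir` and the evaluation cadence are assumptions, not recorded values:

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir and the
# evaluation cadence are illustrative assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika161k-b16-e10-lr5",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,   # needed for the ROUGE evaluation above
    evaluation_strategy="steps",  # assumed from the 5000-step eval rows
    eval_steps=5000,              # assumed
)
```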
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.9659        | 0.63  | 5000  | 0.1455          | 70.1028 | 63.4969 | 69.9738 | 69.9761   | 18.3378 |
| 0.1735        | 1.27  | 10000 | 0.1195          | 71.1156 | 65.2149 | 70.9932 | 71.0038   | 18.3324 |
| 0.1391        | 1.90  | 15000 | 0.1076          | 71.5692 | 66.0226 | 71.4676 | 71.4720   | 18.3281 |
| 0.1149        | 2.54  | 20000 | 0.1035          | 71.8135 | 66.4584 | 71.7212 | 71.7292   | 18.3308 |
| 0.1029        | 3.17  | 25000 | 0.0961          | 72.1040 | 66.9459 | 72.0139 | 72.0239   | 18.3282 |
| 0.0898        | 3.81  | 30000 | 0.0944          | 72.2310 | 67.1623 | 72.1412 | 72.1542   | 18.3314 |
| 0.0803        | 4.44  | 35000 | 0.0926          | 72.3851 | 67.4624 | 72.3051 | 72.3183   | 18.3286 |
| 0.0750        | 5.08  | 40000 | 0.0929          | 72.4219 | 67.5102 | 72.3376 | 72.3479   | 18.3298 |
| 0.0665        | 5.71  | 45000 | 0.0917          | 72.5132 | 67.6501 | 72.4271 | 72.4383   | 18.3264 |
| 0.0624        | 6.35  | 50000 | 0.0911          | 72.5711 | 67.7710 | 72.4938 | 72.5041   | 18.3283 |
| 0.0588        | 6.98  | 55000 | 0.0909          | 72.6295 | 67.8521 | 72.5471 | 72.5591   | 18.3276 |
| 0.0534        | 7.62  | 60000 | 0.0920          | 72.6475 | 67.9046 | 72.5743 | 72.5853   | 18.3278 |
| 0.0514        | 8.25  | 65000 | 0.0930          | 72.6373 | 67.8940 | 72.5612 | 72.5724   | 18.3277 |
| 0.0492        | 8.88  | 70000 | 0.0930          | 72.6593 | 67.9359 | 72.5900 | 72.5971   | 18.3273 |
| 0.0470        | 9.52  | 75000 | 0.0932          | 72.6906 | 68.0100 | 72.6172 | 72.6269   | 18.3264 |

The results reported at the top of this card correspond to the step 55000 checkpoint, which achieves the lowest validation loss (0.0909).
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.30.1 |
|
- PyTorch 1.11.0a0+b6df043
|
- Datasets 2.12.0 |
|
- Tokenizers 0.13.3 |
|
|