psxjp5/mt5-small_large_lr

	---
	license: apache-2.0
	base_model: google/mt5-small
	tags:
	- generated_from_trainer
	metrics:
	- rouge
	- bleu
	model-index:
	- name: mt5-small_large_lr
	  results: []
	---


	# mt5-small_large_lr

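	This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small). The training dataset was not recorded (the Trainer reported it as `None`).
	It achieves the following results on the evaluation set; the values match the last logged checkpoint (step 1400, epoch 7.96):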
	- Loss: 0.9688
	- Rouge1: 38.8633
	- Rouge2: 33.0802
	- Rougel: 37.6956
	- Rougelsum: 37.7116
	- Bleu: 26.6301
	- Gen Len: 11.5566
	- Meteor: 0.3519
	- No ans accuracy: 22.99
	- Av cosine sim: 0.6861

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
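
Until the intended use is documented, the following is only a minimal loading sketch. It assumes the checkpoint is published on the Hub as `psxjp5/mt5-small_large_lr` and that `transformers`, `torch`, and `sentencepiece` are installed. The expected input format is not described in this card, so the `text` argument, the `generate` helper name, and the `max_new_tokens` cap are illustrative assumptions rather than the author's recipe.

```python
# Assumed Hub repo id, taken from the page header of this model card.
MODEL_ID = "psxjp5/mt5-small_large_lr"


def generate(text: str, max_new_tokens: int = 32) -> str:
    """Load the checkpoint and return one generated string for `text`.

    Hypothetical helper: the prompt format the model was trained on is
    not documented in the card, so `text` is passed through unchanged.
    """
    # Imported lazily so that defining this function needs neither the
    # libraries nor a network connection.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Tokenize to PyTorch tensors and let the model generate a short answer.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("your input text here"))
```

The cap of 32 new tokens is a guess based on the average generated length of about 11.6 tokens reported above; adjust it to the task.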

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.005
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 9
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 20
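
The relationship between these values can be checked with a short sketch. It assumes a single training device, zero warmup steps (no warmup is listed above), and 175 optimizer steps per epoch (read off the first row of the training log). The function names are illustrative and not part of the Trainer API; the decay rule is the usual linear schedule, written out by hand.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.005
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 8
NUM_EPOCHS = 20

# Assumptions, not stated in the card: one device, no warmup, and
# 175 optimizer steps per epoch (taken from the training log).
NUM_DEVICES = 1
WARMUP_STEPS = 0
STEPS_PER_EPOCH = 175


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum * devices


def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup from 0, then linear decay to 0 at `total_steps`."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS

print(total_batch)  # 128, matching total_train_batch_size
print(total_steps)  # 3500 planned optimizer steps for 20 epochs
for step in (0, 175, 1400, total_steps):
    print(step, linear_lr(step, LEARNING_RATE, total_steps, WARMUP_STEPS))
```

Under these assumptions the learning rate at step 1400, the last logged step, would still be about 0.003, that is, 60% of its initial value.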

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Gen Len | Meteor | No ans accuracy | Av cosine sim |
	|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|:-------:|:------:|:---------------:|:-------------:|
	| 5.4434 | 1.0 | 175 | 2.1918 | 1.8449 | 1.2024 | 1.7039 | 1.7116 | 0.0 | 2.7672 | 0.0145 | 28.9700 | 0.1363 |
	| 1.8436 | 1.99 | 350 | 1.1852 | 33.6062 | 26.8725 | 32.2258 | 32.241 | 20.3395 | 12.2528 | 0.2957 | 17.3800 | 0.636 |
	| 1.2276 | 2.99 | 525 | 1.0630 | 33.186 | 27.4949 | 32.0715 | 32.0522 | 20.3232 | 11.0301 | 0.2957 | 21.18 | 0.6109 |
	| 0.9589 | 3.98 | 700 | 1.0083 | 40.265 | 33.6652 | 38.9503 | 38.9661 | 28.0884 | 12.8545 | 0.3623 | 17.54 | 0.7157 |
	| 0.7931 | 4.98 | 875 | 0.9682 | 37.9437 | 31.7611 | 36.7618 | 36.7671 | 25.7738 | 12.0286 | 0.3424 | 20.66 | 0.6825 |
	| 0.6686 | 5.97 | 1050 | 0.9601 | 37.5742 | 31.9098 | 36.4225 | 36.4381 | 24.9584 | 11.4169 | 0.3398 | 22.56 | 0.6713 |
	| 0.5686 | 6.97 | 1225 | 0.9620 | 43.1436 | 36.6363 | 41.7279 | 41.7571 | 32.4301 | 13.6142 | 0.3893 | 16.9400 | 0.757 |
	| 0.4939 | 7.96 | 1400 | 0.9688 | 38.8633 | 33.0802 | 37.6956 | 37.7116 | 26.6301 | 11.5566 | 0.3519 | 22.99 | 0.6861 |
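
The log can be scanned for its strongest checkpoints with a small sketch like the one below. The step, epoch, validation loss, and Rouge1 values are copied from the table above; the `Row` type and the variable names are illustrative only.

```python
from typing import NamedTuple


class Row(NamedTuple):
    step: int
    epoch: float
    val_loss: float
    rouge1: float


# Step, epoch, validation loss, and Rouge1 copied from the table above.
LOG = [
    Row(175, 1.0, 2.1918, 1.8449),
    Row(350, 1.99, 1.1852, 33.6062),
    Row(525, 2.99, 1.0630, 33.186),
    Row(700, 3.98, 1.0083, 40.265),
    Row(875, 4.98, 0.9682, 37.9437),
    Row(1050, 5.97, 0.9601, 37.5742),
    Row(1225, 6.97, 0.9620, 43.1436),
    Row(1400, 7.96, 0.9688, 38.8633),
]

best_by_loss = min(LOG, key=lambda r: r.val_loss)   # lowest validation loss
best_by_rouge1 = max(LOG, key=lambda r: r.rouge1)   # highest Rouge1
final = LOG[-1]                                     # last logged checkpoint

print("lowest validation loss:", best_by_loss.step, best_by_loss.val_loss)
print("highest Rouge1:", best_by_rouge1.step, best_by_rouge1.rouge1)
print("final checkpoint:", final.step, final.val_loss, final.rouge1)
```

In this log the lowest validation loss (0.9601) is at step 1050 and the highest Rouge1 (43.1436) is at step 1225, while the headline results correspond to the final row at step 1400. The log also ends at epoch 7.96 although `num_epochs` was set to 20; the card does not say why.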


	### Framework versions

	- Transformers 4.31.0
	- PyTorch 2.0.1+cu118
	- Datasets 2.13.1
	- Tokenizers 0.13.3