p1atdev/t5-base-xlsum-ja

	---
	license: cc-by-sa-4.0
	base_model: retrieva-jp/t5-base-long
	tags:
	- generated_from_trainer
	- summarization
	datasets:
	- xlsum
	metrics:
	- rouge
	model-index:
	- name: t5-base-xlsum-ja
	  results:
	  - task:
	      name: Sequence-to-sequence Language Modeling
	      type: text2text-generation
	    dataset:
	      name: csebuetnlp/xlsum
	      type: XL-Sum
	      config: japanese
	      split: train
	      args: japanese
	    metrics:
	    - name: Rouge1
	      type: rouge
	      value: 0.3648008957585529
	    - name: Rouge2
	      type: rouge
	      value: 0.16411161798042992
	language:
	- ja
	library_name: transformers
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# t5-base-xlsum-ja

	This model is a fine-tuned version of [retrieva-jp/t5-base-long](https://huggingface.co/retrieva-jp/t5-base-long) on the Japanese configuration of the XL-Sum dataset (`csebuetnlp/xlsum`).
	It achieves the following results on the evaluation set (final logged checkpoint, step 800):
	- Loss: 2.6563
	- Rouge1: 0.3648
	- Rouge2: 0.1641
	- RougeL: 0.2965
	- RougeLsum: 0.3132
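
For readers unfamiliar with the metrics, ROUGE-N measures n-gram overlap between a generated summary and a reference summary. The sketch below computes a simplified ROUGE-N F1 score in plain Python. It is not the implementation that produced the numbers above. The character-level tokenization and the example sentences are assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n):
    """Simplified ROUGE-N F1: clipped n-gram overlap between two token lists."""
    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Assumption: character-level tokens, chosen only to keep the example short.
reference = list("猫が庭で寝ている")
candidate = list("猫が寝ている")

rouge1 = rouge_n_f1(reference, candidate, 1)  # 6 shared unigrams -> 6/7
rouge2 = rouge_n_f1(reference, candidate, 2)  # 4 shared bigrams  -> 2/3
print(f"ROUGE-1 F1: {rouge1:.4f}, ROUGE-2 F1: {rouge2:.4f}")
```

Scores from a real evaluation depend heavily on the tokenizer, so values from this sketch are not comparable to the reported results.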

	## Model description

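	A Japanese abstractive summarization model, obtained by fine-tuning [retrieva-jp/t5-base-long](https://huggingface.co/retrieva-jp/t5-base-long) on the Japanese configuration of XL-Sum. It is released under the `cc-by-sa-4.0` license.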

	## Intended uses & limitations

	The model is intended for summarizing Japanese text. More information needed on limitations.
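
A minimal usage sketch with the `transformers` library follows. The card itself does not document inference settings, so the generation parameters (`max_new_tokens`, `num_beams`), the 512-token input truncation, and the absence of a task prefix are all assumptions rather than recommendations from the author. The helper name `summarize` is hypothetical.

```python
MODEL_ID = "p1atdev/t5-base-xlsum-ja"


def summarize(text, max_new_tokens=64, num_beams=4):
    """Generate a summary for one Japanese text (settings are assumptions)."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Assumption: truncate long inputs to 512 tokens; the card gives no limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize("...")` with an article body downloads the weights on first use. For repeated calls, loading the tokenizer and model once outside the function would be the more practical design.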

	## Training and evaluation data

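	Trained on the `japanese` configuration of `csebuetnlp/xlsum` (XL-Sum), as recorded in the card metadata. More information needed on the evaluation split.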

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.01
	- num_epochs: 15
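
Two of the values above follow from the others. The sketch below shows how the total batch size of 128 arises from the per-device batch size and gradient accumulation, and how a cosine schedule with a 1% warmup shapes the learning rate. The single-device setting is inferred from 8 × 16 = 128. The total step count of 830 is a hypothetical value, since the card does not state it. The schedule formula is the usual linear-warmup-then-cosine-decay form, not code taken from the training run.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.01

# Assumption: one device, because 8 * 16 already equals the reported 128.
NUM_DEVICES = 1
# Hypothetical total number of optimizer steps; not stated in the card.
TOTAL_STEPS = 830


def effective_batch_size(per_device, accum_steps, devices):
    """Number of examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def cosine_lr(step, total_steps, base_lr, warmup_ratio):
    """Linear warmup to base_lr, then cosine decay towards zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 9, 400, 830):
    print(step, cosine_lr(step, TOTAL_STEPS, LEARNING_RATE, WARMUP_RATIO))
```

Under these assumptions the learning rate peaks at 1e-4 after the first few steps and has decayed close to zero by the end of training, which is consistent with the small changes in validation loss over the last logged steps.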

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
	|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
	| 4.9166 | 1.8 | 100 | 3.4095 | 0.3569 | 0.1509 | 0.2416 | 0.3209 |
	| 4.1162 | 3.61 | 200 | 3.0980 | 0.3262 | 0.1354 | 0.2557 | 0.2805 |
	| 3.8578 | 5.41 | 300 | 2.8853 | 0.3428 | 0.1445 | 0.2628 | 0.2881 |
	| 3.7309 | 7.22 | 400 | 2.7714 | 0.3621 | 0.1615 | 0.2951 | 0.3151 |
	| 3.6716 | 9.02 | 500 | 2.7042 | 0.3727 | 0.1668 | 0.2982 | 0.3225 |
	| 3.6393 | 10.82 | 600 | 2.6666 | 0.3676 | 0.1592 | 0.2987 | 0.3206 |
	| 3.6291 | 12.63 | 700 | 2.6587 | 0.3654 | 0.1576 | 0.2955 | 0.3108 |
	| 3.6224 | 14.43 | 800 | 2.6563 | 0.3648 | 0.1641 | 0.2965 | 0.3132 |
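
The logged checkpoints do not all rank the same way under every metric. The short sketch below copies the step, validation loss, and Rouge1 columns from the table and picks the best step under each criterion. The variable and function names are illustrative only, and nothing here implies that the author selected checkpoints this way.

```python
# (step, validation_loss, rouge1) copied from the training results table.
LOGGED = [
    (100, 3.4095, 0.3569),
    (200, 3.0980, 0.3262),
    (300, 2.8853, 0.3428),
    (400, 2.7714, 0.3621),
    (500, 2.7042, 0.3727),
    (600, 2.6666, 0.3676),
    (700, 2.6587, 0.3654),
    (800, 2.6563, 0.3648),
]


def best_step_by_loss(rows):
    """Step with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


def best_step_by_rouge1(rows):
    """Step with the highest Rouge1 score."""
    return max(rows, key=lambda row: row[2])[0]


print("lowest validation loss at step", best_step_by_loss(LOGGED))
print("highest Rouge1 at step", best_step_by_rouge1(LOGGED))
```

Validation loss is lowest at the final logged step, while Rouge1 peaks at step 500, so the reported evaluation numbers correspond to the last checkpoint rather than the best Rouge1 checkpoint.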


	### Framework versions

	- Transformers 4.34.0
	- PyTorch 2.0.0+cu118
	- Datasets 2.14.5
	- Tokenizers 0.14.0