---
license: mit
tags:
- simplification
- generated_from_trainer
metrics:
- rouge
model-index:
- name: mbart-large-50-clara-med
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mbart-large-50-clara-med

This model is a fine-tuned version of [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) on a dataset that this card does not specify.
It achieves the following results on the evaluation set. These match the final checkpoint (epoch 30) in the training-results table, and ROUGE scores are on a 0 to 100 scale:
- Loss: 3.0952
- ROUGE-1: 49.4298
- ROUGE-2: 31.7193
- ROUGE-L: 44.732
- ROUGE-Lsum: 44.9281

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 30
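
With `lr_scheduler_type: linear`, the learning rate decays linearly from 5.6e-05 to 0 over training. The results table implies 190 optimizer steps per epoch and 5,700 steps in total (30 × 190). The sketch below assumes zero warmup steps (none are listed, and 0 is the Trainer default), a single device, and no gradient accumulation; under those assumptions it also bounds the training-set size. The function names are illustrative and not part of any library.

```python
# Values taken from the hyperparameter list and the training-results table.
BASE_LR = 5.6e-05
BATCH_SIZE = 16
STEPS_PER_EPOCH = 190  # step count at epoch 1.0 in the results table
NUM_EPOCHS = 30
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 5700, the final step in the table
WARMUP_STEPS = 0  # assumption: no warmup is listed, and 0 is the Trainer default


def linear_lr(step: int, base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS, warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    # Clamp at 0 so steps past the end of training never yield a negative rate.
    return base_lr * max(0.0, remaining)


def train_size_range(steps_per_epoch: int = STEPS_PER_EPOCH,
                     batch_size: int = BATCH_SIZE) -> tuple[int, int]:
    """Smallest and largest training-set sizes giving `steps_per_epoch` batches,
    assuming one device, no gradient accumulation, and the last partial batch kept."""
    # ceil(n / batch_size) == steps_per_epoch
    #   <=> (steps_per_epoch - 1) * batch_size < n <= steps_per_epoch * batch_size
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


if __name__ == "__main__":
    for epoch in (0, 10, 20, 30):
        print(f"epoch {epoch:2d}: lr = {linear_lr(epoch * STEPS_PER_EPOCH):.3e}")
    print("possible training-set sizes:", train_size_range())
```

Under these assumptions, `train_size_range()` returns `(3025, 3040)`: any training set of 3,025 to 3,040 examples produces 190 batches of at most 16 examples.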

### Training results

"No log" in the Training Loss column means that no training loss had been logged yet at that evaluation.

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:----------:|
| No log | 1.0 | 190 | 9.5151 | 8.9002 | 0.0056 | 8.9059 | 8.8991 |
| No log | 2.0 | 380 | 1.7786 | 44.8765 | 27.9652 | 40.2081 | 40.3457 |
| 4.488 | 3.0 | 570 | 1.7104 | 46.4054 | 28.8582 | 41.6579 | 41.86 |
| 4.488 | 4.0 | 760 | 1.7601 | 47.6046 | 30.1854 | 42.9171 | 43.0745 |
| 1.1057 | 5.0 | 950 | 1.9232 | 48.1693 | 30.1535 | 43.0418 | 43.1796 |
| 1.1057 | 6.0 | 1140 | 2.2791 | 43.831 | 26.9216 | 39.1768 | 39.3672 |
| 1.1057 | 7.0 | 1330 | 2.4800 | 42.4614 | 25.2371 | 37.6735 | 37.9309 |
| 0.4401 | 8.0 | 1520 | 2.4991 | 46.6653 | 28.9836 | 42.1188 | 42.2492 |
| 0.4401 | 9.0 | 1710 | 2.5826 | 47.2784 | 29.8703 | 42.622 | 42.7514 |
| 0.2523 | 10.0 | 1900 | 2.6356 | 48.0382 | 30.8884 | 43.3523 | 43.5068 |
| 0.2523 | 11.0 | 2090 | 2.6141 | 47.6911 | 29.3254 | 42.4938 | 42.6519 |
| 0.2523 | 12.0 | 2280 | 2.6942 | 48.7597 | 30.9279 | 43.5391 | 43.6974 |
| 0.1613 | 13.0 | 2470 | 2.7194 | 49.0916 | 30.9767 | 43.9943 | 44.1572 |
| 0.1613 | 14.0 | 2660 | 2.7911 | 47.8223 | 30.6173 | 43.1809 | 43.3471 |
| 0.1305 | 15.0 | 2850 | 2.8370 | 47.5629 | 29.7783 | 42.7168 | 42.8503 |
| 0.1305 | 16.0 | 3040 | 2.8588 | 49.4762 | 31.6101 | 44.5422 | 44.7027 |
| 0.1305 | 17.0 | 3230 | 2.9082 | 49.1502 | 31.4654 | 44.2166 | 44.3186 |
| 0.141 | 18.0 | 3420 | 2.8887 | 48.9675 | 31.0485 | 44.177 | 44.3258 |
| 0.141 | 19.0 | 3610 | 2.9043 | 49.2936 | 31.5204 | 44.2215 | 44.4216 |
| 0.1096 | 20.0 | 3800 | 2.9549 | 48.0316 | 30.4505 | 42.9444 | 43.0893 |
| 0.1096 | 21.0 | 3990 | 2.9666 | 49.2276 | 31.2755 | 44.2435 | 44.4591 |
| 0.1096 | 22.0 | 4180 | 2.9697 | 49.1008 | 31.4931 | 44.1893 | 44.382 |
| 0.0773 | 23.0 | 4370 | 2.9970 | 49.3707 | 31.4672 | 44.6066 | 44.7685 |
| 0.0773 | 24.0 | 4560 | 3.0081 | 49.2172 | 31.4693 | 44.4235 | 44.5458 |
| 0.048 | 25.0 | 4750 | 2.9968 | 49.4847 | 31.8341 | 44.8464 | 45.0286 |
| 0.048 | 26.0 | 4940 | 3.0405 | 49.5724 | 31.612 | 44.5192 | 44.7717 |
| 0.048 | 27.0 | 5130 | 3.0651 | 49.0194 | 31.2473 | 44.177 | 44.3837 |
| 0.0274 | 28.0 | 5320 | 3.0740 | 49.2999 | 31.5672 | 44.56 | 44.8105 |
| 0.0274 | 29.0 | 5510 | 3.0842 | 49.2898 | 31.602 | 44.5414 | 44.754 |
| 0.0168 | 30.0 | 5700 | 3.0952 | 49.4298 | 31.7193 | 44.732 | 44.9281 |
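
In the table, validation loss is lowest at epoch 3 (1.7104) and then rises, with only small dips, through epoch 30, while the ROUGE scores keep improving and peak at epochs 25 and 26. The sketch below copies the table values verbatim and uses a hypothetical `best_epoch` helper to pick the best epoch for each metric, which makes the disagreement between loss-based and ROUGE-based checkpoint selection explicit.

```python
# (epoch, validation_loss, rouge1, rouge2, rougeL, rougeLsum), copied from the table above.
RESULTS = [
    (1, 9.5151, 8.9002, 0.0056, 8.9059, 8.8991),
    (2, 1.7786, 44.8765, 27.9652, 40.2081, 40.3457),
    (3, 1.7104, 46.4054, 28.8582, 41.6579, 41.86),
    (4, 1.7601, 47.6046, 30.1854, 42.9171, 43.0745),
    (5, 1.9232, 48.1693, 30.1535, 43.0418, 43.1796),
    (6, 2.2791, 43.831, 26.9216, 39.1768, 39.3672),
    (7, 2.4800, 42.4614, 25.2371, 37.6735, 37.9309),
    (8, 2.4991, 46.6653, 28.9836, 42.1188, 42.2492),
    (9, 2.5826, 47.2784, 29.8703, 42.622, 42.7514),
    (10, 2.6356, 48.0382, 30.8884, 43.3523, 43.5068),
    (11, 2.6141, 47.6911, 29.3254, 42.4938, 42.6519),
    (12, 2.6942, 48.7597, 30.9279, 43.5391, 43.6974),
    (13, 2.7194, 49.0916, 30.9767, 43.9943, 44.1572),
    (14, 2.7911, 47.8223, 30.6173, 43.1809, 43.3471),
    (15, 2.8370, 47.5629, 29.7783, 42.7168, 42.8503),
    (16, 2.8588, 49.4762, 31.6101, 44.5422, 44.7027),
    (17, 2.9082, 49.1502, 31.4654, 44.2166, 44.3186),
    (18, 2.8887, 48.9675, 31.0485, 44.177, 44.3258),
    (19, 2.9043, 49.2936, 31.5204, 44.2215, 44.4216),
    (20, 2.9549, 48.0316, 30.4505, 42.9444, 43.0893),
    (21, 2.9666, 49.2276, 31.2755, 44.2435, 44.4591),
    (22, 2.9697, 49.1008, 31.4931, 44.1893, 44.382),
    (23, 2.9970, 49.3707, 31.4672, 44.6066, 44.7685),
    (24, 3.0081, 49.2172, 31.4693, 44.4235, 44.5458),
    (25, 2.9968, 49.4847, 31.8341, 44.8464, 45.0286),
    (26, 3.0405, 49.5724, 31.612, 44.5192, 44.7717),
    (27, 3.0651, 49.0194, 31.2473, 44.177, 44.3837),
    (28, 3.0740, 49.2999, 31.5672, 44.56, 44.8105),
    (29, 3.0842, 49.2898, 31.602, 44.5414, 44.754),
    (30, 3.0952, 49.4298, 31.7193, 44.732, 44.9281),
]
COLUMNS = ("validation_loss", "rouge1", "rouge2", "rougeL", "rougeLsum")


def best_epoch(metric: str) -> tuple[int, float]:
    """Return (epoch, value) of the best row for `metric`."""
    col = COLUMNS.index(metric) + 1  # +1 skips the epoch column
    # Lower is better for loss; higher is better for every ROUGE variant.
    pick = min if metric == "validation_loss" else max
    row = pick(RESULTS, key=lambda r: r[col])
    return row[0], row[col]


if __name__ == "__main__":
    for name in COLUMNS:
        print(name, best_epoch(name))
```

With the Trainer, a comparable choice can be automated through the `load_best_model_at_end` and `metric_for_best_model` options of `TrainingArguments`; this card does not say whether the run used them.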


### Framework versions

- Transformers 4.25.1
- Pytorch 1.13.0
- Datasets 2.8.0
- Tokenizers 0.12.1