	---
	tags:
	- generated_from_trainer
	model-index:
	- name: seq2seq_huggingface_mix_results
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# seq2seq_huggingface_mix_results

	This model is a fine-tuned version of an unspecified base model on an unknown dataset.
	At the final evaluation (step 620, epoch 2.97) it achieves the following results on the evaluation set:
	- Loss: 7.0175

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 12
	- eval_batch_size: 12
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 48
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- num_epochs: 3
	- mixed_precision_training: Native AMP
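
The hyperparameters above interact in two ways worth spelling out: the total train batch size is the per-device batch size times the gradient accumulation steps, and the linear scheduler spends its first 500 optimizer steps warming up. The sketch below is a minimal, plain-Python illustration, not the Trainer's own code. The function names are hypothetical, a single device is assumed, and `total_steps=624` is an assumption roughly inferred from the step and epoch columns of the results table (about 208 optimizer steps per epoch over 3 epochs).

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step (single device assumed by default)."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_warmup_decay_lr(
    step: int,
    base_lr: float = 5e-5,     # learning_rate from the card
    warmup_steps: int = 500,   # lr_scheduler_warmup_steps from the card
    total_steps: int = 624,    # ASSUMPTION: inferred from the results table, not stated in the card
) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(12, 4))
    for step in (0, 100, 250, 500, 562, 624):
        print(step, linear_warmup_decay_lr(step))
```

Under these assumptions the learning rate only reaches 5e-05 at step 500, so roughly four fifths of the run happens during warmup and the decay phase is compressed into the last ~124 steps.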

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 10.5072 | 0.0480 | 10 | 10.4491 |
	| 10.3574 | 0.0959 | 20 | 10.1991 |
	| 10.0831 | 0.1439 | 30 | 9.8790 |
	| 9.7946 | 0.1918 | 40 | 9.5780 |
	| 9.5118 | 0.2398 | 50 | 9.3344 |
	| 9.3333 | 0.2878 | 60 | 9.1722 |
	| 9.1888 | 0.3357 | 70 | 9.0610 |
	| 9.0913 | 0.3837 | 80 | 8.9742 |
	| 9.0007 | 0.4317 | 90 | 8.9005 |
	| 8.9134 | 0.4796 | 100 | 8.8328 |
	| 8.8583 | 0.5276 | 110 | 8.7615 |
	| 8.7722 | 0.5755 | 120 | 8.6873 |
	| 8.7092 | 0.6235 | 130 | 8.6137 |
	| 8.6223 | 0.6715 | 140 | 8.5340 |
	| 8.5312 | 0.7194 | 150 | 8.4538 |
	| 8.4582 | 0.7674 | 160 | 8.3681 |
	| 8.3748 | 0.8153 | 170 | 8.2801 |
	| 8.2637 | 0.8633 | 180 | 8.1936 |
	| 8.1704 | 0.9113 | 190 | 8.1001 |
	| 8.0697 | 0.9592 | 200 | 8.0079 |
	| 7.9792 | 1.0072 | 210 | 7.9126 |
	| 7.9 | 1.0552 | 220 | 7.8175 |
	| 7.8134 | 1.1031 | 230 | 7.7236 |
	| 7.7153 | 1.1511 | 240 | 7.6328 |
	| 7.6087 | 1.1990 | 250 | 7.5477 |
	| 7.5328 | 1.2470 | 260 | 7.4634 |
	| 7.4347 | 1.2950 | 270 | 7.3862 |
	| 7.3531 | 1.3429 | 280 | 7.3179 |
	| 7.3059 | 1.3909 | 290 | 7.2513 |
	| 7.2403 | 1.4388 | 300 | 7.1955 |
	| 7.2128 | 1.4868 | 310 | 7.1506 |
	| 7.1508 | 1.5348 | 320 | 7.1105 |
	| 7.1104 | 1.5827 | 330 | 7.0835 |
	| 7.067 | 1.6307 | 340 | 7.0655 |
	| 7.0594 | 1.6787 | 350 | 7.0558 |
	| 7.0591 | 1.7266 | 360 | 7.0411 |
	| 7.0129 | 1.7746 | 370 | 7.0381 |
	| 7.0107 | 1.8225 | 380 | 7.0344 |
	| 7.0549 | 1.8705 | 390 | 7.0268 |
	| 7.0358 | 1.9185 | 400 | 7.0249 |
	| 7.0395 | 1.9664 | 410 | 7.0242 |
	| 7.0105 | 2.0144 | 420 | 7.0215 |
	| 7.0113 | 2.0624 | 430 | 7.0259 |
	| 6.9985 | 2.1103 | 440 | 7.0213 |
	| 7.0218 | 2.1583 | 450 | 7.0218 |
	| 6.9735 | 2.2062 | 460 | 7.0275 |
	| 7.0132 | 2.2542 | 470 | 7.0254 |
	| 7.0241 | 2.3022 | 480 | 7.0219 |
	| 7.0127 | 2.3501 | 490 | 7.0238 |
	| 6.9644 | 2.3981 | 500 | 7.0249 |
	| 7.0103 | 2.4460 | 510 | 7.0259 |
	| 7.006 | 2.4940 | 520 | 7.0266 |
	| 6.9882 | 2.5420 | 530 | 7.0235 |
	| 7.0016 | 2.5899 | 540 | 7.0235 |
	| 7.002 | 2.6379 | 550 | 7.0217 |
	| 6.9782 | 2.6859 | 560 | 7.0196 |
	| 6.9833 | 2.7338 | 570 | 7.0198 |
	| 6.9967 | 2.7818 | 580 | 7.0202 |
	| 6.9644 | 2.8297 | 590 | 7.0196 |
	| 6.9825 | 2.8777 | 600 | 7.0199 |
	| 7.0097 | 2.9257 | 610 | 7.0178 |
	| 6.9909 | 2.9736 | 620 | 7.0175 |
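
To read the table at a glance, the sketch below takes the validation loss every 100 steps (plus the final row), computes how much it improved between those checkpoints, and converts a loss to perplexity. It is an illustration only: the helper names are hypothetical, and the perplexity conversion assumes the reported loss is a mean per-token cross-entropy in nats, which this card does not state.

```python
import math

# (step, validation_loss) pairs copied from the results table:
# every 100 steps plus the final row.
HISTORY = [
    (100, 8.8328),
    (200, 8.0079),
    (300, 7.1955),
    (400, 7.0249),
    (500, 7.0249),
    (600, 7.0199),
    (620, 7.0175),
]


def perplexity(loss: float) -> float:
    """exp(loss); meaningful only if loss is mean token cross-entropy in nats (ASSUMPTION)."""
    return math.exp(loss)


def improvements(history):
    """Return (step, drop in validation loss since the previous listed checkpoint)."""
    return [
        (step, round(prev_loss - loss, 4))
        for (_, prev_loss), (step, loss) in zip(history, history[1:])
    ]


if __name__ == "__main__":
    for step, delta in improvements(HISTORY):
        print(f"step {step}: validation loss improved by {delta}")
    print("final perplexity (under the stated assumption):", perplexity(HISTORY[-1][1]))
```

The per-checkpoint drops shrink from about 0.8 between steps 100 and 300 to under 0.01 after step 400, which matches the plateau near 7.02 visible in the last third of the table.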


	### Framework versions

	- Transformers 4.40.2
	- PyTorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1
