	---
	license: mit
	base_model: gpt2
	tags:
	- generated_from_trainer
	model-index:
	- name: '130000'
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# 130000

	This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 5.9987
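
To put the loss in more familiar terms, it can be converted to perplexity. The snippet below is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 5.9987

# ASSUMPTION: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.1f}")  # about 402.9
```

Under that assumption, a perplexity near 403 means the model is, on average, about as uncertain as a uniform choice among roughly 400 tokens at each position.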

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0005
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- num_epochs: 50
	- mixed_precision_training: Native AMP
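
As a rough guide to reproducing this setup, the values above can be written as keyword arguments using the parameter names of `transformers.TrainingArguments`. This is a sketch, not the original training script: the single-device setting and the mapping of "Native AMP" to `fp16=True` are assumptions, and `total_train_batch_size` below is a hypothetical helper added for illustration.

```python
# Keyword arguments mirroring the hyperparameter list above.
# Keys follow transformers.TrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 50,
    "fp16": True,  # ASSUMPTION: "Native AMP" was enabled via fp16
}

# ASSUMPTION: training ran on a single device (not stated in the card).
NUM_DEVICES = 1


def total_train_batch_size(kwargs: dict, num_devices: int) -> int:
    """Hypothetical helper: examples consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


effective_batch = total_train_batch_size(training_kwargs, NUM_DEVICES)
print(effective_batch)  # 64, matching total_train_batch_size above
```

The product 8 × 8 × 1 = 64 agrees with the reported `total_train_batch_size`, which is why a single device is the simplest reading. An `output_dir` would also be needed to construct `TrainingArguments`; none is given in this card.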

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| No log        | 0.92  | 3    | 7.0396          |
	| No log        | 1.85  | 6    | 6.5398          |
	| No log        | 2.77  | 9    | 6.3337          |
	| 6.6916        | 4.0   | 13   | 6.3694          |
	| 6.6916        | 4.92  | 16   | 6.2945          |
	| 6.6916        | 5.85  | 19   | 6.3184          |
	| 6.1092        | 6.77  | 22   | 6.3726          |
	| 6.1092        | 8.0   | 26   | 6.2948          |
	| 6.1092        | 8.92  | 29   | 6.3374          |
	| 6.5151        | 9.85  | 32   | 6.3641          |
	| 6.5151        | 10.77 | 35   | 6.2335          |
	| 6.5151        | 12.0  | 39   | 6.1965          |
	| 5.998         | 12.92 | 42   | 6.0595          |
	| 5.998         | 13.85 | 45   | 6.0374          |
	| 5.998         | 14.77 | 48   | 6.0562          |
	| 5.6623        | 16.0  | 52   | 6.0128          |
	| 5.6623        | 16.92 | 55   | 5.9999          |
	| 5.6623        | 17.85 | 58   | 6.0008          |
	| 5.611         | 18.77 | 61   | 5.9992          |
	| 5.611         | 20.0  | 65   | 6.0017          |
	| 5.611         | 20.92 | 68   | 6.0005          |
	| 5.5519        | 21.85 | 71   | 5.9962          |
	| 5.5519        | 22.77 | 74   | 5.9964          |
	| 5.5519        | 24.0  | 78   | 5.9975          |
	| 5.5841        | 24.92 | 81   | 5.9974          |
	| 5.5841        | 25.85 | 84   | 6.0000          |
	| 5.5841        | 26.77 | 87   | 6.0019          |
	| 5.5582        | 28.0  | 91   | 6.0014          |
	| 5.5582        | 28.92 | 94   | 6.0016          |
	| 5.5582        | 29.85 | 97   | 5.9987          |
	| 5.591         | 30.77 | 100  | 5.9992          |
	| 5.591         | 32.0  | 104  | 5.9986          |
	| 5.591         | 32.92 | 107  | 5.9982          |
	| 5.5638        | 33.85 | 110  | 5.9983          |
	| 5.5638        | 34.77 | 113  | 5.9987          |
	| 5.5638        | 36.0  | 117  | 5.9989          |
	| 5.5683        | 36.92 | 120  | 5.9992          |
	| 5.5683        | 37.85 | 123  | 5.9995          |
	| 5.5683        | 38.77 | 126  | 5.9991          |
	| 5.5628        | 40.0  | 130  | 5.9992          |
	| 5.5628        | 40.92 | 133  | 5.9992          |
	| 5.5628        | 41.85 | 136  | 5.9991          |
	| 5.5628        | 42.77 | 139  | 5.9989          |
	| 5.5683        | 44.0  | 143  | 5.9987          |
	| 5.5683        | 44.92 | 146  | 5.9987          |
	| 5.5683        | 45.85 | 149  | 5.9987          |
	| 5.5534        | 46.15 | 150  | 5.9987          |
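
The fractional epochs in the table, and the fact that training ends at epoch 46.15 rather than 50, are consistent with about 26 mini-batches per epoch combined with 8 gradient accumulation steps. The sketch below checks that reading. The value `BATCHES_PER_EPOCH = 26` is inferred from the table and is not stated in this card, and the floor-then-ceil step count is an assumption about how the Trainer derives its total number of optimizer steps.

```python
import math

GRAD_ACCUM_STEPS = 8     # from the hyperparameters in this card
NUM_EPOCHS = 50          # from the hyperparameters in this card
BATCHES_PER_EPOCH = 26   # ASSUMPTION: inferred from the table, not reported


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after `step` optimizer updates."""
    return round(step * GRAD_ACCUM_STEPS / BATCHES_PER_EPOCH, 2)


# ASSUMPTION: whole updates per epoch are floored, then scaled by epochs.
updates_per_epoch = BATCHES_PER_EPOCH // GRAD_ACCUM_STEPS   # 26 // 8 = 3
max_steps = math.ceil(NUM_EPOCHS * updates_per_epoch)       # 50 * 3 = 150

print(epoch_at_step(3))    # 0.92, first row of the table
print(epoch_at_step(13))   # 4.0
print(epoch_at_step(150))  # 46.15, last row of the table
print(max_steps)           # 150
```

If this reading is right, the run performed 150 optimizer steps as scheduled, and the gap between 46.15 and 50 epochs comes from the two leftover mini-batches per epoch that do not complete an accumulation cycle.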


	### Framework versions

	- Transformers 4.38.2
	- PyTorch 2.1.0+cu121
	- Datasets 2.18.0
	- Tokenizers 0.15.2
