---
tags:
- generated_from_keras_callback
model-index:
- name: distilgpt_new_0060
  results: []
---

<!-- This model card has been generated automatically according to the information Keras had access to. You should
probably proofread and complete it, then remove this comment. -->

# distilgpt_new_0060

This model was trained from scratch on an unknown dataset.
It achieves the following results after the final training epoch:
- Train Loss: 1.1173
- Validation Loss: 1.0714
- Epoch: 59 (epochs are zero-indexed, so this is the 60th epoch)

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': 2e-05, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False, 'weight_decay_rate': 0.01}
- training_precision: float32
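To make the optimizer settings above concrete, here is a minimal pure-Python sketch of one AdamW-style update using those hyperparameters. It is an illustration, not the Transformers `AdamWeightDecay` implementation: it assumes decoupled weight decay scaled by the (unscheduled) learning rate, applies decay to every parameter for simplicity (real setups often exclude biases and LayerNorm weights), and the function name `adam_weight_decay_step` is hypothetical.

```python
import math

# Hyperparameters copied from the optimizer config listed above.
LEARNING_RATE = 2e-05
BETA_1 = 0.9
BETA_2 = 0.999
EPSILON = 1e-07
WEIGHT_DECAY_RATE = 0.01


def adam_weight_decay_step(params, grads, m, v, step):
    """Hypothetical helper: apply one AdamW-style update in place.

    `params`, `grads`, `m`, `v` are equal-length lists of floats; `m` and `v`
    hold the first/second moment estimates. `step` is the 1-based iteration
    count used for bias correction. Decay is applied to every parameter here,
    which is a simplifying assumption.
    """
    # Bias-corrected step size, in the style of Keras Adam.
    lr_t = LEARNING_RATE * math.sqrt(1 - BETA_2 ** step) / (1 - BETA_1 ** step)
    for i, (p, g) in enumerate(zip(params, grads)):
        # Decoupled weight decay: shrink the weight directly, independent of the gradient.
        p -= LEARNING_RATE * WEIGHT_DECAY_RATE * p
        # Exponential moving averages of the gradient and the squared gradient.
        m[i] = BETA_1 * m[i] + (1 - BETA_1) * g
        v[i] = BETA_2 * v[i] + (1 - BETA_2) * g * g
        # Adam update; 'amsgrad': False, so no running maximum of v is kept.
        params[i] = p - lr_t * m[i] / (math.sqrt(v[i]) + EPSILON)
    return params


# Usage: one step on a single weight with a positive gradient.
weights = adam_weight_decay_step([1.0], [0.5], [0.0], [0.0], step=1)
print(weights)
```

Because of the bias correction, the first step moves each weight by roughly the learning rate (2e-05) against the sign of its gradient, while the weight decay term alone shrinks it by a factor of `1 - 2e-05 * 0.01` per step.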

### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 3.5889 | 2.6197 | 0 |
| 2.4784 | 2.2040 | 1 |
| 2.1855 | 1.9980 | 2 |
| 2.0181 | 1.8643 | 3 |
| 1.9031 | 1.7652 | 4 |
| 1.8166 | 1.6924 | 5 |
| 1.7467 | 1.6360 | 6 |
| 1.6904 | 1.5843 | 7 |
| 1.6430 | 1.5421 | 8 |
| 1.6021 | 1.5059 | 9 |
| 1.5668 | 1.4761 | 10 |
| 1.5359 | 1.4481 | 11 |
| 1.5071 | 1.4220 | 12 |
| 1.4841 | 1.4020 | 13 |
| 1.4608 | 1.3797 | 14 |
| 1.4399 | 1.3595 | 15 |
| 1.4213 | 1.3426 | 16 |
| 1.4031 | 1.3266 | 17 |
| 1.3875 | 1.3113 | 18 |
| 1.3735 | 1.3024 | 19 |
| 1.3600 | 1.2871 | 20 |
| 1.3456 | 1.2753 | 21 |
| 1.3336 | 1.2648 | 22 |
| 1.3214 | 1.2539 | 23 |
| 1.3103 | 1.2451 | 24 |
| 1.3005 | 1.2335 | 25 |
| 1.2905 | 1.2258 | 26 |
| 1.2815 | 1.2179 | 27 |
| 1.2728 | 1.2123 | 28 |
| 1.2643 | 1.2029 | 29 |
| 1.2564 | 1.1980 | 30 |
| 1.2494 | 1.1877 | 31 |
| 1.2414 | 1.1806 | 32 |
| 1.2348 | 1.1788 | 33 |
| 1.2290 | 1.1699 | 34 |
| 1.2209 | 1.1654 | 35 |
| 1.2156 | 1.1575 | 36 |
| 1.2110 | 1.1537 | 37 |
| 1.2046 | 1.1499 | 38 |
| 1.1986 | 1.1436 | 39 |
| 1.1940 | 1.1408 | 40 |
| 1.1877 | 1.1356 | 41 |
| 1.1830 | 1.1314 | 42 |
| 1.1779 | 1.1278 | 43 |
| 1.1737 | 1.1211 | 44 |
| 1.1692 | 1.1192 | 45 |
| 1.1647 | 1.1163 | 46 |
| 1.1611 | 1.1107 | 47 |
| 1.1560 | 1.1066 | 48 |
| 1.1521 | 1.1060 | 49 |
| 1.1489 | 1.1002 | 50 |
| 1.1440 | 1.0960 | 51 |
| 1.1406 | 1.0931 | 52 |
| 1.1373 | 1.0897 | 53 |
| 1.1329 | 1.0855 | 54 |
| 1.1302 | 1.0842 | 55 |
| 1.1265 | 1.0818 | 56 |
| 1.1237 | 1.0784 | 57 |
| 1.1204 | 1.0737 | 58 |
| 1.1173 | 1.0714 | 59 |
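The losses in the table are easier to interpret as perplexities. The sketch below converts a few rows, assuming (the card does not say) that each loss is the mean per-token cross-entropy in natural-log units, as a Keras causal language-modeling loss typically reports. The `LOSSES` dict and `perplexity` helper are illustrative names, not part of the original training code.

```python
import math

# (train_loss, validation_loss) for a few epochs, copied from the table above.
LOSSES = {
    0: (3.5889, 2.6197),
    29: (1.2643, 1.2029),
    59: (1.1173, 1.0714),
}


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy_nats)


# Print train/validation perplexity for each selected epoch.
for epoch, (train_loss, val_loss) in sorted(LOSSES.items()):
    print(f"epoch {epoch:2d}: train ppl {perplexity(train_loss):.2f}, "
          f"val ppl {perplexity(val_loss):.2f}")
```

Under that assumption, the final validation loss of 1.0714 corresponds to a perplexity of about 2.92.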


### Framework versions

- Transformers 4.20.1
- TensorFlow 2.8.2
- Datasets 2.3.2
- Tokenizers 0.12.1