---
license: apache-2.0
base_model: google-t5/t5-base
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: t5_base_patent
  results: []
---
|
|
|
|
|
|
# t5_base_patent |
|
|
|
This model is a fine-tuned version of [google-t5/t5-base](https://huggingface.co/google-t5/t5-base) on an undocumented patent-classification dataset (the Trainer did not record the dataset identity).
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.9276
- Accuracy: 0.6776
- F1 Macro: 0.6237
- F1 Micro: 0.6776
|
|
|
## Model description |
|
|
|
t5_base_patent fine-tunes the [google-t5/t5-base](https://huggingface.co/google-t5/t5-base) encoder-decoder (~220M parameters) for patent-text classification. The accuracy and F1 columns logged below point to a single-label classification setup, but the task definition and label set were not documented by the original authors.
|
|
|
## Intended uses & limitations |
|
|
|
The model is intended for classifying patent text into the (undocumented) label set it was trained on. Because the training dataset, label names, and preprocessing are not recorded, treat predictions with care: evaluation accuracy tops out around 0.68, and macro F1 (~0.62) trails micro F1, suggesting weaker performance on less frequent classes. See the inference sketch below.
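
A minimal inference sketch, assuming the checkpoint carries a sequence-classification head (`T5ForSequenceClassification`, consistent with the accuracy/F1 metrics reported here) and is published under the hypothetical Hub id `<user>/t5_base_patent`:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical repo id -- replace with the actual Hub path of this checkpoint.
model_id = "<user>/t5_base_patent"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "A method for wirelessly charging a battery via resonant inductive coupling."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
# id2label is only meaningful if label names were saved in the config;
# otherwise this prints a generic LABEL_<n>.
print(model.config.id2label.get(pred, pred))
```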
|
|
|
## Training and evaluation data |
|
|
|
Not documented. The epoch/step columns in the results table imply roughly 780 optimizer steps per epoch at an effective batch size of 32, i.e. on the order of 25,000 training examples; the evaluation set size is unknown.
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (mirrored in the `TrainingArguments` sketch after the list):
- learning_rate: 0.0005
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
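
This is a reconstruction, not the original training script: `output_dir` and the evaluation/logging cadence are assumptions (the 50-step interval is read off the results table), and the two-GPU data parallelism comes from the launcher (`torchrun`/`accelerate`), not from these arguments.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="t5_base_patent",      # assumption
    learning_rate=5e-4,
    per_device_train_batch_size=16,   # x2 GPUs -> effective batch size 32
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,                   # Trainer's default AdamW settings,
    adam_beta2=0.999,                 # matching the logged optimizer line
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    evaluation_strategy="steps",      # eval every 50 steps, per the table
    eval_steps=50,
    logging_steps=50,
)
```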
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 Macro | F1 Micro |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:--------:|
| 1.3522        | 0.06  | 50   | 1.4202          | 0.5254   | 0.3609   | 0.5254   |
| 1.1693        | 0.13  | 100  | 1.1674          | 0.5970   | 0.4695   | 0.5970   |
| 1.1710        | 0.19  | 150  | 1.1373          | 0.6052   | 0.4713   | 0.6052   |
| 1.0480        | 0.26  | 200  | 1.0826          | 0.6286   | 0.5499   | 0.6286   |
| 0.9991        | 0.32  | 250  | 1.0599          | 0.6380   | 0.5422   | 0.6380   |
| 1.1814        | 0.38  | 300  | 1.0633          | 0.6332   | 0.5593   | 0.6332   |
| 1.0864        | 0.45  | 350  | 1.0400          | 0.6392   | 0.5678   | 0.6392   |
| 0.9748        | 0.51  | 400  | 1.0440          | 0.6424   | 0.5613   | 0.6424   |
| 1.0267        | 0.58  | 450  | 1.0116          | 0.6526   | 0.5818   | 0.6526   |
| 1.0052        | 0.64  | 500  | 0.9948          | 0.6570   | 0.5787   | 0.6570   |
| 0.9244        | 0.70  | 550  | 1.0002          | 0.6570   | 0.5870   | 0.6570   |
| 1.0172        | 0.77  | 600  | 0.9869          | 0.6610   | 0.5889   | 0.6610   |
| 1.0320        | 0.83  | 650  | 0.9922          | 0.6580   | 0.5967   | 0.6580   |
| 0.9623        | 0.90  | 700  | 0.9955          | 0.6488   | 0.5863   | 0.6488   |
| 0.9257        | 0.96  | 750  | 0.9993          | 0.6556   | 0.5884   | 0.6556   |
| 0.7956        | 1.02  | 800  | 0.9737          | 0.6662   | 0.6148   | 0.6662   |
| 0.8475        | 1.09  | 850  | 1.0125          | 0.6544   | 0.5729   | 0.6544   |
| 0.8527        | 1.15  | 900  | 0.9999          | 0.6524   | 0.5897   | 0.6524   |
| 0.8587        | 1.21  | 950  | 1.0072          | 0.6576   | 0.5873   | 0.6576   |
| 0.8855        | 1.28  | 1000 | 0.9840          | 0.6592   | 0.6035   | 0.6592   |
| 0.7015        | 1.34  | 1050 | 0.9847          | 0.6682   | 0.5993   | 0.6682   |
| 0.8116        | 1.41  | 1100 | 0.9702          | 0.6678   | 0.6079   | 0.6678   |
| 0.8409        | 1.47  | 1150 | 0.9789          | 0.6606   | 0.6017   | 0.6606   |
| 0.7889        | 1.53  | 1200 | 0.9462          | 0.6818   | 0.6125   | 0.6818   |
| 0.8059        | 1.60  | 1250 | 0.9375          | 0.6694   | 0.6093   | 0.6694   |
| 0.7893        | 1.66  | 1300 | 0.9467          | 0.6762   | 0.6102   | 0.6762   |
| 0.8152        | 1.73  | 1350 | 0.9396          | 0.6822   | 0.6158   | 0.6822   |
| 0.7644        | 1.79  | 1400 | 0.9445          | 0.6798   | 0.6190   | 0.6798   |
| 0.7252        | 1.85  | 1450 | 0.9285          | 0.6880   | 0.6209   | 0.6880   |
| 1.0028        | 1.92  | 1500 | 0.9379          | 0.6702   | 0.6079   | 0.6702   |
| 0.8056        | 1.98  | 1550 | 0.9276          | 0.6776   | 0.6237   | 0.6776   |
| 0.5781        | 2.05  | 1600 | 0.9509          | 0.6864   | 0.6215   | 0.6864   |
| 0.5592        | 2.11  | 1650 | 0.9535          | 0.6866   | 0.6354   | 0.6866   |
| 0.6818        | 2.17  | 1700 | 0.9812          | 0.6820   | 0.6203   | 0.6820   |
| 0.6022        | 2.24  | 1750 | 0.9842          | 0.6822   | 0.6270   | 0.6822   |
| 0.5771        | 2.30  | 1800 | 1.0100          | 0.6832   | 0.6295   | 0.6832   |
| 0.5960        | 2.37  | 1850 | 1.0079          | 0.6784   | 0.6280   | 0.6784   |
| 0.5209        | 2.43  | 1900 | 1.0118          | 0.6828   | 0.6257   | 0.6828   |
| 0.4842        | 2.49  | 1950 | 1.0165          | 0.6800   | 0.6253   | 0.6800   |
| 0.6581        | 2.56  | 2000 | 1.0119          | 0.6774   | 0.6234   | 0.6774   |
| 0.6417        | 2.62  | 2050 | 1.0035          | 0.6834   | 0.6345   | 0.6834   |
| 0.5388        | 2.69  | 2100 | 1.0133          | 0.6810   | 0.6321   | 0.6810   |
| 0.5460        | 2.75  | 2150 | 1.0133          | 0.6808   | 0.6313   | 0.6808   |
| 0.5825        | 2.81  | 2200 | 1.0058          | 0.6830   | 0.6316   | 0.6830   |
| 0.6251        | 2.88  | 2250 | 1.0062          | 0.6848   | 0.6357   | 0.6848   |
| 0.6190        | 2.94  | 2300 | 1.0014          | 0.6826   | 0.6307   | 0.6826   |
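
Two things worth noting about the table: micro-averaged F1 equals accuracy for single-label classification, which is why those two columns are identical in every row; and the headline results above correspond to step 1550 (epoch 1.98), the row with the lowest validation loss, so the best checkpoint rather than the last one appears to have been kept. The exact metric code was not recorded; a sketch of a `compute_metrics` function that would produce these columns:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    """Accuracy plus macro/micro F1, matching the columns in the table above."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_macro": f1_score(labels, preds, average="macro"),
        "f1_micro": f1_score(labels, preds, average="micro"),
    }
```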
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.39.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
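
Since `4.39.0.dev0` was a development snapshot, the closest installable approximation of this environment is something like `pip install transformers==4.39.0 torch==2.2.1 datasets==2.18.0 tokenizers==0.15.2`, using the cu121 PyTorch wheel if GPU parity matters.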
|
|