|
--- |
|
license: mit |
|
tags: |
|
- fill-mask |
|
- generated_from_trainer |
|
metrics: |
|
- accuracy |
|
model-index: |
|
- name: deberta-v3-large-dapt-scientific-papers-pubmed |
|
results: [] |
|
--- |
|
|
|
|
|
|
# deberta-v3-large-dapt-scientific-papers-pubmed |
|
|
|
This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large). The training dataset was not recorded by the Trainer; the model name indicates domain-adaptive pretraining on scientific papers from PubMed.
|
It achieves the following results on the evaluation set: |
|
- Loss: 4.4729 |
|
- Accuracy: 0.3510 |
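
The accuracy above is presumably top-1 masked-token accuracy, as computed in the standard `Trainer` masked-LM recipe. A minimal sketch of such a metric (an assumption, not the exact function used for this run), relying on the usual `-100` ignore index for unmasked positions:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Top-1 accuracy over masked positions only (sketch)."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    mask = labels != -100  # the MLM data collator labels only masked tokens
    return {"accuracy": float((preds[mask] == labels[mask]).mean())}
```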
|
|
|
## Model description |
|
|
|
Per the model name and the `fill-mask` tag, this checkpoint appears to be the result of domain-adaptive pretraining (DAPT): continuing the masked-language-modeling objective of DeBERTa-v3-large on scientific papers from PubMed so that the model better fits biomedical text. No further details were recorded by the Trainer.
|
|
|
## Intended uses & limitations |
|
|
|
The checkpoint can be used for masked-token prediction on biomedical and scientific text, or as a starting point for fine-tuning on downstream biomedical NLP tasks. Its limitations were not documented; note that the reported masked-LM accuracy (0.3510 after one epoch) is modest, so predictions should be inspected before use.
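
A minimal usage sketch with the `fill-mask` pipeline. The repo id below is a placeholder for wherever this checkpoint is hosted, and the example sentence is illustrative:

```python
from transformers import pipeline

# Placeholder repo id: point this at the actual location of the checkpoint.
fill_mask = pipeline(
    "fill-mask",
    model="deberta-v3-large-dapt-scientific-papers-pubmed",
)

# DeBERTa-v3 uses "[MASK]" as its mask token.
for pred in fill_mask("The patient was treated with [MASK] for hypertension."):
    print(pred["token_str"], pred["score"])
```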
|
|
|
## Training and evaluation data |
|
|
|
Not recorded by the Trainer. The model name indicates a corpus of scientific papers drawn from PubMed; the size and composition of the training and evaluation splits are unknown.
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):
|
- learning_rate: 1e-06 |
|
- train_batch_size: 12 |
|
- eval_batch_size: 12 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 10000 |
|
- training_steps: 21600 |
|
- mixed_precision_training: Native AMP |
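
A sketch of a `Trainer` run mirroring these settings, assuming the standard masked-LM recipe. The whole setup is reconstructed: the actual corpus and preprocessing were not recorded, so a toy text stands in for the data. Adam betas and epsilon are the `TrainingArguments` defaults, matching the values listed above.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-v3-large")

# Toy stand-in for the undocumented PubMed corpus.
raw = Dataset.from_dict({"text": ["Aspirin irreversibly inhibits cyclooxygenase."]})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="deberta-v3-large-dapt-scientific-papers-pubmed",
    learning_rate=1e-6,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    max_steps=21_600,
    fp16=True,  # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=500,  # matches the 500-step cadence of the results table
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    eval_dataset=tokenized,  # stand-in; the real eval split is undocumented
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer),
)
trainer.train()
```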
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Accuracy | |
|
|:-------------:|:-----:|:-----:|:---------------:|:--------:| |
|
| 12.0315 | 0.02 | 500 | 11.6840 | 0.0 | |
|
| 11.0675 | 0.05 | 1000 | 8.9471 | 0.0226 | |
|
| 8.6646 | 0.07 | 1500 | 8.0093 | 0.0344 | |
|
| 8.3625 | 0.09 | 2000 | 7.9624 | 0.0274 | |
|
| 8.2467 | 0.12 | 2500 | 7.6599 | 0.0376 | |
|
| 7.9714 | 0.14 | 3000 | 7.6716 | 0.0316 | |
|
| 7.9852 | 0.16 | 3500 | 7.4535 | 0.0385 | |
|
| 7.7502 | 0.19 | 4000 | 7.4293 | 0.0429 | |
|
| 7.7016 | 0.21 | 4500 | 7.3576 | 0.0397 | |
|
| 7.5789 | 0.23 | 5000 | 7.3124 | 0.0513 | |
|
| 7.4141 | 0.25 | 5500 | 7.1353 | 0.0634 | |
|
| 7.2365 | 0.28 | 6000 | 6.8600 | 0.0959 | |
|
| 7.0725 | 0.3 | 6500 | 6.5743 | 0.1150 | |
|
| 6.934 | 0.32 | 7000 | 6.3674 | 0.1415 | |
|
| 6.7219 | 0.35 | 7500 | 6.3467 | 0.1581 | |
|
| 6.5039 | 0.37 | 8000 | 6.1312 | 0.1815 | |
|
| 6.3096 | 0.39 | 8500 | 5.9080 | 0.2134 | |
|
| 6.1835 | 0.42 | 9000 | 5.8414 | 0.2137 | |
|
| 6.0939 | 0.44 | 9500 | 5.5137 | 0.2553 | |
|
| 6.0457 | 0.46 | 10000 | 5.5881 | 0.2545 | |
|
| 5.8851 | 0.49 | 10500 | 5.5134 | 0.2497 | |
|
| 5.7277 | 0.51 | 11000 | 5.3023 | 0.2699 | |
|
| 5.6183 | 0.53 | 11500 | 5.0074 | 0.3019 | |
|
| 5.4978 | 0.56 | 12000 | 5.1822 | 0.2814 | |
|
| 5.5916 | 0.58 | 12500 | 5.1211 | 0.2808 | |
|
| 5.4749 | 0.6 | 13000 | 4.9126 | 0.2972 | |
|
| 5.3765 | 0.62 | 13500 | 5.0468 | 0.2899 | |
|
| 5.3529 | 0.65 | 14000 | 4.8160 | 0.3037 | |
|
| 5.2993 | 0.67 | 14500 | 4.8598 | 0.3141 | |
|
| 5.2929 | 0.69 | 15000 | 4.9669 | 0.3052 | |
|
| 5.2649 | 0.72 | 15500 | 4.7849 | 0.3270 | |
|
| 5.162 | 0.74 | 16000 | 4.6819 | 0.3357 | |
|
| 5.1639 | 0.76 | 16500 | 4.6056 | 0.3275 | |
|
| 5.1245 | 0.79 | 17000 | 4.5473 | 0.3311 | |
|
| 5.1596 | 0.81 | 17500 | 4.7008 | 0.3212 | |
|
| 5.1346 | 0.83 | 18000 | 4.7932 | 0.3192 | |
|
| 5.1174 | 0.86 | 18500 | 4.7624 | 0.3208 | |
|
| 5.1152 | 0.88 | 19000 | 4.6388 | 0.3274 | |
|
| 5.0852 | 0.9 | 19500 | 4.5247 | 0.3305 | |
|
| 5.0564 | 0.93 | 20000 | 4.6982 | 0.3161 | |
|
| 5.0179 | 0.95 | 20500 | 4.5363 | 0.3389 | |
|
| 5.07 | 0.97 | 21000 | 4.6647 | 0.3307 | |
|
| 5.0781 | 1.0 | 21500 | 4.4729 | 0.3510 | |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.18.0 |
|
- PyTorch 1.11.0
|
- Datasets 2.1.0 |
|
- Tokenizers 0.12.1 |
|
|