	---
	tags:
	- generated_from_trainer
	datasets:
	- gokuls/wiki_book_corpus_complete_processed_bert_dataset
	metrics:
	- accuracy
	model-index:
	- name: bert_12_layer_model_v3_complete_training_new_emb_compress_48
	  results:
	  - task:
	      name: Masked Language Modeling
	      type: fill-mask
	    dataset:
	      name: gokuls/wiki_book_corpus_complete_processed_bert_dataset
	      type: gokuls/wiki_book_corpus_complete_processed_bert_dataset
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.1573752894874488
	---

	# bert_12_layer_model_v3_complete_training_new_emb_compress_48

	This model is a fine-tuned version of an unspecified base model (the auto-generated link to the base checkpoint was left empty) on the gokuls/wiki_book_corpus_complete_processed_bert_dataset dataset.
	It achieves the following results on the evaluation set:
	- Loss: 5.9594
	- Accuracy: 0.1574
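
For readers who prefer perplexity to raw loss, the reported evaluation loss can be converted directly. The sketch below assumes the loss is the mean per-token cross-entropy in natural-log units over the masked positions, which is the usual convention for masked language modeling but is not stated in this card.

```python
import math

# Evaluation loss reported in this card. Assumption: mean cross-entropy
# per masked token, measured in nats.
eval_loss = 5.9594

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"masked-token perplexity: {perplexity:.1f}")
```

Under that assumption the loss corresponds to a masked-token perplexity of roughly 387.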

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 48
	- eval_batch_size: 48
	- seed: 10
	- distributed_type: multi-GPU
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 10000
	- num_epochs: 5
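
The `linear` scheduler with 10000 warmup steps listed above can be sketched in plain Python. This is a minimal illustration of the usual linear warmup followed by linear decay to zero, not the code used for this run. The function name `linear_warmup_decay_lr` and the `total_steps` default are hypothetical: the card does not state the total number of optimizer steps.

```python
def linear_warmup_decay_lr(step, base_lr=1e-5, warmup_steps=10_000,
                           total_steps=610_000):
    """Learning rate at a given optimizer step.

    base_lr and warmup_steps come from the hyperparameter list above;
    total_steps is a placeholder assumption, not a value from the card.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 5_000, 10_000, 310_000, 610_000):
    print(step, linear_warmup_decay_lr(step))
```

With these settings the rate peaks at `1e-05` exactly when warmup ends, so the first logged checkpoint at step 10000 falls at the peak.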

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:------:|:---------------:|:--------:|
	| 7.1148 | 0.08 | 10000 | 7.0921 | 0.0828 |
	| 6.6864 | 0.16 | 20000 | 6.6879 | 0.1078 |
	| 6.5451 | 0.25 | 30000 | 6.5435 | 0.1184 |
	| 6.4606 | 0.33 | 40000 | 6.4515 | 0.1262 |
	| 6.3851 | 0.41 | 50000 | 6.3851 | 0.1312 |
	| 6.3371 | 0.49 | 60000 | 6.3357 | 0.1342 |
	| 6.2971 | 0.57 | 70000 | 6.2923 | 0.1373 |
	| 6.2682 | 0.66 | 80000 | 6.2605 | 0.1399 |
	| 6.2352 | 0.74 | 90000 | 6.2301 | 0.1411 |
	| 6.214 | 0.82 | 100000 | 6.2090 | 0.1430 |
	| 6.1837 | 0.9 | 110000 | 6.1865 | 0.1443 |
	| 6.1726 | 0.98 | 120000 | 6.1682 | 0.1451 |
	| 6.1524 | 1.07 | 130000 | 6.1498 | 0.1464 |
	| 6.1293 | 1.15 | 140000 | 6.1300 | 0.1468 |
	| 6.1116 | 1.23 | 150000 | 6.1026 | 0.1479 |
	| 6.0839 | 1.31 | 160000 | 6.0797 | 0.1490 |
	| 6.0616 | 1.39 | 170000 | 6.0590 | 0.1499 |
	| 6.0508 | 1.47 | 180000 | 6.0399 | 0.1509 |
	| 6.0311 | 1.56 | 190000 | 6.0233 | 0.1517 |
	| 6.015 | 1.64 | 200000 | 6.0048 | 0.1533 |
	| 5.985 | 1.72 | 210000 | 5.9863 | 0.1547 |
	| 5.9661 | 1.8 | 220000 | 5.9595 | 0.1573 |
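
The Epoch and Step columns allow a rough estimate of the number of optimizer steps per epoch. The sketch below uses two rows from the table. Because the logged epoch values are rounded to two decimals, the result is only approximate, and the projection to 5 epochs simply multiplies by the configured `num_epochs`.

```python
# (epoch, step) pairs copied from the training results table.
logged = [(0.98, 120_000), (1.8, 220_000)]

# Each row gives its own estimate of steps per epoch; average them.
estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = sum(estimates) / len(estimates)

# Projected length of the full run at the configured num_epochs.
configured_epochs = 5
projected_total_steps = steps_per_epoch * configured_epochs

print(f"steps per epoch: ~{steps_per_epoch:,.0f}")
print(f"projected total: ~{projected_total_steps:,.0f}")
```

By this estimate one epoch is on the order of 122,000 steps, so the last logged row (step 220000, epoch 1.8) lies well short of the 5 configured epochs.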


	### Framework versions

	- Transformers 4.33.2
	- PyTorch 1.14.0a0+410ce96
	- Datasets 2.14.5
	- Tokenizers 0.13.3