	---
	base_model: csebuetnlp/banglabert
	tags:
	- generated_from_trainer
	model-index:
	- name: Banglabert_nwp_finetuning_def_v2
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Banglabert_nwp_finetuning_def_v2

	This model is a fine-tuned version of [csebuetnlp/banglabert](https://huggingface.co/csebuetnlp/banglabert) on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 3.5135
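
For a rough sense of scale, a mean cross-entropy loss can be converted to perplexity by exponentiation. The sketch below assumes the reported loss is a per-token cross-entropy in nats, which this card does not state; `loss_to_perplexity` is a hypothetical helper and not part of the training code.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


eval_loss = 3.5135  # evaluation loss reported in this card
perplexity = loss_to_perplexity(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation loss corresponds to a perplexity of roughly 33.6 over the predicted tokens.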

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 50
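
As a rough sketch of what these settings imply: the training results in this card log 2487 optimizer steps per epoch, so with a batch size of 8 the training set holds at most 2487 × 8 = 19,896 examples. This assumes a single device and no gradient accumulation, neither of which the card states. The linear schedule is sketched under the further assumption of zero warmup steps; `linear_lr` is a hypothetical helper, not the Trainer's API.

```python
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 50
STEPS_PER_EPOCH = 2487  # step count logged at epoch 1.0 in the training results

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 124350, matches the final logged step

# Upper bound only: the last batch of an epoch may be partially filled.
MAX_TRAIN_EXAMPLES = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE  # 19896


def linear_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = LEARNING_RATE, warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for epoch in (0, 25, 50):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:>2}: lr = {linear_lr(step):.2e}")
```

With no warmup, the learning rate falls from 2e-05 at the start to half that value at epoch 25 and reaches zero at the last step.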

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:------:|:---------------:|
	| 5.9985 | 1.0 | 2487 | 5.4179 |
	| 5.1899 | 2.0 | 4974 | 4.8725 |
	| 4.8308 | 3.0 | 7461 | 4.6051 |
	| 4.6303 | 4.0 | 9948 | 4.4425 |
	| 4.4543 | 5.0 | 12435 | 4.3072 |
	| 4.2875 | 6.0 | 14922 | 4.2110 |
	| 4.2355 | 7.0 | 17409 | 4.1004 |
	| 4.1108 | 8.0 | 19896 | 4.0693 |
	| 4.0311 | 9.0 | 22383 | 3.9807 |
	| 3.9836 | 10.0 | 24870 | 3.9329 |
	| 3.9049 | 11.0 | 27357 | 3.9332 |
	| 3.8663 | 12.0 | 29844 | 3.9027 |
	| 3.7633 | 13.0 | 32331 | 3.8750 |
	| 3.7639 | 14.0 | 34818 | 3.7487 |
	| 3.6831 | 15.0 | 37305 | 3.7775 |
	| 3.6808 | 16.0 | 39792 | 3.7372 |
	| 3.6136 | 17.0 | 42279 | 3.7313 |
	| 3.5998 | 18.0 | 44766 | 3.6778 |
	| 3.531 | 19.0 | 47253 | 3.6912 |
	| 3.5361 | 20.0 | 49740 | 3.6869 |
	| 3.509 | 21.0 | 52227 | 3.6790 |
	| 3.4625 | 22.0 | 54714 | 3.6425 |
	| 3.418 | 23.0 | 57201 | 3.6572 |
	| 3.369 | 24.0 | 59688 | 3.6407 |
	| 3.3832 | 25.0 | 62175 | 3.6278 |
	| 3.3728 | 26.0 | 64662 | 3.5715 |
	| 3.3304 | 27.0 | 67149 | 3.6413 |
	| 3.2864 | 28.0 | 69636 | 3.5743 |
	| 3.3057 | 29.0 | 72123 | 3.5227 |
	| 3.2916 | 30.0 | 74610 | 3.5448 |
	| 3.2541 | 31.0 | 77097 | 3.5422 |
	| 3.2293 | 32.0 | 79584 | 3.5775 |
	| 3.1839 | 33.0 | 82071 | 3.5705 |
	| 3.2106 | 34.0 | 84558 | 3.5680 |
	| 3.185 | 35.0 | 87045 | 3.5225 |
	| 3.1845 | 36.0 | 89532 | 3.5237 |
	| 3.1581 | 37.0 | 92019 | 3.5300 |
	| 3.1569 | 38.0 | 94506 | 3.5081 |
	| 3.1222 | 39.0 | 96993 | 3.5217 |
	| 3.1007 | 40.0 | 99480 | 3.4810 |
	| 3.1094 | 41.0 | 101967 | 3.5475 |
	| 3.1289 | 42.0 | 104454 | 3.5126 |
	| 3.0841 | 43.0 | 106941 | 3.5076 |
	| 3.0834 | 44.0 | 109428 | 3.5101 |
	| 3.0862 | 45.0 | 111915 | 3.4777 |
	| 3.0843 | 46.0 | 114402 | 3.5116 |
	| 3.042 | 47.0 | 116889 | 3.5031 |
	| 3.0424 | 48.0 | 119376 | 3.4991 |
	| 3.0855 | 49.0 | 121863 | 3.5203 |
	| 3.0325 | 50.0 | 124350 | 3.5110 |
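
Validation loss does not decrease monotonically in the table above, so the final epoch is not necessarily the best one. A minimal sketch of picking the epoch with the lowest validation loss, using only the rows for epochs 40 to 50 copied from the table; `best_epoch` is a hypothetical helper and not something the training run is known to have used.

```python
# (epoch, validation loss) pairs copied from epochs 40-50 of the results table.
VAL_LOSS_BY_EPOCH = [
    (40, 3.4810), (41, 3.5475), (42, 3.5126), (43, 3.5076),
    (44, 3.5101), (45, 3.4777), (46, 3.5116), (47, 3.5031),
    (48, 3.4991), (49, 3.5203), (50, 3.5110),
]


def best_epoch(history):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(history, key=lambda pair: pair[1])


epoch, loss = best_epoch(VAL_LOSS_BY_EPOCH)
print(f"best epoch: {epoch} (validation loss {loss:.4f})")
```

Epoch 45 (3.4777) comes out ahead of the final epoch (3.5110). Whether the published weights come from the last checkpoint or the best one is not stated in this card.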


	### Framework versions

	- Transformers 4.38.1
	- Pytorch 2.1.0+cu121
	- Datasets 2.17.1
	- Tokenizers 0.15.2
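
To compare a local environment against these versions, something like the following sketch may help. It uses only the standard-library `importlib.metadata`; the distribution names in `EXPECTED_VERSIONS` (for example `torch` for Pytorch) are assumptions about how the packages are published, and `find_version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card; the distribution names are assumptions.
EXPECTED_VERSIONS = {
    "transformers": "4.38.1",
    "torch": "2.1.0+cu121",
    "datasets": "2.17.1",
    "tokenizers": "0.15.2",
}


def find_version_mismatches(expected):
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_version_mismatches(EXPECTED_VERSIONS).items():
    print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where the environment differs from the one recorded here.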
