	---
	license: apache-2.0
	base_model: mistralai/Mistral-7B-v0.1
	tags:
	- generated_from_trainer
	model-index:
	- name: zephyr-7b-sft-lora-accum8-lr1e_5
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# zephyr-7b-sft-lora-accum8-lr1e_5

	This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.0319
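
For intuition, the evaluation loss can be converted into a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language-model fine-tuning but is not stated in this card; the variable names are illustrative.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.0319

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so the corresponding perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 2.8.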

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 4
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 2
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 64
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- num_epochs: 50.0
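
The two `total_*` values in the list above follow from the other settings. A minimal check, assuming `train_batch_size` and `eval_batch_size` are per-device values (the reported totals are consistent with that reading) and that gradient accumulation applies only to training:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4              # assumed to be per device
eval_batch_size = 8               # assumed to be per device
num_devices = 2
gradient_accumulation_steps = 8

# Samples contributing to one optimizer update across all devices.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation does not accumulate gradients, so only the device count scales it.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both results match the card: 64 for training and 16 for evaluation.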

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 2.078 | 0.51 | 6 | 2.0305 |
	| 2.0322 | 1.53 | 13 | 1.9523 |
	| 1.9301 | 2.55 | 20 | 1.8814 |
	| 1.8757 | 3.57 | 27 | 1.8209 |
	| 1.841 | 4.51 | 33 | 1.7722 |
	| 1.7661 | 5.53 | 40 | 1.7291 |
	| 1.731 | 6.55 | 47 | 1.6904 |
	| 1.713 | 7.57 | 54 | 1.6531 |
	| 1.6557 | 8.51 | 60 | 1.6243 |
	| 1.6319 | 9.53 | 67 | 1.5889 |
	| 1.5989 | 10.55 | 74 | 1.5500 |
	| 1.5556 | 11.57 | 81 | 1.5098 |
	| 1.5165 | 12.51 | 87 | 1.4754 |
	| 1.4945 | 13.53 | 94 | 1.4282 |
	| 1.4198 | 14.55 | 101 | 1.3778 |
	| 1.3823 | 15.57 | 108 | 1.3291 |
	| 1.3576 | 16.51 | 114 | 1.2925 |
	| 1.2917 | 17.53 | 121 | 1.2570 |
	| 1.2599 | 18.55 | 128 | 1.2283 |
	| 1.2257 | 19.57 | 135 | 1.2033 |
	| 1.2123 | 20.51 | 141 | 1.1905 |
	| 1.1966 | 21.53 | 148 | 1.1724 |
	| 1.1694 | 22.55 | 155 | 1.1592 |
	| 1.1665 | 23.57 | 162 | 1.1471 |
	| 1.1559 | 24.51 | 168 | 1.1369 |
	| 1.1383 | 25.53 | 175 | 1.1288 |
	| 1.141 | 26.55 | 182 | 1.1200 |
	| 1.1334 | 27.57 | 189 | 1.1138 |
	| 1.1193 | 28.51 | 195 | 1.1079 |
	| 1.1079 | 29.53 | 202 | 1.1016 |
	| 1.1188 | 30.55 | 209 | 1.0961 |
	| 1.1006 | 31.57 | 216 | 1.0916 |
	| 1.1016 | 32.51 | 222 | 1.0851 |
	| 1.0801 | 33.53 | 229 | 1.0783 |
	| 1.0846 | 34.55 | 236 | 1.0758 |
	| 1.0828 | 35.57 | 243 | 1.0725 |
	| 1.0758 | 36.51 | 249 | 1.0694 |
	| 1.0749 | 37.53 | 256 | 1.0646 |
	| 1.0626 | 38.55 | 263 | 1.0627 |
	| 1.0575 | 39.57 | 270 | 1.0592 |
	| 1.0583 | 40.51 | 276 | 1.0555 |
	| 1.0548 | 41.53 | 283 | 1.0518 |
	| 1.0495 | 42.55 | 290 | 1.0468 |
	| 1.0449 | 43.57 | 297 | 1.0469 |
	| 1.0527 | 44.51 | 303 | 1.0420 |
	| 1.0411 | 45.53 | 310 | 1.0415 |
	| 1.0325 | 46.55 | 317 | 1.0384 |
	| 1.0404 | 47.57 | 324 | 1.0353 |
	| 1.0326 | 48.51 | 330 | 1.0337 |
	| 1.0262 | 49.53 | 337 | 1.0317 |


	### Framework versions

	- Transformers 4.35.0
	- Pytorch 2.1.0
	- Datasets 2.14.6
	- Tokenizers 0.14.1
