---
license: apache-2.0
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: mistralai/Mistral-7B-Instruct-v0.2
model-index:
- name: mistral-7b-scientific-mcq
  results: []
---
# mistral-7b-scientific-mcq

This model is a PEFT (LoRA) fine-tune of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), trained with TRL's supervised fine-tuning (SFT) trainer. The training dataset is not documented in this card; the repository name suggests a scientific multiple-choice question-answering (MCQ) corpus.

It achieves the following results on the evaluation set:

- Loss: 0.7480

Note that this is the final-checkpoint loss: validation loss bottoms out at 0.7192 around step 1700 (near the end of epoch 1) and drifts upward through epochs 2 and 3, so the final checkpoint is slightly past the optimum (see the table under Training results).

## Model description

This repository contains a LoRA adapter rather than full model weights; at inference time the adapter must be loaded on top of the mistralai/Mistral-7B-Instruct-v0.2 base model. Beyond that, little is documented; judging by the repository name, the adapter targets scientific multiple-choice question answering. A hedged usage sketch follows.
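The snippet below is a minimal inference sketch, not an official recipe from this card: the Hub id `your-username/mistral-7b-scientific-mcq` is a placeholder for this adapter repository, and the question is illustrative only.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "your-username/mistral-7b-scientific-mcq"  # placeholder Hub id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter
model.eval()

# Mistral-Instruct expects its chat template; the question is illustrative.
question = (
    "Which gas makes up most of Earth's atmosphere?\n"
    "A) Oxygen\nB) Nitrogen\nC) Argon\nD) Carbon dioxide"
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Optionally, calling `model = model.merge_and_unload()` after loading folds the adapter into the base weights for slightly faster inference.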
## Intended uses & limitations

More information needed. The repository name points to scientific multiple-choice question answering as the intended use, but no task-level evaluation is reported (only validation loss), and the adapter inherits the capabilities and biases of the base model.

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
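As a rough, hedged reconstruction (not the author's actual training script), these settings map onto TRL/Transformers as sketched below. The LoRA configuration and the datasets are not recorded in this card, so `peft_config`, `train_dataset`, and `eval_dataset` are placeholders, and the exact TRL version used is unknown.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Placeholder data: the actual corpus is not documented in this card.
train_dataset = Dataset.from_dict(
    {"text": ["<s>[INST] Which gas makes up most of Earth's atmosphere? ... [/INST] B) Nitrogen</s>"]}
)
eval_dataset = train_dataset

# Assumed LoRA settings; the card records no r/alpha/target-module values.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = TrainingArguments(
    output_dir="mistral-7b-scientific-mcq",
    learning_rate=5e-05,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=4,    # eval_batch_size: 4
    gradient_accumulation_steps=4,   # total_train_batch_size: 4 * 4 = 16
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=3.0,
    eval_strategy="steps",           # the results table logs eval every 100 steps
    eval_steps=100,
    logging_steps=100,
    # The default AdamW optimizer already uses betas=(0.9, 0.999), eps=1e-08.
)

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # base model; TRL loads it
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
)
trainer.train()
```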
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9911 | 0.0581 | 100 | 0.8124 |
| 0.879 | 0.1162 | 200 | 0.7703 |
| 0.9359 | 0.1743 | 300 | 0.7576 |
| 0.7608 | 0.2325 | 400 | 0.7523 |
| 0.8144 | 0.2906 | 500 | 0.7469 |
| 0.8655 | 0.3487 | 600 | 0.7435 |
| 0.6748 | 0.4068 | 700 | 0.7390 |
| 0.7004 | 0.4649 | 800 | 0.7369 |
| 0.7561 | 0.5230 | 900 | 0.7351 |
| 0.7053 | 0.5811 | 1000 | 0.7317 |
| 0.7122 | 0.6393 | 1100 | 0.7294 |
| 0.7431 | 0.6974 | 1200 | 0.7279 |
| 0.6102 | 0.7555 | 1300 | 0.7255 |
| 0.7041 | 0.8136 | 1400 | 0.7244 |
| 0.7339 | 0.8717 | 1500 | 0.7227 |
| 0.6648 | 0.9298 | 1600 | 0.7207 |
| 0.5682 | 0.9879 | 1700 | 0.7192 |
| 0.6745 | 1.0461 | 1800 | 0.7242 |
| 0.6003 | 1.1042 | 1900 | 0.7258 |
| 0.6755 | 1.1623 | 2000 | 0.7273 |
| 0.6815 | 1.2204 | 2100 | 0.7265 |
| 0.5531 | 1.2785 | 2200 | 0.7253 |
| 0.5 | 1.3366 | 2300 | 0.7250 |
| 0.666 | 1.3947 | 2400 | 0.7236 |
| 0.518 | 1.4529 | 2500 | 0.7247 |
| 0.6223 | 1.5110 | 2600 | 0.7240 |
| 0.565 | 1.5691 | 2700 | 0.7234 |
| 0.5541 | 1.6272 | 2800 | 0.7220 |
| 0.7622 | 1.6853 | 2900 | 0.7220 |
| 0.5212 | 1.7434 | 3000 | 0.7223 |
| 0.6089 | 1.8015 | 3100 | 0.7205 |
| 0.6908 | 1.8597 | 3200 | 0.7210 |
| 0.6138 | 1.9178 | 3300 | 0.7204 |
| 0.6425 | 1.9759 | 3400 | 0.7199 |
| 0.4918 | 2.0340 | 3500 | 0.7416 |
| 0.5432 | 2.0921 | 3600 | 0.7468 |
| 0.6497 | 2.1502 | 3700 | 0.7463 |
| 0.5068 | 2.2083 | 3800 | 0.7448 |
| 0.5502 | 2.2665 | 3900 | 0.7475 |
| 0.4795 | 2.3246 | 4000 | 0.7482 |
| 0.5718 | 2.3827 | 4100 | 0.7486 |
| 0.5154 | 2.4408 | 4200 | 0.7474 |
| 0.6959 | 2.4989 | 4300 | 0.7479 |
| 0.5848 | 2.5570 | 4400 | 0.7473 |
| 0.5662 | 2.6151 | 4500 | 0.7479 |
| 0.4357 | 2.6733 | 4600 | 0.7482 |
| 0.5318 | 2.7314 | 4700 | 0.7476 |
| 0.4631 | 2.7895 | 4800 | 0.7480 |
| 0.5852 | 2.8476 | 4900 | 0.7481 |
| 0.5633 | 2.9057 | 5000 | 0.7480 |
| 0.5831 | 2.9638 | 5100 | 0.7480 |
### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
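For reproducibility, here is a small optional sanity check (an illustrative sketch, not part of the original card) that compares the installed environment against the versions above:

```python
from importlib.metadata import version

# Versions recorded in this card; others may work but are untested here.
expected = {
    "peft": "0.11.1",
    "transformers": "4.41.2",
    "torch": "2.3.0",  # card lists 2.3.0+cu121; the local build suffix may differ
    "datasets": "2.19.2",
    "tokenizers": "0.19.1",
}
for pkg, want in expected.items():
    have = version(pkg)
    flag = "OK" if have.startswith(want) else "MISMATCH"
    print(f"{pkg}: installed {have}, card lists {want} [{flag}]")
```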