	---
	license: apache-2.0
	base_model: mistralai/Mistral-7B-v0.1
	tags:
	- generated_from_trainer
	model-index:
	- name: sparse_mistral_7b_refined_web_50p_2024-04-13
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# sparse_mistral_7b_refined_web_50p_2024-04-13

	This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset (the Trainer recorded the dataset name as `None`).
	It achieves the following results on the evaluation set:
	- Loss: 2.2015
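
For readers who prefer perplexity, the reported loss can be converted with a short sketch. This assumes the loss is the mean per-token cross-entropy in nats, which is the usual case for causal language models trained with the Trainer but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 2.2015

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the reported loss corresponds to a perplexity of roughly 9.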

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 1
	- eval_batch_size: 4
	- seed: 0
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 32
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- training_steps: 1600
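
The totals listed above follow from the per-device values, and the linear scheduler can be sketched in a few lines. The snippet below is illustrative only: `linear_lr` is a hypothetical helper, and the warmup length of 0 is an assumption, since the card does not list any warmup setting.

```python
# Values copied from the hyperparameter list in this card.
learning_rate = 1e-5
train_batch_size = 1           # per device
eval_batch_size = 4            # per device
num_devices = 4
gradient_accumulation_steps = 8
training_steps = 1600

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def linear_lr(step, base_lr=learning_rate, total_steps=training_steps, warmup_steps=0):
    """Hypothetical helper: linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption; the card does not report a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


print(total_train_batch_size, total_eval_batch_size)
for step in (0, 400, 800, 1200, 1600):
    print(step, linear_lr(step))
```

With no warmup, the learning rate would fall from 1e-05 at step 0 to half that value at step 800 and reach zero at step 1600.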

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 2.3391 | 0.01 | 25 | 2.4196 |
	| 2.2711 | 0.02 | 50 | 2.3577 |
	| 2.3054 | 0.02 | 75 | 2.3158 |
	| 2.2795 | 0.03 | 100 | 2.2966 |
	| 2.3175 | 0.04 | 125 | 2.2846 |
	| 2.2388 | 0.05 | 150 | 2.2766 |
	| 2.1679 | 0.06 | 175 | 2.2705 |
	| 2.2996 | 0.06 | 200 | 2.2678 |
	| 2.2788 | 0.07 | 225 | 2.2647 |
	| 2.2448 | 0.08 | 250 | 2.2637 |
	| 2.1837 | 0.09 | 275 | 2.2624 |
	| 2.2089 | 0.1 | 300 | 2.2621 |
	| 2.2686 | 0.1 | 325 | 2.2601 |
	| 2.2254 | 0.11 | 350 | 2.2593 |
	| 2.162 | 0.12 | 375 | 2.2590 |
	| 2.2687 | 0.13 | 400 | 2.2563 |
	| 2.2595 | 0.14 | 425 | 2.2571 |
	| 2.186 | 0.14 | 450 | 2.2564 |
	| 2.2689 | 0.15 | 475 | 2.2580 |
	| 2.2472 | 0.16 | 500 | 2.2554 |
	| 2.2005 | 0.17 | 525 | 2.2553 |
	| 2.1983 | 0.18 | 550 | 2.2552 |
	| 2.2388 | 0.18 | 575 | 2.2547 |
	| 2.1443 | 0.19 | 600 | 2.2555 |
	| 2.2198 | 0.2 | 625 | 2.2534 |
	| 2.3008 | 0.21 | 650 | 2.2536 |
	| 2.179 | 0.22 | 675 | 2.2521 |
	| 2.2069 | 0.22 | 700 | 2.2531 |
	| 2.1819 | 0.23 | 725 | 2.2526 |
	| 2.1218 | 0.24 | 750 | 2.2536 |
	| 2.1845 | 0.25 | 775 | 2.2515 |
	| 2.2167 | 0.26 | 800 | 2.2510 |
	| 2.2252 | 0.26 | 825 | 2.2520 |
	| 2.1664 | 0.27 | 850 | 2.2519 |
	| 2.1853 | 0.28 | 875 | 2.2530 |
	| 2.1499 | 0.29 | 900 | 2.2513 |
	| 2.2763 | 0.3 | 925 | 2.2517 |
	| 2.2528 | 0.3 | 950 | 2.2518 |
	| 2.2505 | 0.31 | 975 | 2.2500 |
	| 2.1683 | 0.32 | 1000 | 2.2502 |
	| 2.2177 | 0.33 | 1025 | 2.2501 |
	| 2.238 | 0.34 | 1050 | 2.2516 |
	| 2.193 | 0.34 | 1075 | 2.2507 |
	| 2.2025 | 0.35 | 1100 | 2.2502 |
	| 2.0944 | 0.36 | 1125 | 2.2512 |
	| 2.2272 | 0.37 | 1150 | 2.2508 |
	| 2.2264 | 0.38 | 1175 | 2.2500 |
	| 2.1837 | 0.38 | 1200 | 2.2507 |
	| 2.1444 | 0.39 | 1225 | 2.2489 |
	| 2.2464 | 0.4 | 1250 | 2.2499 |
	| 2.1388 | 0.41 | 1275 | 2.2508 |
	| 2.193 | 0.42 | 1300 | 2.2492 |
	| 2.2376 | 0.42 | 1325 | 2.2506 |
	| 2.2212 | 0.43 | 1350 | 2.2478 |
	| 2.2002 | 0.44 | 1375 | 2.2488 |
	| 2.2729 | 0.45 | 1400 | 2.2484 |
	| 2.2329 | 0.46 | 1425 | 2.2473 |
	| 2.1919 | 0.46 | 1450 | 2.2481 |
	| 2.2102 | 0.47 | 1475 | 2.2475 |
	| 2.1466 | 0.48 | 1500 | 2.2473 |
	| 2.1818 | 0.49 | 1525 | 2.2462 |
	| 2.2549 | 0.5 | 1550 | 2.2470 |
	| 2.2137 | 0.5 | 1575 | 2.2449 |
	| 2.2276 | 0.51 | 1600 | 2.2481 |
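
The Epoch and Step columns, together with the total train batch size of 32, allow a rough estimate of the training set size. The sketch below is a back-of-the-envelope calculation, not a figure reported by the card. It assumes that each optimizer step consumes exactly 32 training examples and that the final epoch value of 0.51 was rounded to two decimals.

```python
# Values taken from the card.
final_step = 1600
total_train_batch_size = 32

# Assumption: the reported epoch 0.51 is rounded, so the true value
# lies somewhere in [0.505, 0.515).
epoch_low, epoch_high = 0.505, 0.515

# Training examples consumed by the end of the run.
examples_seen = final_step * total_train_batch_size

# Implied number of examples in one full epoch.
dataset_size_low = examples_seen / epoch_high
dataset_size_high = examples_seen / epoch_low

print(f"examples seen: {examples_seen}")
print(f"estimated training set size: {dataset_size_low:.0f} to {dataset_size_high:.0f} examples")
```

Under these assumptions the run saw 51,200 examples, which would put the training set at roughly 100,000 examples.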


	### Framework versions

	- Transformers 4.36.2
	- Pytorch 2.1.2+cu121
	- Datasets 2.15.0
	- Tokenizers 0.15.0
