Sorour/cls_alldata_phi3_v1

	---
	license: mit
	library_name: peft
	tags:
	- trl
	- sft
	- generated_from_trainer
	base_model: microsoft/Phi-3-mini-4k-instruct
	datasets:
	- generator
	model-index:
	- name: cls_alldata_phi3_v1
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# cls_alldata_phi3_v1

	This model is a PEFT adapter for [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), fine-tuned with TRL supervised fine-tuning (SFT) on a dataset that the Trainer recorded only as `generator`.
	It achieves the following results on the evaluation set:
	- Loss: 0.4956
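
Because the metadata lists `library_name: peft`, the repository most likely holds adapter weights rather than a full checkpoint. The following is a minimal loading sketch under that assumption. The adapter id `Sorour/cls_alldata_phi3_v1` is inferred from the page header, the helper name `load_adapter_model` is hypothetical, and the expected prompt format is not documented in this card.

```python
BASE_MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"  # from the card's base_model field
ADAPTER_ID = "Sorour/cls_alldata_phi3_v1"  # assumption: inferred from the page header


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imports are deferred so this module can be imported without the heavy
    # dependencies installed; calling the function requires transformers and peft.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # Wrap the base model with the adapter weights stored in the adapter repo.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads both the base model and the adapter, so it needs network access and enough memory for Phi-3-mini.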

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 2
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant
	- lr_scheduler_warmup_ratio: 0.03
	- num_epochs: 2
	- mixed_precision_training: Native AMP
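
The batch-size figures above are related by simple arithmetic, sketched here. The single-device setting is an assumption (it is the only value consistent with 2 × 4 = 8), and the dataset-size estimate is approximate: it is derived from the last row of the results table (step 700 at epoch 1.9553) and ignores any partial final batch.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single device, consistent with 2 * 4 = 8

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Rough estimate of optimizer steps per epoch, from the final logged row
# of the results table (step 700 at epoch 1.9553).
steps_per_epoch = round(700 / 1.9553)

# Approximate number of training examples (ignores a partial last batch).
approx_train_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, steps_per_epoch, approx_train_examples)
```

As far as I can tell, the `constant` scheduler in Transformers applies no warmup (only `constant_with_warmup` does), so the listed `lr_scheduler_warmup_ratio` of 0.03 probably had no effect on this run.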

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 0.7566 | 0.0559 | 20 | 0.7643 |
	| 0.6863 | 0.1117 | 40 | 0.7089 |
	| 0.6538 | 0.1676 | 60 | 0.6706 |
	| 0.6261 | 0.2235 | 80 | 0.6499 |
	| 0.6402 | 0.2793 | 100 | 0.6321 |
	| 0.594 | 0.3352 | 120 | 0.6226 |
	| 0.5956 | 0.3911 | 140 | 0.6121 |
	| 0.5743 | 0.4469 | 160 | 0.6016 |
	| 0.5494 | 0.5028 | 180 | 0.5903 |
	| 0.5861 | 0.5587 | 200 | 0.5887 |
	| 0.5431 | 0.6145 | 220 | 0.5801 |
	| 0.5404 | 0.6704 | 240 | 0.5746 |
	| 0.5401 | 0.7263 | 260 | 0.5695 |
	| 0.5363 | 0.7821 | 280 | 0.5644 |
	| 0.5534 | 0.8380 | 300 | 0.5608 |
	| 0.5936 | 0.8939 | 320 | 0.5552 |
	| 0.5139 | 0.9497 | 340 | 0.5496 |
	| 0.5096 | 1.0056 | 360 | 0.5468 |
	| 0.4891 | 1.0615 | 380 | 0.5468 |
	| 0.4524 | 1.1173 | 400 | 0.5433 |
	| 0.4568 | 1.1732 | 420 | 0.5397 |
	| 0.4462 | 1.2291 | 440 | 0.5374 |
	| 0.4605 | 1.2849 | 460 | 0.5337 |
	| 0.4469 | 1.3408 | 480 | 0.5328 |
	| 0.458 | 1.3966 | 500 | 0.5313 |
	| 0.4378 | 1.4525 | 520 | 0.5250 |
	| 0.4654 | 1.5084 | 540 | 0.5232 |
	| 0.4563 | 1.5642 | 560 | 0.5200 |
	| 0.4664 | 1.6201 | 580 | 0.5155 |
	| 0.4308 | 1.6760 | 600 | 0.5128 |
	| 0.443 | 1.7318 | 620 | 0.5082 |
	| 0.4508 | 1.7877 | 640 | 0.5070 |
	| 0.4511 | 1.8436 | 660 | 0.4999 |
	| 0.4467 | 1.8994 | 680 | 0.4996 |
	| 0.4723 | 1.9553 | 700 | 0.4956 |
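
A few summary figures can be read off the table, sketched below using three of its rows. The perplexity conversion assumes the reported loss is a mean token-level cross-entropy in nats, which is typical for causal language model training but is not stated in this card. The logged training loss is a noisy per-interval value, so the train/validation gap is only indicative.

```python
import math

# (step, training_loss, validation_loss) copied from the first row,
# the first row of the second epoch, and the last row of the table.
rows = [
    (20, 0.7566, 0.7643),
    (360, 0.5096, 0.5468),
    (700, 0.4723, 0.4956),
]

first_val = rows[0][2]
final_train, final_val = rows[-1][1], rows[-1][2]

# Fraction by which validation loss fell between step 20 and step 700.
relative_drop = (first_val - final_val) / first_val

# Perplexity, assuming the loss is mean cross-entropy in nats.
final_perplexity = math.exp(final_val)

# Gap between validation and training loss at the last logged step.
final_gap = final_val - final_train

print(f"relative drop in validation loss: {relative_drop:.1%}")
print(f"final validation perplexity: {final_perplexity:.3f}")
print(f"final validation minus training loss: {final_gap:.4f}")
```

Validation loss was still decreasing at step 700, so these figures describe where training stopped, not a converged model.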


	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.41.1
	- PyTorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1