ghost613/llama8_on_korean_summary


---
license: other
library_name: peft
tags:
- generated_from_trainer
base_model: beomi/Llama-3-Open-Ko-8B-Instruct-preview
model-index:
- name: llama8_on_korean_summary
  results: []
---


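# llama8_on_korean_summary

This model is a fine-tuned version of [beomi/Llama-3-Open-Ko-8B-Instruct-preview](https://huggingface.co/beomi/Llama-3-Open-Ko-8B-Instruct-preview) on an unspecified dataset. The Trainer did not record the dataset's name; the model name suggests Korean summarization.
It achieves the following result on the evaluation set at the final training step (760):
- Loss: 0.8536

This is the loss of the last checkpoint, not the lowest one. In the training results below, validation loss bottoms out at 0.7109 at step 300 (epoch 3.95) and then rises while training loss keeps falling.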
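
Assume the reported loss is the mean per-token cross-entropy in nats, which is what Transformers causal language models return. Under that assumption, it converts to perplexity as follows. This is an illustrative calculation, not a metric the Trainer logged.

$$
\mathrm{PPL} = e^{\mathcal{L}}, \qquad e^{0.8536} \approx 2.35 \ (\text{final, step } 760), \qquad e^{0.7109} \approx 2.04 \ (\text{lowest, step } 300)
$$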

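## Model description

The card metadata lists `library_name: peft`, so this repository holds a PEFT adapter that is applied on top of the base model rather than a standalone set of weights. More information is needed on the adapter type and its configuration.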
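
Below is a minimal loading sketch. It assumes two things: the repository contains a PEFT adapter for `beomi/Llama-3-Open-Ko-8B-Instruct-preview`, as `library_name: peft` indicates, and the base model's tokenizer applies unchanged. The function name `load_summarizer` and the bfloat16 dtype are illustrative choices, not part of the original card. The prompt format used during fine-tuning is not documented, so generation is left out.

```python
def load_summarizer(
    adapter_id="ghost613/llama8_on_korean_summary",
    base_id="beomi/Llama-3-Open-Ko-8B-Instruct-preview",
):
    """Load the base model, attach the PEFT adapter, and return (tokenizer, model).

    Hypothetical helper: the dtype and tokenizer source are assumptions.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: bfloat16 inference; 8B parameters need roughly 16 GB in this dtype.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Wrap the base model with the adapter weights from the Hub repository.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Usage: `tokenizer, model = load_summarizer()`. Calling it downloads the full 8B base model, so plan for a GPU with enough memory.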

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 5
- total_train_batch_size: 10 (2 per device × 5 accumulation steps, i.e. a single device)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 760 (10 epochs of 76 optimizer steps each, per the results table)
- mixed_precision_training: Native AMP
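
The effective batch size and the learning-rate schedule both follow from the values above. This is a minimal sketch under two assumptions. First, the Transformers `linear` scheduler ramps linearly from 0 to the peak rate during warmup, then decays linearly to 0 at the last step. Second, training ran on one device, as the total batch size of 10 implies. The constant and function names are illustrative.

```python
# Values from the hyperparameter list above.
PEAK_LR = 5e-05
WARMUP_STEPS = 50
TRAINING_STEPS = 760
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 5
NUM_DEVICES = 1  # assumption: implied by total_train_batch_size = 10


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum * devices


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay: fall linearly from the peak rate to 0 at the final step.
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 10
    for step in (0, 25, 50, 405, 760):
        print(step, linear_lr(step))
```

With these values, the rate peaks at 5e-05 at step 50. It has halved by step 405, midway through the 710-step decay.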

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.7767 | 0.26 | 20 | 1.5096 |
| 1.2382 | 0.53 | 40 | 1.0459 |
| 0.9915 | 0.79 | 60 | 0.9451 |
| 0.9126 | 1.05 | 80 | 0.8893 |
| 0.8501 | 1.32 | 100 | 0.8515 |
| 0.8113 | 1.58 | 120 | 0.8192 |
| 0.8019 | 1.84 | 140 | 0.7939 |
| 0.7239 | 2.11 | 160 | 0.7795 |
| 0.6621 | 2.37 | 180 | 0.7594 |
| 0.6457 | 2.63 | 200 | 0.7433 |
| 0.6417 | 2.89 | 220 | 0.7281 |
| 0.5929 | 3.16 | 240 | 0.7305 |
| 0.5245 | 3.42 | 260 | 0.7242 |
| 0.5291 | 3.68 | 280 | 0.7154 |
| 0.528 | 3.95 | 300 | 0.7109 |
| 0.4696 | 4.21 | 320 | 0.7257 |
| 0.4474 | 4.47 | 340 | 0.7251 |
| 0.4572 | 4.74 | 360 | 0.7252 |
| 0.4391 | 5.0 | 380 | 0.7202 |
| 0.3794 | 5.26 | 400 | 0.7462 |
| 0.3771 | 5.53 | 420 | 0.7568 |
| 0.3754 | 5.79 | 440 | 0.7453 |
| 0.3739 | 6.05 | 460 | 0.7597 |
| 0.3179 | 6.32 | 480 | 0.7803 |
| 0.3328 | 6.58 | 500 | 0.7699 |
| 0.3259 | 6.84 | 520 | 0.7710 |
| 0.3014 | 7.11 | 540 | 0.8083 |
| 0.2759 | 7.37 | 560 | 0.8017 |
| 0.2758 | 7.63 | 580 | 0.7954 |
| 0.2798 | 7.89 | 600 | 0.8003 |
| 0.2545 | 8.16 | 620 | 0.8325 |
| 0.2451 | 8.42 | 640 | 0.8282 |
| 0.2355 | 8.68 | 660 | 0.8318 |
| 0.2382 | 8.95 | 680 | 0.8300 |
| 0.2256 | 9.21 | 700 | 0.8544 |
| 0.212 | 9.47 | 720 | 0.8532 |
| 0.2108 | 9.74 | 740 | 0.8529 |
| 0.2125 | 10.0 | 760 | 0.8536 |
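
This sketch locates the best evaluation step and the train/validation gap programmatically. The values are copied from the table above, and the helper names are illustrative.

For a rerun, the Trainer could keep the best checkpoint rather than the last. One way is `load_best_model_at_end=True` together with `metric_for_best_model="eval_loss"`, with evaluation and checkpointing on the same step schedule. `EarlyStoppingCallback` could optionally be added on top.

```python
# Training log from the table above, evaluated every 20 optimizer steps.
EVAL_EVERY = 20
TRAIN_LOSS = [
    1.7767, 1.2382, 0.9915, 0.9126, 0.8501, 0.8113, 0.8019, 0.7239,
    0.6621, 0.6457, 0.6417, 0.5929, 0.5245, 0.5291, 0.528, 0.4696,
    0.4474, 0.4572, 0.4391, 0.3794, 0.3771, 0.3754, 0.3739, 0.3179,
    0.3328, 0.3259, 0.3014, 0.2759, 0.2758, 0.2798, 0.2545, 0.2451,
    0.2355, 0.2382, 0.2256, 0.212, 0.2108, 0.2125,
]
VAL_LOSS = [
    1.5096, 1.0459, 0.9451, 0.8893, 0.8515, 0.8192, 0.7939, 0.7795,
    0.7594, 0.7433, 0.7281, 0.7305, 0.7242, 0.7154, 0.7109, 0.7257,
    0.7251, 0.7252, 0.7202, 0.7462, 0.7568, 0.7453, 0.7597, 0.7803,
    0.7699, 0.7710, 0.8083, 0.8017, 0.7954, 0.8003, 0.8325, 0.8282,
    0.8318, 0.8300, 0.8544, 0.8532, 0.8529, 0.8536,
]


def best_checkpoint(val_loss, every):
    """Return (step, loss) for the lowest validation loss in the log."""
    i = min(range(len(val_loss)), key=val_loss.__getitem__)
    return (i + 1) * every, val_loss[i]


def generalization_gap(train_loss, val_loss):
    """Validation loss minus training loss at each evaluation point."""
    return [v - t for t, v in zip(train_loss, val_loss)]


if __name__ == "__main__":
    step, loss = best_checkpoint(VAL_LOSS, EVAL_EVERY)
    print(f"best validation loss {loss} at step {step}")  # step 300
    gaps = generalization_gap(TRAIN_LOSS, VAL_LOSS)
    print(f"gap at step {step}: {gaps[step // EVAL_EVERY - 1]:.4f}")
    print(f"gap at step {len(VAL_LOSS) * EVAL_EVERY}: {gaps[-1]:.4f}")
```

A widening gap after the minimum is the usual sign of overfitting. The gap reading is a heuristic, not a measurement of summary quality.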


### Framework versions

- PEFT 0.8.2
- Transformers 4.38.2
- PyTorch 2.1.2+cu121
- Datasets 2.17.0
- Tokenizers 0.15.0
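
This small sketch checks a local environment against the versions above using the standard-library `importlib.metadata`. The pip distribution names (`torch` for PyTorch) and the function name are assumptions of this example. The `+cu121` local build tag is ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; keys are assumed pip distribution names.
PINNED = {
    "peft": "0.8.2",
    "transformers": "4.38.2",
    "torch": "2.1.2",  # the card lists 2.1.2+cu121 (a CUDA 12.1 build)
    "datasets": "2.17.0",
    "tokenizers": "0.15.0",
}


def version_mismatches(pinned, lookup=version):
    """Map each package whose installed version differs to (wanted, found).

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        # Compare only the public version, dropping local tags such as "+cu121".
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches(PINNED).items():
        print(f"{name}: card used {wanted}, installed {found or 'not installed'}")
```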