Interview_L3_1000rate_1e5_SFT_SFT

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.0253

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 1000
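
The relation between these values can be checked with a short sketch. It assumes a single training device (which is what 2 x 2 = 4 implies) and re-implements the common linear-warmup-then-cosine-decay formula by hand instead of calling the trainer's own scheduler; the name `lr_at` and the constant `NUM_DEVICES` are hypothetical and do not come from the card.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Assumption: one device, since the card reports total_train_batch_size = 4.
NUM_DEVICES = 1

# Examples consumed per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    for step in (0, 50, 100, 550, 1000):
        print(step, lr_at(step))
```

Under these assumptions the rate peaks at step 100 and decays to roughly zero by step 1000, which is consistent with the validation loss flattening over the last few evaluations.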

Training results

Training Loss	Epoch	Step	Validation Loss
1.3904	0.0376	50	1.2452
1.1582	0.0752	100	0.9397
0.9079	0.1129	150	0.6367
0.3786	0.1505	200	0.4351
0.258	0.1881	250	0.3067
0.2163	0.2257	300	0.2114
0.1031	0.2634	350	0.1570
0.0911	0.3010	400	0.1205
0.0739	0.3386	450	0.0901
0.0503	0.3762	500	0.0713
0.0713	0.4138	550	0.0598
0.066	0.4515	600	0.0457
0.0181	0.4891	650	0.0403
0.015	0.5267	700	0.0358
0.0172	0.5643	750	0.0301
0.0314	0.6020	800	0.0267
0.0279	0.6396	850	0.0259
0.0133	0.6772	900	0.0254
0.0122	0.7148	950	0.0253
0.0126	0.7524	1000	0.0253
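
The Epoch column allows a rough estimate of the training set size, which the card does not state. The sketch below assumes that each step consumes 4 examples (the total_train_batch_size) and that Epoch is the fraction of the training set seen so far; the helper name `estimated_train_examples` is hypothetical, and the result is an estimate, not a reported figure.

```python
TOTAL_TRAIN_BATCH_SIZE = 4  # from the hyperparameter list

# (step, epoch) pairs copied from the first and last rows of the table.
rows = [(50, 0.0376), (1000, 0.7524)]


def estimated_train_examples(step: int, epoch: float) -> float:
    """Examples seen so far divided by the fraction of one epoch completed."""
    return step * TOTAL_TRAIN_BATCH_SIZE / epoch


estimates = [estimated_train_examples(step, epoch) for step, epoch in rows]

if __name__ == "__main__":
    for (step, epoch), estimate in zip(rows, estimates):
        print(f"step {step:>4}, epoch {epoch}: about {estimate:.0f} examples")
```

Both rows point to roughly 5,300 training examples, and the final Epoch value of 0.7524 indicates that the run stopped at about three quarters of one pass over the data.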

Framework versions

Transformers 4.40.2
PyTorch 2.0.0+cu117
Datasets 2.19.1
Tokenizers 0.19.1
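
To compare a local environment against these versions, a minimal sketch using only the standard library could look like the following. The function name `compare_versions` is hypothetical, and the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual pip names, assumed here to match how the packages were installed.

```python
from importlib import metadata

# Versions listed in this card, keyed by pip distribution name.
EXPECTED = {
    "transformers": "4.40.2",
    "torch": "2.0.0+cu117",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


if __name__ == "__main__":
    for package, (wanted, installed) in compare_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{package}: expected {wanted}, installed {installed} ({status})")
```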

tsavage68/Interview_L3_1000rate_1e5_SFT_SFT