aftonposten-6b-align-scan

This model is a fine-tuned version of data/ap-gpt-j-6b-sft-qlora-04-08 on the hugodk-sch/aftonposten_title_prefs dataset. It achieves the following results on the evaluation set:

Loss: 0.5772
Rewards/chosen: 0.0684
Rewards/rejected: 0.0623
Rewards/accuracies: 0.5307
Rewards/margins: 0.0061
Logps/rejected: -37.4276
Logps/chosen: -33.9368
Logits/rejected: -2.2420
Logits/chosen: -2.2469
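
The reward metrics above are related by simple arithmetic: the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs where the chosen reward is the larger one. The card does not name the trainer, so reading the metrics this way is an assumption based on how preference-optimization trainers such as TRL's DPOTrainer usually log them. The following is a minimal sketch; `summarize_rewards` and the toy per-example rewards are hypothetical and serve only as illustration.

```python
from statistics import mean


def summarize_rewards(chosen, rejected):
    """Return (mean margin, accuracy) for paired per-example rewards."""
    margins = [c - r for c, r in zip(chosen, rejected)]
    # A pair counts as correct when the chosen reward exceeds the rejected one.
    accuracy = mean(1.0 if m > 0 else 0.0 for m in margins)
    return mean(margins), accuracy


# Aggregate values reported on this card.
reported_chosen = 0.0684
reported_rejected = 0.0623
reported_margin = 0.0061

# The reported margin matches chosen - rejected up to rounding.
margin_gap = abs((reported_chosen - reported_rejected) - reported_margin)

# Hypothetical per-example rewards, NOT taken from the evaluation set.
toy_chosen = [0.10, 0.02, -0.05, 0.20]
toy_rejected = [0.05, 0.04, -0.10, 0.10]
toy_margin, toy_accuracy = summarize_rewards(toy_chosen, toy_rejected)

print(f"reported margin gap: {margin_gap:.1e}")
print(f"toy margin: {toy_margin:.3f}, toy accuracy: {toy_accuracy:.2f}")
```

Read this way, an accuracy of 0.5307 with a margin of 0.0061 means the model prefers the chosen title only slightly more often than chance.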

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 4
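
The hyperparameters above imply an effective batch size and a learning-rate curve that can be worked out by hand. The sketch below assumes a linear warmup followed by cosine decay to zero, which is the usual shape for a cosine scheduler with a warmup ratio. `NUM_DEVICES = 1` is an assumption chosen because it reproduces the listed total of 8 (the card does not state the device count), and `TOTAL_STEPS = 1540` is an estimate extrapolated from the last logged row (step 1500 at epoch 3.9), not a value from the card.

```python
import math

PEAK_LR = 5e-6          # learning_rate
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio
PER_DEVICE_BATCH = 4    # train_batch_size
GRAD_ACCUM = 2          # gradient_accumulation_steps
NUM_DEVICES = 1         # assumption: not stated on the card
TOTAL_STEPS = 1540      # assumption: roughly 1500 / 3.9 * 4


def effective_batch_size(per_device, grad_accum, num_devices):
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def lr_at(step, total_steps=TOTAL_STEPS, peak=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to `peak`, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM, NUM_DEVICES)
print(f"effective batch size: {batch}")
for step in (0, 100, 400, 1000, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```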

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.4711	0.26	100	0.5755	0.0163	0.0131	0.5195	0.0032	-37.4979	-34.0113	-2.2352	-2.2401
0.5061	0.52	200	0.5877	-0.0108	-0.0202	0.4992	0.0094	-37.5455	-34.0500	-2.2337	-2.2385
0.3371	0.78	300	0.5843	0.0001	-0.0131	0.5278	0.0132	-37.5353	-34.0344	-2.2322	-2.2371
0.4001	1.04	400	0.6350	-0.0073	0.0033	0.4838	-0.0106	-37.5120	-34.0450	-2.2353	-2.2402
0.3401	1.3	500	0.6238	-0.0135	-0.0193	0.5141	0.0058	-37.5443	-34.0539	-2.2353	-2.2402
0.433	1.56	600	0.6143	0.0129	0.0108	0.5245	0.0021	-37.5011	-34.0161	-2.2421	-2.2469
0.3298	1.82	700	0.5790	0.0633	0.0499	0.5195	0.0134	-37.4453	-33.9442	-2.2401	-2.2450
0.14	2.08	800	0.5904	0.0586	0.0544	0.5162	0.0041	-37.4389	-33.9509	-2.2423	-2.2472
0.2302	2.34	900	0.5758	0.0851	0.0740	0.5544	0.0111	-37.4109	-33.9130	-2.2448	-2.2497
0.2296	2.6	1000	0.5750	0.0631	0.0552	0.5075	0.0080	-37.4378	-33.9444	-2.2440	-2.2489
0.2798	2.86	1100	0.5483	0.0729	0.0545	0.5428	0.0184	-37.4387	-33.9303	-2.2419	-2.2468
0.1195	3.12	1200	0.5759	0.0672	0.0613	0.5137	0.0059	-37.4291	-33.9386	-2.2424	-2.2473
0.1371	3.38	1300	0.5592	0.0733	0.0574	0.5494	0.0159	-37.4346	-33.9299	-2.2434	-2.2483
0.0993	3.64	1400	0.6130	0.0546	0.0598	0.4871	-0.0053	-37.4311	-33.9566	-2.2422	-2.2471
0.18	3.9	1500	0.5566	0.0778	0.0602	0.5050	0.0176	-37.4306	-33.9234	-2.2423	-2.2472
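
The validation metrics in the table above can be cross-checked and scanned for the strongest checkpoint. The sketch below copies the step, validation loss, and reward columns from the table, verifies that each margin equals chosen minus rejected within rounding, and picks the step with the lowest validation loss. The helper names `best_checkpoint` and `inconsistent_margins` are hypothetical.

```python
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margin)
ROWS = [
    (100, 0.5755, 0.0163, 0.0131, 0.0032),
    (200, 0.5877, -0.0108, -0.0202, 0.0094),
    (300, 0.5843, 0.0001, -0.0131, 0.0132),
    (400, 0.6350, -0.0073, 0.0033, -0.0106),
    (500, 0.6238, -0.0135, -0.0193, 0.0058),
    (600, 0.6143, 0.0129, 0.0108, 0.0021),
    (700, 0.5790, 0.0633, 0.0499, 0.0134),
    (800, 0.5904, 0.0586, 0.0544, 0.0041),
    (900, 0.5758, 0.0851, 0.0740, 0.0111),
    (1000, 0.5750, 0.0631, 0.0552, 0.0080),
    (1100, 0.5483, 0.0729, 0.0545, 0.0184),
    (1200, 0.5759, 0.0672, 0.0613, 0.0059),
    (1300, 0.5592, 0.0733, 0.0574, 0.0159),
    (1400, 0.6130, 0.0546, 0.0598, -0.0053),
    (1500, 0.5566, 0.0778, 0.0602, 0.0176),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def inconsistent_margins(rows, tol=2e-4):
    """Return steps where margin differs from chosen - rejected by more than tol."""
    return [row[0] for row in rows if abs((row[2] - row[3]) - row[4]) > tol]


best = best_checkpoint(ROWS)
bad_steps = inconsistent_margins(ROWS)
print(f"lowest validation loss: {best[1]} at step {best[0]}")
print(f"steps with inconsistent margins: {bad_steps}")
```

The training loss keeps falling through epoch 4 while the validation loss stays between roughly 0.55 and 0.64, so selecting a checkpoint by validation loss rather than taking the last step is a reasonable precaution.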

Framework versions

PEFT 0.10.0
Transformers 4.39.0.dev0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.1
