magnum-4b-KTO-test

This model is a fine-tuned version of anthracite-org/magnum-v2-4b on the combined_new_22k.json dataset. It achieves the following results on the evaluation set:

Loss: 0.5030
Rewards/chosen: 0.0007
Logps/chosen: -11.2857
Rewards/rejected: -0.0006
Logps/rejected: -10.6547
Rewards/margins: 0.0013
Kl: 0.0009
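
The reported margin can be cross-checked against the two reward values. The sketch below assumes that Rewards/margins is simply Rewards/chosen minus Rewards/rejected, which the card does not state explicitly; the variable names are illustrative.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = 0.0007
rewards_rejected = -0.0006
reported_margin = 0.0013

# Assumption: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# The card rounds to four decimals, so allow half a unit in the last place.
assert abs(computed_margin - reported_margin) < 5e-5
print(f"margin = {computed_margin:.4f}")
```

Under that assumption the three numbers agree to the precision the card reports.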

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 48
total_train_batch_size: 768
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.01
num_epochs: 2.0
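
The two total batch sizes follow from the per-device values, the device count, and the gradient accumulation steps. A minimal arithmetic check, assuming gradient accumulation applies to training only (which matches the reported totals):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 2              # per device
eval_batch_size = 2               # per device
num_devices = 8
gradient_accumulation_steps = 48

# Effective training batch: per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 768 16
```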

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Logps/chosen	Rewards/rejected	Logps/rejected	Rewards/margins	Kl
0.5042	0.2788	16	0.5038	0.0004	-11.2884	-0.0004	-10.6529	0.0008	0.0022
0.5037	0.5575	32	0.5033	0.0006	-11.2865	-0.0008	-10.6565	0.0014	0.0013
0.5035	0.8363	48	0.5041	0.0003	-11.2899	-0.0006	-10.6546	0.0008	0.0016
0.5037	1.1151	64	0.5035	0.0005	-11.2872	-0.0005	-10.6540	0.0011	0.0017
0.5036	1.3938	80	0.5036	0.0005	-11.2874	-0.0005	-10.6535	0.0010	0.0010
0.5032	1.6726	96	0.5035	0.0006	-11.2867	-0.0005	-10.6541	0.0011	0.0012
0.5036	1.9514	112	0.5037	0.0006	-11.2869	-0.0006	-10.6546	0.0011	0.0009
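
The Epoch and Step columns give a rough estimate of how many optimizer steps make up one epoch, and therefore how many examples each epoch covers. This sketch assumes the logged epoch is the step count divided by the steps per epoch, and that every optimizer step consumes the full total_train_batch_size of 768; neither is stated in the card.

```python
# (epoch, step) pairs copied from the training results table.
logged = [
    (0.2788, 16),
    (0.5575, 32),
    (0.8363, 48),
    (1.1151, 64),
    (1.3938, 80),
    (1.6726, 96),
    (1.9514, 112),
]
TOTAL_TRAIN_BATCH_SIZE = 768  # from the hyperparameter list

# Assumption: epoch = step / steps_per_epoch, so each row yields one estimate.
estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = sum(estimates) / len(estimates)

# Assumption: each optimizer step processes a full effective batch.
examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"steps per epoch    ~ {steps_per_epoch:.1f}")
print(f"examples per epoch ~ {examples_per_epoch:,.0f}")
```

Every row gives roughly 57.4 steps per epoch, which corresponds to about 44 thousand examples per epoch under these assumptions.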

Framework versions

PEFT 0.12.0
Transformers 4.45.0.dev0
Pytorch 2.2.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
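
Since PEFT appears in the framework versions, the published weights are presumably a LoRA adapter applied on top of the base model. The following is a minimal loading sketch under that assumption, using the adapter repository ID Edens-Gate/KTO-4B-lora under which this card is published. The function name `load_model` is hypothetical, and the call downloads weights, so it is defined here but not invoked.

```python
BASE_MODEL_ID = "anthracite-org/magnum-v2-4b"  # base model named in this card
ADAPTER_ID = "Edens-Gate/KTO-4B-lora"          # assumed to hold a PEFT (LoRA) adapter


def load_model():
    """Load the base model and attach the adapter (downloads weights on first use)."""
    # Imported lazily so that defining this function does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Assumption: the adapter repository is compatible with this base checkpoint.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    return tokenizer, model
```

Calling `tokenizer, model = load_model()` should return objects usable with the standard `generate` API, provided the library versions are compatible with those listed above.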
