
kto-mix-14k-lf-response-llama3-f1_100_0.8-fg0.5-fgudw4.0-kto-fg

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, trained on the trl-lib/kto-mix-14k and chaoweihuang/lf-response-llama3-f1_100_0.8-fg0.5 datasets. It achieves the following results on the evaluation set (these match the step-800 row of the training results table):

  • Loss: 0.4110
  • Rewards/chosen: 1.7360
  • Logps/chosen: -336.0412
  • Rewards/rejected: -2.2628
  • Logps/rejected: -406.1173
  • Rewards/margins: 3.9987
  • Kl: 0.0141
  • Fg Rewards/chosen Sum: -1.5560
  • Fg Logps/policy Chosen: -6.7332
  • Fg Logps/reference Chosen: -6.0419
  • Count/fg Chosen: 30.1832
  • Fg Rewards/rejected Sum: -0.9033
  • Fg Logps/policy Rejected: -8.6269
  • Fg Logps/reference Rejected: -7.5807
  • Count/fg Rejected: 6.9239
  • Fg Logps/policy Kl: -14.7946
  • Fg Logps/reference Kl: -11.4736
  • Fg Kl: nan
  • Fg Loss: 0.7625
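
As a quick consistency check on the list above, the sketch below recomputes the reward margin from the reported chosen and rejected rewards. It assumes that Rewards/margins is simply the chosen reward minus the rejected reward, which is the usual convention but is not stated in this card. The policy-minus-reference gaps for the "Fg" log-probabilities are printed for reference only; how they map to the Fg Rewards sums is not documented here, so no relationship is asserted.

```python
# Evaluation-set values copied from the list above.
rewards_chosen = 1.7360
rewards_rejected = -2.2628
reported_margin = 3.9987

# ASSUMPTION: margin = chosen reward - rejected reward (not stated in the card).
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin:        {margin:.4f}")
print(f"reported margin:          {reported_margin:.4f}")
print(f"absolute difference:      {abs(margin - reported_margin):.4f}")

# Policy-minus-reference log-probability gaps for the fine-grained (Fg) metrics.
# These are plain differences of the reported numbers, shown for reference only.
fg_gap_chosen = -6.7332 - (-6.0419)
fg_gap_rejected = -8.6269 - (-7.5807)
print(f"Fg log-prob gap (chosen):   {fg_gap_chosen:.4f}")
print(f"Fg log-prob gap (rejected): {fg_gap_rejected:.4f}")
```

Any small difference between the recomputed and reported margins is consistent with the four-decimal rounding of the inputs.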

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
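
The derived batch sizes in the list above follow from the per-device settings, and the learning-rate schedule can be sketched from the scheduler type and warmup ratio. The snippet below is a minimal illustration, not the training code: the total step count is an estimate inferred from the training results table (step 400 at epoch 0.4103), and `linear_lr` is a hypothetical helper that assumes linear warmup from zero followed by linear decay to zero.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
num_devices = 4
gradient_accumulation_steps = 4
learning_rate = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# ASSUMPTION: total optimizer steps estimated from the results table
# (step 400 corresponds to epoch 0.4103); the card does not state it.
estimated_total_steps = round(400 / 0.4103)
warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)


def linear_lr(step: int) -> float:
    """Hypothetical helper: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, estimated_total_steps - step)
    return learning_rate * remaining / max(1, estimated_total_steps - warmup_steps)


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
print("estimated total steps:", estimated_total_steps, "warmup steps:", warmup_steps)
for step in (0, warmup_steps, 400, 800, estimated_total_steps):
    print(f"step {step:>4}: lr = {linear_lr(step):.3e}")
```

The peak learning rate is reached at the end of warmup; the rates printed for steps 400 and 800 are only as reliable as the estimated step count.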

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Rewards/rejected | Logps/rejected | Rewards/margins | Kl | Fg Rewards/chosen Sum | Fg Logps/policy Chosen | Fg Logps/reference Chosen | Count/fg Chosen | Fg Rewards/rejected Sum | Fg Logps/policy Rejected | Fg Logps/reference Rejected | Count/fg Rejected | Fg Logps/policy Kl | Fg Logps/reference Kl | Fg Kl | Fg Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4478 | 0.4103 | 400 | 0.4325 | 1.3169 | -340.2313 | -1.7364 | -400.8539 | 3.0534 | 0.0280 | -1.3939 | -6.6287 | -6.0419 | 30.1832 | -0.6768 | -8.3632 | -7.5807 | 6.9239 | -13.6783 | -11.4736 | nan | 0.7654 |
| 0.4043 | 0.8205 | 800 | 0.4110 | 1.7360 | -336.0412 | -2.2628 | -406.1173 | 3.9987 | 0.0141 | -1.5560 | -6.7332 | -6.0419 | 30.1832 | -0.9033 | -8.6269 | -7.5807 | 6.9239 | -14.7946 | -11.4736 | nan | 0.7625 |

Framework versions

  • Transformers 4.41.1
  • PyTorch 2.3.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 8.03B params (Safetensors, tensor type BF16)

Model repository: chaoweihuang/FactAlign-LLaMA-3-8B