

results

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0888
  • Rewards/chosen: -0.0906
  • Rewards/rejected: -0.1196
  • Rewards/accuracies: 0.8000
  • Rewards/margins: 0.0290
  • Logps/rejected: -1.1960
  • Logps/chosen: -0.9064
  • Logits/rejected: -1.3083
  • Logits/chosen: -1.0410
  • Nll Loss: 1.0376
  • Log Odds Ratio: -0.5127
  • Log Odds Chosen: 0.4766
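The last three metrics (Nll Loss, Log Odds Ratio, Log Odds Chosen) have the same names as the metrics logged by TRL's ORPOTrainer. That suggests an ORPO (odds-ratio preference optimization) objective, but the card does not say so. The sketch below assumes that objective and computes its per-pair quantities in plain Python. The value beta = 0.1 is an inference and not a reported hyperparameter. It is consistent with Rewards/chosen (-0.0906) being 0.1 × Logps/chosen (-0.9064).

```python
import math

# Assumption: ORPO-style metrics as logged by TRL's ORPOTrainer.
# Assumption: beta = 0.1, inferred from Rewards = 0.1 * Logps on this card.
BETA = 0.1


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def orpo_example(chosen_logp: float, rejected_logp: float, nll: float, beta: float = BETA) -> dict:
    """ORPO-style metrics for a single preference pair.

    chosen_logp / rejected_logp: average per-token log-probabilities (< 0).
    nll: supervised negative log-likelihood on the chosen response.
    """
    # log odds(p) = log p - log(1 - p), where p = exp(logp);
    # this is the chosen log odds minus the rejected log odds.
    log_odds = (chosen_logp - rejected_logp) - (
        math.log1p(-math.exp(chosen_logp)) - math.log1p(-math.exp(rejected_logp))
    )
    ratio = log_sigmoid(log_odds)  # "Log Odds Ratio", never positive
    reward_chosen = beta * chosen_logp
    reward_rejected = beta * rejected_logp
    return {
        "log_odds_chosen": log_odds,
        "log_odds_ratio": ratio,
        "rewards_chosen": reward_chosen,
        "rewards_rejected": reward_rejected,
        "rewards_margin": reward_chosen - reward_rejected,
        "rewards_accuracy": float(reward_chosen > reward_rejected),
        # Total loss: NLL plus the odds-ratio penalty (subtracting beta * a negative term).
        "loss": nll - beta * ratio,
    }


if __name__ == "__main__":
    # Hypothetical inputs, not values taken from the card.
    print(orpo_example(chosen_logp=-0.9, rejected_logp=-1.2, nll=1.0))
```

The card reports means over the evaluation set. Because of that, passing the averaged Logps values to `orpo_example` will not reproduce Log Odds Chosen exactly. The log odds are computed per pair and then averaged, not computed from averaged log-probabilities.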

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 1
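The hyperparameters above combine in two ways, sketched below. First, total_train_batch_size = 8 equals train_batch_size × gradient_accumulation_steps = 2 × 4, which is consistent with training on a single device. Second, the learning rate follows a linear schedule with 10 warmup steps. The formula below is the one used by Transformers' `get_linear_schedule_with_warmup`. The card does not report the total number of optimizer steps, so `TOTAL_STEPS` is a hypothetical placeholder.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 8e-6
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 10

# Hypothetical: the card does not state the total number of optimizer steps.
TOTAL_STEPS = 120


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (single device assumed by default)."""
    return per_device * accum_steps * num_devices


def linear_warmup_lr(step: int, total_steps: int = TOTAL_STEPS,
                     base_lr: float = LEARNING_RATE, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


if __name__ == "__main__":
    print("effective batch:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 5, 10, 25, 50, 100, TOTAL_STEPS):
        print(f"step {s:>3}: lr = {linear_warmup_lr(s):.2e}")
```

Only the decay slope depends on `TOTAL_STEPS`. The warmup phase reaches the peak rate of 8e-06 at step 10 whatever the total is.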

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0958 | 0.2020 | 25 | 1.4462 | -0.1303 | -0.1689 | 0.8000 | 0.0386 | -1.6887 | -1.3025 | -1.1831 | -0.9153 | 1.3976 | -0.4854 | 0.5211 |
| 1.2563 | 0.4040 | 50 | 1.2116 | -0.1057 | -0.1387 | 0.8000 | 0.0330 | -1.3872 | -1.0575 | -1.2714 | -1.0083 | 1.1616 | -0.5001 | 0.4913 |
| 1.3121 | 0.6061 | 75 | 1.1251 | -0.0952 | -0.1249 | 0.9000 | 0.0297 | -1.2491 | -0.9524 | -1.3022 | -1.0390 | 1.0740 | -0.5109 | 0.4726 |
| 1.3689 | 0.8081 | 100 | 1.0888 | -0.0906 | -0.1196 | 0.8000 | 0.0290 | -1.1960 | -0.9064 | -1.3083 | -1.0410 | 1.0376 | -0.5127 | 0.4766 |
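The column names in this table match the metrics logged by TRL's ORPOTrainer. If that objective was used, a few identities should link the columns:

  • each reward equals beta times the corresponding average log-probability;
  • the margin equals the chosen reward minus the rejected reward;
  • the validation loss equals Nll Loss - beta × Log Odds Ratio.

The check below copies the table's values by hand and tests these identities. It uses beta = 0.1, an inferred value that the card does not state. At every logged step, each identity holds to within 1e-3, which is within rounding.

```python
# Assumption: beta = 0.1 (inferred from the table, not stated on the card).
BETA = 0.1

# (step, val_loss, rewards_chosen, rewards_rejected, rewards_margin,
#  logps_rejected, logps_chosen, nll_loss, log_odds_ratio), copied from the table
ROWS = [
    (25, 1.4462, -0.1303, -0.1689, 0.0386, -1.6887, -1.3025, 1.3976, -0.4854),
    (50, 1.2116, -0.1057, -0.1387, 0.0330, -1.3872, -1.0575, 1.1616, -0.5001),
    (75, 1.1251, -0.0952, -0.1249, 0.0297, -1.2491, -0.9524, 1.0740, -0.5109),
    (100, 1.0888, -0.0906, -0.1196, 0.0290, -1.1960, -0.9064, 1.0376, -0.5127),
]


def check_row(row, beta=BETA, tol=1e-3):
    """Return which ORPO-style identities hold for one logged evaluation row."""
    _step, val_loss, r_c, r_r, margin, lp_r, lp_c, nll, lor = row
    return {
        "reward_chosen": abs(r_c - beta * lp_c) < tol,
        "reward_rejected": abs(r_r - beta * lp_r) < tol,
        "margin": abs(margin - (r_c - r_r)) < tol,
        "loss": abs(val_loss - (nll - beta * lor)) < tol,
    }


if __name__ == "__main__":
    for row in ROWS:
        print(f"step {row[0]:>3}:", check_row(row))
```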

Framework versions

  • PEFT 0.10.1.dev0
  • Transformers 4.41.0.dev0
  • Pytorch 2.2.2+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
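The sketch below compares a local environment with the versions listed above, using the standard library's `importlib.metadata`. It assumes the distribution names `peft`, `transformers`, `torch`, `datasets` and `tokenizers`. It uses exact string matching, so a dev or CUDA-suffixed build (for example `2.2.2+cu121`) matches only if that same build is installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the "Framework versions" list.
CARD_VERSIONS = {
    "peft": "0.10.1.dev0",
    "transformers": "4.41.0.dev0",
    "torch": "2.2.2+cu121",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=CARD_VERSIONS):
    """Map each package to (card_version, installed_version_or_None, exact_match)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package not installed in this environment
        report[pkg] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for pkg, (want, have, ok) in compare_versions().items():
        status = "OK" if ok else "DIFFERS"
        print(f"{pkg:12} card={want:14} installed={have or 'missing':14} {status}")
```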