OrpoLlama-3-8B-Instruct

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set (the loss decomposition behind these metrics is sketched after the list):

  • Loss: 1.0981
  • Rewards/chosen: -0.0745
  • Rewards/rejected: -0.0794
  • Rewards/accuracies: 0.4000
  • Rewards/margins: 0.0049
  • Logps/rejected: -0.7938
  • Logps/chosen: -0.7449
  • Logits/rejected: 0.0540
  • Logits/chosen: -0.1876
  • Nll Loss: 1.0259
  • Log Odds Ratio: -0.7218
  • Log Odds Chosen: 0.0125
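
These are the metrics logged by ORPO-style preference training. For reference (the card does not state the objective, so treat this as an assumption rather than a statement of the author's method), the standard ORPO loss combines a negative log-likelihood term on the chosen responses with an odds-ratio term that pushes the odds of the chosen response above those of the rejected one; the reported Nll Loss and Log Odds Ratio correspond to those two terms:

```latex
% ORPO objective -- a sketch, assuming the standard formulation:
\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{NLL}} + \lambda \, \mathcal{L}_{\mathrm{OR}},
\qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left( \log \frac{\mathrm{odds}(y_w \mid x)}{\mathrm{odds}(y_l \mid x)} \right),
\qquad
\mathrm{odds}(y \mid x) = \frac{P(y \mid x)}{1 - P(y \mid x)}
```

As a sanity check on the final evaluation: 1.0259 + 0.1 × 0.7218 ≈ 1.0981 (the logged Log Odds Ratio of −0.7218 enters the loss with its sign flipped), which is consistent with a weight of λ ≈ 0.1; the weight itself is not listed in the hyperparameters below, so this is an inference.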

Model description

More information needed

Intended uses & limitations

More information needed
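
No usage example is recorded in the card. As a minimal sketch, the adapter can be loaded on top of the base model with PEFT; the adapter repo id below is a hypothetical placeholder for this repository's path:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER = "your-username/OrpoLlama-3-8B-Instruct"  # hypothetical; replace with this adapter's repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)  # attach the fine-tuned adapter weights

# Llama 3 Instruct expects its chat template; apply it before generation.
messages = [{"role": "user", "content": "Explain ORPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```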

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto a TRL ORPO setup follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 1
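
The training script itself is not included in the card. Assuming TRL's ORPO implementation was used (plausible given the metric names, but not confirmed), the hyperparameters above would map onto ORPOConfig/ORPOTrainer roughly as follows; the dataset name and LoRA settings are placeholders:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Hyperparameters from this card; ORPOConfig extends TrainingArguments.
args = ORPOConfig(
    output_dir="orpo-llama-3-8b-instruct",
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # effective train batch size: 2 * 4 = 8
    lr_scheduler_type="linear",
    warmup_steps=10,
    num_train_epochs=1,
    seed=42,
    eval_strategy="steps",
    eval_steps=25,                   # the results table shows evaluation every 25 steps
)

# Placeholder preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("your-username/preference-data")  # hypothetical

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # LoRA settings are not recorded in the card
)
trainer.train()
```

Note that TRL's ORPOConfig weights the odds-ratio term with a `beta` parameter that defaults to 0.1, matching the λ ≈ 0.1 inferred from the evaluation metrics above.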

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.4815 | 0.2020 | 25 | 1.2773 | -0.0942 | -0.1188 | 0.6000 | 0.0246 | -1.1878 | -0.9419 | 0.0316 | -0.2920 | 1.2167 | -0.6066 | 0.2976 |
| 0.9343 | 0.4040 | 50 | 1.1677 | -0.0819 | -0.0918 | 0.4000 | 0.0099 | -0.9183 | -0.8189 | -0.0239 | -0.2369 | 1.1001 | -0.6763 | 0.1038 |
| 1.2029 | 0.6061 | 75 | 1.1140 | -0.0759 | -0.0805 | 0.4000 | 0.0046 | -0.8050 | -0.7587 | -0.0008 | -0.2166 | 1.0421 | -0.7191 | 0.0131 |
| 1.25 | 0.8081 | 100 | 1.0981 | -0.0745 | -0.0794 | 0.4000 | 0.0049 | -0.7938 | -0.7449 | 0.0540 | -0.1876 | 1.0259 | -0.7218 | 0.0125 |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.3.1+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1