simpo-lora-2

This model is a LoRA (PEFT) adapter for hatakeyama-llm-team/with_halcination_little_codes_ck5200, fine-tuned with SimPO on an unknown dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 1.2390
  • Rewards/chosen: -4.5674
  • Rewards/rejected: -5.4724
  • Rewards/accuracies: 0.6150
  • Rewards/margins: 0.9050
  • Logps/rejected: -2.1889
  • Logps/chosen: -1.8270
  • Logits/rejected: 0.5316
  • Logits/chosen: 0.4876
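
Because the repository ships a PEFT (LoRA) adapter rather than full model weights, it should be loaded on top of the base checkpoint. A minimal sketch, assuming the adapter lives at misdelivery/tk-simpo-test-beta2.5-lr1e-6 (the repository this card belongs to) and that the base model is accessible:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "hatakeyama-llm-team/with_halcination_little_codes_ck5200"
adapter_id = "misdelivery/tk-simpo-test-beta2.5-lr1e-6"  # this repository

# Load the base causal LM, then attach the SimPO-trained LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)

# Optionally fold the adapter into the base weights for plain inference.
model = model.merge_and_unload()
```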

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (restated as a configuration sketch after the list):

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
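
For reference, a minimal sketch of these settings expressed as transformers.TrainingArguments. The surrounding trainer (some SimPO/CPO-style preference trainer) and the output path are assumptions, not stated in the card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="simpo-lora-2",       # hypothetical path, not from the card
    learning_rate=1e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=64,  # 1 sample x 64 steps = 64 effective batch
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    # The listed Adam betas/epsilon are the defaults, so no override is needed.
)
```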

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 1.2463 | 0.1600 | 100 | 1.2850 | -4.5732 | -5.3937 | 0.5984 | 0.8205 | -2.1575 | -1.8293 | 0.6303 | 0.5626 |
| 1.3045 | 0.3200 | 200 | 1.2645 | -4.5737 | -5.4306 | 0.6014 | 0.8569 | -2.1722 | -1.8295 | 0.5868 | 0.5287 |
| 1.301 | 0.4800 | 300 | 1.2457 | -4.5684 | -5.4599 | 0.6111 | 0.8915 | -2.1840 | -1.8274 | 0.5387 | 0.4911 |
| 1.4689 | 0.6400 | 400 | 1.2386 | -4.5673 | -5.4725 | 0.6131 | 0.9052 | -2.1890 | -1.8269 | 0.5307 | 0.4859 |
| 1.3989 | 0.8000 | 500 | 1.2386 | -4.5680 | -5.4717 | 0.6121 | 0.9036 | -2.1887 | -1.8272 | 0.5340 | 0.4899 |
| 1.1781 | 0.9600 | 600 | 1.2390 | -4.5674 | -5.4724 | 0.6150 | 0.9050 | -2.1889 | -1.8270 | 0.5316 | 0.4876 |
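
For context, the reward columns are consistent with the standard SimPO objective, where the implicit reward is the β-scaled, length-normalized log-probability of a response (a hedged reading; the card itself does not state the loss):

```latex
r(x, y) = \frac{\beta}{|y|} \log \pi_\theta(y \mid x),
\qquad
\mathcal{L}_{\mathrm{SimPO}} = -\log \sigma\bigl(r(x, y_w) - r(x, y_l) - \gamma\bigr)
```

With β = 2.5 (matching the "beta2.5" in the repository name), Rewards/chosen ≈ β × Logps/chosen holds throughout the table, e.g. 2.5 × (-1.8270) = -4.5675 versus the reported -4.5674 at step 600.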

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1