
train_logs

This model is a fine-tuned version of tokyotech-llm/Swallow-7b-instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set (final checkpoint, step 300):

  • Loss: 0.6776
  • Rewards/chosen: 0.1044
  • Rewards/rejected: 0.0678
  • Rewards/accuracies: 0.5983
  • Rewards/margins: 0.0365
  • Logps/rejected: -195.0584
  • Logps/chosen: -198.8751
  • Logits/rejected: -1.2872
  • Logits/chosen: -1.2718
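The Rewards/*, Logps/* and Logits/* metric names match those logged by TRL's DPOTrainer, so this appears to be a DPO (Direct Preference Optimization) run. The card does not state this, and it does not record the β coefficient. The sketch below shows how the reward metrics relate to one another under those assumptions. It uses the standard sigmoid DPO loss and a hypothetical `beta=0.1`. It is not the trainer's actual code, and the helper names are illustrative.

```python
import math
from statistics import mean


def dpo_example(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO quantities (sigmoid loss). beta=0.1 is an assumption."""
    # Implicit rewards: beta-scaled log-ratio of the policy vs. the frozen reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss: -log(sigmoid(margin)) == log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return {
        "loss": loss,
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        # 1.0 when the chosen response gets the higher implicit reward.
        "rewards/accuracies": float(reward_chosen > reward_rejected),
    }


def dpo_batch_metrics(examples, beta=0.1):
    """Average each per-example metric over a batch of 4-tuples of log-probs."""
    per_example = [dpo_example(*ex, beta=beta) for ex in examples]
    return {key: mean(d[key] for d in per_example) for key in per_example[0]}


# Hypothetical summed log-probs: (policy_chosen, policy_rejected, ref_chosen, ref_rejected)
batch = [(-198.0, -195.0, -199.0, -195.5), (-200.0, -194.0, -199.5, -194.2)]
print(dpo_batch_metrics(batch))
```

Under this assumption, Rewards/margins is Rewards/chosen minus Rewards/rejected. The logged loss is a mean of per-example losses, so it need not equal -log σ of the mean margin. At step 300, -log σ(0.0365) ≈ 0.675. That is slightly below the reported 0.6776, which is what the convexity of the loss predicts.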

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • training_steps: 300
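Two of these values are derived from the others. First, total_train_batch_size is train_batch_size × gradient_accumulation_steps × number of devices, and 4 × 2 = 8 implies a single device. Second, a warmup ratio of 0.1 over 300 steps gives 30 warmup steps. The sketch below reproduces a linear-warmup cosine schedule in plain Python. It assumes the schedule follows the formula of transformers' `get_cosine_schedule_with_warmup` (half a cosine cycle) and the Trainer's rounding of warmup steps up to an integer. The function names are illustrative.

```python
import math

# per-device batch x gradient accumulation x devices (1 device inferred from the card)
EFFECTIVE_BATCH = 4 * 2 * 1


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    # Warmup length derived from the ratio, rounded up to a whole step.
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, base_lr: float = 5e-6,
              total_steps: int = 300, warmup_ratio: float = 0.1) -> float:
    """Learning rate at a given optimizer step (assumed schedule, see lead-in)."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        # Linear ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 15, 30, 165, 300):
    print(s, cosine_lr(s))
```

Under this schedule the peak rate of 5e-06 is reached at step 30. It falls to half the peak at step 165 and to zero at step 300.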

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6922 | 0.0351 | 50 | 0.6910 | -0.0173 | -0.0222 | 0.5433 | 0.0050 | -195.9592 | -200.0917 | -1.3115 | -1.2970 |
| 0.6915 | 0.0702 | 100 | 0.6841 | 0.0935 | 0.0721 | 0.5900 | 0.0214 | -195.0160 | -198.9837 | -1.2971 | -1.2823 |
| 0.6819 | 0.1053 | 150 | 0.6792 | 0.1455 | 0.1116 | 0.5900 | 0.0339 | -194.6210 | -198.4638 | -1.2865 | -1.2708 |
| 0.6825 | 0.1404 | 200 | 0.6784 | 0.1161 | 0.0811 | 0.5933 | 0.0350 | -194.9258 | -198.7577 | -1.2871 | -1.2717 |
| 0.6791 | 0.1754 | 250 | 0.6769 | 0.1049 | 0.0670 | 0.6183 | 0.0378 | -195.0665 | -198.8701 | -1.2885 | -1.2730 |
| 0.6826 | 0.2105 | 300 | 0.6776 | 0.1044 | 0.0678 | 0.5983 | 0.0365 | -195.0584 | -198.8751 | -1.2872 | -1.2718 |
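As a quick consistency check on the results table, Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to rounding. The margin is averaged per example before the four-decimal rounding. The values below are copied from the table, and `margin_gaps` is an illustrative helper.

```python
# (step, rewards/chosen, rewards/rejected, rewards/margins), copied from the training results table
ROWS = [
    (50, -0.0173, -0.0222, 0.0050),
    (100, 0.0935, 0.0721, 0.0214),
    (150, 0.1455, 0.1116, 0.0339),
    (200, 0.1161, 0.0811, 0.0350),
    (250, 0.1049, 0.0670, 0.0378),
    (300, 0.1044, 0.0678, 0.0365),
]


def margin_gaps(rows):
    """Reported margin minus (chosen - rejected), rounded to the table's precision."""
    return {step: round(margin - (chosen - rejected), 4)
            for step, chosen, rejected, margin in rows}


print(margin_gaps(ROWS))
```

Every gap is at most 0.0001 in magnitude, which is consistent with rounding rather than a reporting error.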

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.0
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
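To reproduce this environment, you can compare the installed package versions against the ones listed above. The sketch below uses only the standard library. It assumes PyTorch is installed under its distribution name `torch`, and it reports any package that is missing or at a different version. `PINNED` and `version_mismatches` are illustrative names.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (PyTorch's distribution name is "torch").
PINNED = {
    "peft": "0.11.1",
    "transformers": "4.41.0",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def version_mismatches(pinned=PINNED):
    """Return {package: (wanted, installed_or_None)} for every package that differs."""
    mismatches = {}
    for pkg, wanted in pinned.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != wanted:
            mismatches[pkg] = (wanted, installed)
    return mismatches


print(version_mismatches())
```

An empty dict means the installed versions match the card exactly.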