starchat2-15b-v0.1 / README.md
metadata
base_model: HuggingFaceH4/starcoder2-15b-ift
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
  - HuggingFaceH4/orca_dpo_pairs
model-index:
  - name: starcoder2-15b-dpo-v4.0
    results: []

starcoder2-15b-dpo-v4.0

This model is a fine-tuned version of HuggingFaceH4/starcoder2-15b-ift, trained on the HuggingFaceH4/ultrafeedback_binarized and HuggingFaceH4/orca_dpo_pairs datasets. It achieves the following results on the evaluation set:

  • Loss: 0.4347
  • Rewards/chosen: -0.9461
  • Rewards/rejected: -2.7745
  • Rewards/accuracies: 0.7658
  • Rewards/margins: 1.8284
  • Logps/rejected: -322.1934
  • Logps/chosen: -316.1898
  • Logits/rejected: -2.3817
  • Logits/chosen: -2.3005
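The card does not define these metrics. As a minimal sanity check, assuming `Rewards/margins` is the mean difference between the chosen and rejected rewards (the usual convention in preference-optimization trainers, not something stated here), the reported numbers can be compared directly:

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -0.9461
rewards_rejected = -2.7745
reported_margin = 1.8284

# Assumption: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected

# Allow for rounding of the reported values to four decimals.
consistent = abs(margin - reported_margin) < 2e-4
print(f"computed margin: {margin:.4f}, matches reported value: {consistent}")
```

A positive margin means the model assigns, on average, a higher implicit reward to the chosen response than to the rejected one, even though both rewards are negative.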

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
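The aggregate batch sizes follow from the per-device values, and the scheduler settings describe a warmup followed by cosine decay. The sketch below reproduces the arithmetic and illustrates the general shape of such a schedule. The function `cosine_lr` and the `total_steps` value are hypothetical, introduced only for illustration; this is not the trainer's own implementation.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 2
eval_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 8
learning_rate = 5e-07
warmup_ratio = 0.1

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def cosine_lr(step, total_steps, peak_lr=learning_rate, ratio=warmup_ratio):
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
# total_steps=1000 is an illustrative value, not taken from this card.
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps=1000))
```

With these settings the learning rate peaks at 5e-07 once warmup ends (after the first 10% of steps) and decays towards zero by the final step.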

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.717 | 0.17 | 100 | 0.6006 | -0.0924 | -0.2899 | 0.6329 | 0.1975 | -272.5022 | -299.1165 | -2.5313 | -2.4191 |
| 0.6273 | 0.35 | 200 | 0.5160 | -0.3994 | -0.9461 | 0.6930 | 0.5467 | -285.6261 | -305.2568 | -2.5281 | -2.4278 |
| 0.5538 | 0.52 | 300 | 0.4781 | -0.6589 | -1.5892 | 0.7247 | 0.9302 | -298.4870 | -310.4470 | -2.4996 | -2.4110 |
| 0.5056 | 0.7 | 400 | 0.4594 | -0.8283 | -2.1332 | 0.7437 | 1.3050 | -309.3687 | -313.8344 | -2.4472 | -2.3644 |
| 0.4983 | 0.87 | 500 | 0.4512 | -0.7758 | -2.2806 | 0.7468 | 1.5049 | -312.3167 | -312.7843 | -2.4223 | -2.3404 |
| 0.4662 | 1.04 | 600 | 0.4431 | -0.7839 | -2.4016 | 0.7658 | 1.6177 | -314.7355 | -312.9465 | -2.4049 | -2.3215 |
| 0.4411 | 1.22 | 700 | 0.4415 | -1.0090 | -2.7582 | 0.7690 | 1.7492 | -321.8679 | -317.4481 | -2.3840 | -2.3016 |
| 0.471 | 1.39 | 800 | 0.4368 | -0.9617 | -2.7445 | 0.7690 | 1.7828 | -321.5930 | -316.5019 | -2.3809 | -2.2991 |
| 0.4485 | 1.57 | 900 | 0.4351 | -0.9490 | -2.7594 | 0.7722 | 1.8103 | -321.8916 | -316.2497 | -2.3815 | -2.3004 |
| 0.4411 | 1.74 | 1000 | 0.4348 | -0.9293 | -2.7469 | 0.7658 | 1.8176 | -321.6409 | -315.8547 | -2.3823 | -2.3011 |
| 0.4499 | 1.92 | 1100 | 0.4348 | -0.9482 | -2.7767 | 0.7658 | 1.8285 | -322.2369 | -316.2320 | -2.3828 | -2.3012 |
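The Epoch and Step columns allow a rough estimate of the number of optimizer steps per epoch. The sketch below assumes that Step counts optimizer updates and that each update consumes `total_train_batch_size` preference pairs. Both are assumptions, and the Epoch values are rounded to two decimals, so the results are approximate only.

```python
# (epoch, step) pairs copied from the last three rows of the training results.
checkpoints = [(1.57, 900), (1.74, 1000), (1.92, 1100)]
total_train_batch_size = 128  # from the hyperparameter list

# Each row gives an independent estimate of steps per epoch.
estimates = [step / epoch for epoch, step in checkpoints]
steps_per_epoch = sum(estimates) / len(estimates)

# Assumption: one optimizer step processes total_train_batch_size pairs.
approx_pairs_per_epoch = steps_per_epoch * total_train_batch_size

print(f"steps per epoch: about {steps_per_epoch:.0f}")
print(f"training pairs per epoch: about {approx_pairs_per_epoch:.0f}")
```

Later rows are used because rounding in the Epoch column distorts the ratio less at larger step counts.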

Framework versions

  • Transformers 4.39.0.dev0
  • PyTorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1