mamba-2.8b-zephyr

This model is a fine-tuned version of xiuyul/mamba-2.8b-ultrachat, trained with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset.

The base model, xiuyul/mamba-2.8b-ultrachat, was instruction-tuned from state-spaces/mamba-2.8b-slimpj on the HuggingFaceH4/ultrachat_200k dataset.

It achieves the following results on the evaluation set:

  • Loss: 0.4996
  • Rewards/chosen: -0.4523
  • Rewards/rejected: -1.6105
  • Rewards/accuracies: 0.7857
  • Rewards/margins: 1.1582
  • Logps/rejected: -290.1885
  • Logps/chosen: -359.0926
  • Logits/rejected: 23.0423
  • Logits/chosen: 23.1861
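To make the reward metrics above easier to read, here is a minimal sketch of how DPO-style rewards, margins, and the per-example loss are commonly computed from sequence log-probabilities. It is not the training code used for this model: the function name `dpo_metrics` is hypothetical, and `beta=0.1` is an assumed value, since the card does not state the DPO temperature.

```python
import math


def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO quantities (illustrative sketch; beta is an assumption)."""
    # Implicit rewards: scaled log-probability ratio between policy and reference.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss: -log(sigmoid(margin)) == log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "loss": loss,
        "correct": margin > 0,  # contributes to Rewards/accuracies
    }


# The reported margin is the chosen reward minus the rejected reward.
reported_margin = -0.4523 - (-1.6105)
```

With a policy identical to the reference, the margin is zero and the loss is log 2 (about 0.693), which is the usual starting point for this loss.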

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
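The total batch sizes listed above follow from the per-device values, and the learning-rate settings describe a linear warmup followed by a linear decay. The sketch below reproduces that arithmetic in plain Python. The helper `linear_schedule_lr` is hypothetical and only approximates the usual linear-with-warmup schedule; the total number of optimizer steps is left as a parameter because the card does not state it explicitly.

```python
import math

# Values taken from the hyperparameter list above.
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 2

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices


def linear_schedule_lr(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup then linear decay (sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from the peak down to 0 at the final step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

For example, `linear_schedule_lr(step, total_steps)` can be evaluated over a range of steps to plot the schedule once the total step count of a run is known.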

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6639 | 0.1 | 100 | 0.6593 | 0.1762 | 0.0957 | 0.6151 | 0.0805 | -273.1268 | -352.8086 | 23.5852 | 23.8356 |
| 0.5804 | 0.21 | 200 | 0.5836 | 0.0780 | -0.3396 | 0.6508 | 0.4176 | -277.4798 | -353.7904 | 23.5872 | 23.8302 |
| 0.5815 | 0.31 | 300 | 0.5510 | -0.1923 | -0.7857 | 0.7421 | 0.5934 | -281.9403 | -356.4929 | 23.5224 | 23.7498 |
| 0.5526 | 0.41 | 400 | 0.5361 | -0.1953 | -0.8928 | 0.7341 | 0.6975 | -283.0119 | -356.5235 | 23.5033 | 23.7264 |
| 0.5225 | 0.52 | 500 | 0.5262 | -0.1041 | -0.8809 | 0.7540 | 0.7768 | -282.8929 | -355.6114 | 23.4578 | 23.6718 |
| 0.5577 | 0.62 | 600 | 0.5156 | -0.1946 | -1.0285 | 0.7659 | 0.8339 | -284.3683 | -356.5158 | 23.4466 | 23.6618 |
| 0.5515 | 0.72 | 700 | 0.5163 | 0.0648 | -0.7650 | 0.7659 | 0.8298 | -281.7334 | -353.9220 | 23.4243 | 23.6343 |
| 0.5159 | 0.83 | 800 | 0.5113 | -0.1400 | -1.0595 | 0.7778 | 0.9195 | -284.6783 | -355.9698 | 23.4095 | 23.6179 |
| 0.5242 | 0.93 | 900 | 0.5089 | -0.0383 | -0.9148 | 0.7659 | 0.8766 | -283.2318 | -354.9529 | 23.4035 | 23.6145 |
| 0.4618 | 1.03 | 1000 | 0.5077 | -0.1223 | -1.0201 | 0.7778 | 0.8978 | -284.2841 | -355.7929 | 23.3805 | 23.5856 |
| 0.4484 | 1.14 | 1100 | 0.5019 | -0.3311 | -1.3299 | 0.7778 | 0.9989 | -287.3827 | -357.8807 | 23.3427 | 23.5381 |
| 0.4228 | 1.24 | 1200 | 0.5034 | -0.0617 | -1.0989 | 0.7619 | 1.0372 | -285.0726 | -355.1871 | 23.3191 | 23.5101 |
| 0.4306 | 1.34 | 1300 | 0.5032 | -0.1585 | -1.1849 | 0.7698 | 1.0264 | -285.9320 | -356.1549 | 23.2889 | 23.4787 |
| 0.4678 | 1.45 | 1400 | 0.5030 | -0.2351 | -1.1601 | 0.7817 | 0.9250 | -285.6841 | -356.9207 | 23.2661 | 23.4551 |
| 0.4317 | 1.55 | 1500 | 0.4997 | -0.1401 | -1.1458 | 0.7619 | 1.0057 | -285.5417 | -355.9716 | 23.2621 | 23.4524 |
| 0.4363 | 1.65 | 1600 | 0.5010 | -0.3313 | -1.3592 | 0.7738 | 1.0279 | -287.6752 | -357.8830 | 23.2320 | 23.4178 |
| 0.408 | 1.76 | 1700 | 0.4989 | -0.2456 | -1.3073 | 0.7778 | 1.0617 | -287.1568 | -357.0265 | 23.2135 | 23.3950 |
| 0.4076 | 1.86 | 1800 | 0.4996 | -0.3904 | -1.4365 | 0.7659 | 1.0461 | -288.4482 | -358.4738 | 23.1866 | 23.3617 |
| 0.4547 | 1.96 | 1900 | 0.5008 | -0.2516 | -1.2648 | 0.7857 | 1.0133 | -286.7317 | -357.0858 | 23.1605 | 23.3298 |
| 0.3469 | 2.07 | 2000 | 0.4977 | -0.2868 | -1.3916 | 0.7778 | 1.1048 | -287.9999 | -357.4383 | 23.1361 | 23.2990 |
| 0.3547 | 2.17 | 2100 | 0.4987 | -0.4251 | -1.5510 | 0.7619 | 1.1259 | -289.5935 | -358.8210 | 23.1142 | 23.2730 |
| 0.3468 | 2.27 | 2200 | 0.4979 | -0.2674 | -1.3945 | 0.7778 | 1.1271 | -288.0285 | -357.2443 | 23.0998 | 23.2561 |
| 0.3432 | 2.37 | 2300 | 0.5026 | -0.3792 | -1.4630 | 0.7738 | 1.0838 | -288.7130 | -358.3621 | 23.0726 | 23.2233 |
| 0.324 | 2.48 | 2400 | 0.5022 | -0.4892 | -1.6090 | 0.7698 | 1.1198 | -290.1737 | -359.4620 | 23.0543 | 23.2006 |
| 0.3556 | 2.58 | 2500 | 0.5010 | -0.5270 | -1.6576 | 0.7817 | 1.1306 | -290.6595 | -359.8404 | 23.0520 | 23.1981 |
| 0.3277 | 2.68 | 2600 | 0.4990 | -0.5401 | -1.6816 | 0.7778 | 1.1415 | -290.8996 | -359.9708 | 23.0449 | 23.1901 |
| 0.3262 | 2.79 | 2700 | 0.4993 | -0.4952 | -1.6410 | 0.7778 | 1.1458 | -290.4932 | -359.5220 | 23.0439 | 23.1878 |
| 0.3566 | 2.89 | 2800 | 0.4985 | -0.4474 | -1.5918 | 0.7778 | 1.1443 | -290.0010 | -359.0445 | 23.0433 | 23.1871 |
| 0.3386 | 2.99 | 2900 | 0.4983 | -0.4598 | -1.6040 | 0.7817 | 1.1442 | -290.1235 | -359.1679 | 23.0427 | 23.1866 |

Framework versions

  • Transformers 4.35.0
  • PyTorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1