Text Generation
Transformers
Safetensors
mistral
generated_from_trainer
conversational
Inference Endpoints
text-generation-inference
Edit model card

zephyr-7b-spin-iter3-v0

A model matching the results of SPIN with very little data (30x less), carefully curated by the amazing Data Is Better Together community

Built with Distilabel

This model is a fine-tuned version of argilla/zephyr-7b-spin-iter2-v0 on the argilla/10k_prompts_SPIN_iter3_zephyr_top and the argilla/10k_prompts_SPIN_iter2_zephyr_top dataset.

Check this repo for full reproducible code using the original SPIN implementation and distilabel.

If you want to contribute to high quality datasets like this, contribute to the DIBT prompt collective initiative.

MT-Bench results

Model 1st Turn Score 2nd Turn Score Average Score SPIN paper Score
zephyr-7b-sft-full 6.6625 6.0250 6.34375 5.94
zephyr-7b-spin-iter0-v0 6.64375 6.1750 6.409375 6.46
zephyr-7b-spin-iter1-v0 6.90625 6.3000 6.603125 6.65
zephyr-7b-spin-iter2-v0 7.1375 6.3125 6.725000 6.78
zephyr-7b-spin-iter3-v0 7.09375 6.4500 6.771875 -

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2.0

Training results

Training Loss Epoch Step Validation Loss Rewards/real Rewards/generated Rewards/accuracies Rewards/margins Logps/generated Logps/real Logits/generated Logits/real
0.2928 0.49 25 0.3951 -2.6212 -20.3268 0.9062 17.7056 -700.5638 -278.0876 -2.8098 -2.8090
0.1487 0.97 50 0.1319 -2.9077 -29.1459 0.9375 26.2382 -702.3276 -278.1449 -2.8218 -2.8066
0.006 1.46 75 0.1269 -2.6037 -29.1519 0.9583 26.5482 -702.3289 -278.0841 -2.8175 -2.8037
0.0086 1.94 100 0.1099 -2.9181 -29.6970 0.9271 26.7789 -702.4378 -278.1470 -2.8177 -2.8051

Framework versions

  • Transformers 4.37.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
Downloads last month
21
Safetensors
Model size
7.24B params
Tensor type
BF16
·

Finetuned from

Datasets used to train argilla/zephyr-7b-spin-iter3-v0

Collections including argilla/zephyr-7b-spin-iter3-v0