pythia-410m-deduped

This model is a fine-tuned version of EleutherAI/pythia-410m-deduped on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7801
  • Original Losses: 1.7969
  • Weight: 1.0
  • Abs Diff: 0.4453
  • Rewards/chosen: -4.875
  • Rewards/rejected: -5.0625
  • Rewards/accuracies: 0.4405
  • Rewards/margins: 0.2002
  • Logps/rejected: -2.0312
  • Logps/chosen: -1.9453
  • Logits/rejected: 5.6875
  • Logits/chosen: 5.7188
  • All Logps 1: -656.8973
  • All Logps 1 Values: -656.8973
  • All Logps 2: 434.6329
  • All Logps 2 Values: 434.6329
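
The snippet below is a minimal usage sketch for loading the checkpoint with the Transformers library. It assumes the fine-tuned weights are published on the Hugging Face Hub as RAY2L/pythia-410m-deduped-SimPOW-0 (the repository this card belongs to); substitute a local path if you are working from your own copy.

```python
# Minimal loading sketch (assumes the checkpoint is available on the Hub;
# swap in a local directory path otherwise).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RAY2L/pythia-410m-deduped-SimPOW-0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card reports BF16 weights
)

prompt = "Explain what a language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```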

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 36
  • eval_batch_size: 36
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 2304
  • total_eval_batch_size: 288
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
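
For reference, the hyperparameters above map onto a standard Transformers TrainingArguments configuration roughly as sketched below. The actual training script and trainer class (a SimPO-style preference trainer, judging by the model name) are not stated in this card, so treat this as an illustrative mapping rather than a verbatim reproduction; the output directory name is hypothetical.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# Field names are standard transformers arguments; the trainer wrapping them
# is not specified in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pythia-410m-deduped-simpow",  # hypothetical output path
    learning_rate=1e-6,
    per_device_train_batch_size=36,
    per_device_eval_batch_size=36,
    gradient_accumulation_steps=8,   # 36 per device x 8 GPUs x 8 steps = 2304 effective
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # tensor type reported as BF16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```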

Training results

| Training Loss | Epoch | Step | Validation Loss | Original Losses | Weight | Abs Diff | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | All Logps 1 | All Logps 1 Values | All Logps 2 | All Logps 2 Values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.9612 | 0.0385 | 1 | 1.7894 | 1.8125 | 1.0 | 0.4492 | -4.9062 | -5.0938 | 0.4405 | 0.1895 | -2.0312 | -1.9688 | 5.6875 | 5.7188 | -657.8339 | -657.8338 | 434.6329 | 434.6329 |
| 1.9612 | 0.0769 | 2 | 1.7887 | 1.8125 | 1.0 | 0.4531 | -4.9062 | -5.0938 | 0.4444 | 0.1895 | -2.0312 | -1.9609 | 5.6875 | 5.6875 | -657.5561 | -657.5560 | 434.6329 | 434.6329 |
| 1.9612 | 0.1154 | 3 | 1.7887 | 1.8203 | 1.0 | 0.4512 | -4.9375 | -5.125 | 0.4444 | 0.1885 | -2.0469 | -1.9688 | 5.6875 | 5.7188 | -657.2574 | -657.2574 | 434.6329 | 434.6329 |
| 1.9612 | 0.1538 | 4 | 1.7891 | 1.8125 | 1.0 | 0.4512 | -4.9375 | -5.0938 | 0.4365 | 0.1807 | -2.0469 | -1.9688 | 5.6875 | 5.7188 | -657.5514 | -657.5513 | 434.6329 | 434.6329 |
| 1.868 | 0.1923 | 5 | 1.7881 | 1.8125 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4325 | 0.1816 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -656.7651 | -656.7651 | 434.6329 | 434.6329 |
| 1.868 | 0.2308 | 6 | 1.7911 | 1.8203 | 1.0 | 0.4512 | -4.9375 | -5.0938 | 0.4524 | 0.1670 | -2.0469 | -1.9766 | 5.6875 | 5.7188 | -658.1024 | -658.1024 | 434.6329 | 434.6329 |
| 1.868 | 0.2692 | 7 | 1.7870 | 1.8125 | 1.0 | 0.4512 | -4.9062 | -5.0938 | 0.4484 | 0.1846 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.3370 | -657.3370 | 434.6329 | 434.6329 |
| 1.868 | 0.3077 | 8 | 1.7835 | 1.8203 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4405 | 0.1729 | -2.0312 | -1.9688 | 5.6562 | 5.6875 | -657.3589 | -657.3589 | 434.6329 | 434.6329 |
| 1.868 | 0.3462 | 9 | 1.7860 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0938 | 0.4405 | 0.1855 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4703 | -657.4702 | 434.6329 | 434.6329 |
| 1.886 | 0.3846 | 10 | 1.7897 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0938 | 0.4325 | 0.1855 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.2245 | -657.2244 | 434.6329 | 434.6329 |
| 1.886 | 0.4231 | 11 | 1.7852 | 1.8125 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4484 | 0.1807 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.7448 | -657.7448 | 434.6329 | 434.6329 |
| 1.886 | 0.4615 | 12 | 1.7827 | 1.8203 | 1.0 | 0.4492 | -4.9062 | -5.0938 | 0.4603 | 0.1797 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.9037 | -657.9037 | 434.6329 | 434.6329 |
| 1.886 | 0.5 | 13 | 1.7844 | 1.8203 | 1.0 | 0.4512 | -4.9062 | -5.0625 | 0.4365 | 0.1689 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.7488 | -657.7488 | 434.6329 | 434.6329 |
| 1.886 | 0.5385 | 14 | 1.7828 | 1.8047 | 1.0 | 0.4395 | -4.875 | -5.0625 | 0.4405 | 0.1885 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.5707 | -657.5707 | 434.6329 | 434.6329 |
| 1.8572 | 0.5769 | 15 | 1.7852 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0625 | 0.4365 | 0.1768 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.2753 | -657.2753 | 434.6329 | 434.6329 |
| 1.8572 | 0.6154 | 16 | 1.7798 | 1.8125 | 1.0 | 0.4414 | -4.9062 | -5.0625 | 0.4246 | 0.1709 | -2.0156 | -1.9531 | 5.6875 | 5.7188 | -657.5228 | -657.5228 | 434.6329 | 434.6329 |
| 1.8572 | 0.6538 | 17 | 1.7797 | 1.8047 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4484 | 0.1816 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.8073 | -657.8073 | 434.6329 | 434.6329 |
| 1.8572 | 0.6923 | 18 | 1.7830 | 1.8125 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4405 | 0.1631 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4370 | -657.4370 | 434.6329 | 434.6329 |
| 1.8572 | 0.7308 | 19 | 1.7831 | 1.8047 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4524 | 0.1787 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.5411 | -657.5412 | 434.6329 | 434.6329 |
| 1.8374 | 0.7692 | 20 | 1.7812 | 1.8047 | 1.0 | 0.4512 | -4.9062 | -5.0938 | 0.4524 | 0.1973 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.5830 | -657.5831 | 434.6329 | 434.6329 |
| 1.8374 | 0.8077 | 21 | 1.7850 | 1.8125 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4444 | 0.1719 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.6910 | -657.6910 | 434.6329 | 434.6329 |
| 1.8374 | 0.8462 | 22 | 1.7851 | 1.8047 | 1.0 | 0.4434 | -4.9062 | -5.0625 | 0.4405 | 0.1836 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.1679 | -657.1679 | 434.6329 | 434.6329 |
| 1.8374 | 0.8846 | 23 | 1.7782 | 1.8047 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4365 | 0.1748 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -658.0194 | -658.0193 | 434.6329 | 434.6329 |
| 1.8374 | 0.9231 | 24 | 1.7800 | 1.8047 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4524 | 0.1709 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4482 | -657.4482 | 434.6329 | 434.6329 |
| 1.8714 | 0.9615 | 25 | 1.7788 | 1.7969 | 1.0 | 0.4375 | -4.875 | -5.0625 | 0.4325 | 0.1816 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.4512 | -657.4511 | 434.6329 | 434.6329 |
| 1.8714 | 1.0 | 26 | 1.7801 | 1.7969 | 1.0 | 0.4453 | -4.875 | -5.0625 | 0.4405 | 0.2002 | -2.0312 | -1.9453 | 5.6875 | 5.7188 | -656.8973 | -656.8973 | 434.6329 | 434.6329 |

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
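
To reproduce results against the same environment, a quick version check against the list above can look like the following convenience sketch (not part of the original card).

```python
# Compare locally installed package versions against those listed in the card.
from importlib.metadata import version

expected_versions = {
    "transformers": "4.42.3",
    "torch": "2.2.2+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

for pkg, expected in expected_versions.items():
    installed = version(pkg)
    status = "OK" if installed == expected else "differs"
    print(f"{pkg}: installed {installed}, card lists {expected} ({status})")
```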