
Trained for one epoch on ultrafeedback_binarized using cDPO. Full evaluation is pending.
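The card names cDPO (conservative DPO) but does not list its hyperparameters. The sketch below shows the general form of that objective. It is standard DPO with label smoothing: each preference pair is treated as mislabeled with probability ε, and the loss blends the usual term with its flipped counterpart. The function name, the per-sequence log-probability inputs, and the values β = 0.1 and ε = 0.1 are illustrative assumptions. They are not the settings used to train this model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(policy_chosen_logp: float,
              policy_rejected_logp: float,
              ref_chosen_logp: float,
              ref_rejected_logp: float,
              beta: float = 0.1,             # hypothetical value
              label_smoothing: float = 0.1   # epsilon; hypothetical value
              ) -> float:
    """Conservative DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    """
    # Implicit reward margin: policy/reference log-ratio of the chosen
    # response minus that of the rejected response.
    h = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * h
    # With probability (1 - eps) the label is trusted; with eps it is flipped.
    return -(1.0 - label_smoothing) * log_sigmoid(z) - label_smoothing * log_sigmoid(-z)
```

Setting `label_smoothing=0.0` recovers the ordinary DPO loss. With ε > 0, the loss is minimized where sigmoid(β·h) = 1 − ε rather than as h → ∞. This is what keeps the policy from pushing the preference margin without bound on noisy labels.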

Some initial benchmark results:

Task           Version  Metric    Value    Stderr
hellaswag      0        acc       0.6621   ± 0.0047
hellaswag      0        acc_norm  0.8525   ± 0.0035
arc_challenge  0        acc       0.6348   ± 0.0141
arc_challenge  0        acc_norm  0.6698   ± 0.0137
winogrande     0        acc       0.7861   ± 0.0115
gsm8k          0        acc       0.5694   ± 0.0136
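The Stderr column can be read as an approximate 95% confidence interval, value ± 1.96 × stderr. This reading assumes a normal approximation, which is reasonable at these sample sizes. The helper below is a hypothetical convenience. It uses only the numbers reported above and adds no new results.

```python
def ci95(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval (an assumption, not part of the card)."""
    return (value - z * stderr, value + z * stderr)


# (task, metric) -> (value, stderr), copied from the table above.
results = {
    ("hellaswag", "acc"): (0.6621, 0.0047),
    ("hellaswag", "acc_norm"): (0.8525, 0.0035),
    ("arc_challenge", "acc"): (0.6348, 0.0141),
    ("arc_challenge", "acc_norm"): (0.6698, 0.0137),
    ("winogrande", "acc"): (0.7861, 0.0115),
    ("gsm8k", "acc"): (0.5694, 0.0136),
}

for (task, metric), (value, stderr) in results.items():
    lo, hi = ci95(value, stderr)
    print(f"{task:<14} {metric:<9} {lo:.4f} to {hi:.4f}")
```

When you compare this model against another checkpoint, overlapping intervals on a task suggest that the difference on that task may be within noise.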
Model size: 7.24B params
Tensor type: BF16 (Safetensors)

Dataset used to train chargoddard/loyal-piano-m7-cdpo: ultrafeedback_binarized
