---
license: cc-by-nc-4.0
datasets:
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
---

Trained for one epoch on ultrafeedback_binarized using cDPO. Evaluation pending.

Some initial benchmark results:

| Task        |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|hellaswag    |      0|acc     |0.6621|±  |0.0047|
|             |       |acc_norm|0.8525|±  |0.0035|
|arc_challenge|      0|acc     |0.6348|±  |0.0141|
|             |       |acc_norm|0.6698|±  |0.0137|
|winogrande   |      0|acc     |0.7861|±  |0.0115|
|gsm8k        |      0|acc     |0.5694|±  |0.0136|
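
For readers unfamiliar with the training objective, here is a minimal sketch of a per-example loss, assuming "cDPO" refers to conservative DPO (DPO with label smoothing on the preference labels). This card does not state the hyperparameters used, so `beta` and `eps` below are illustrative placeholders, and the function names are hypothetical rather than taken from the actual training code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not from this card
    eps: float = 0.1,   # assumed label-noise rate, not from this card
) -> float:
    """Conservative DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each completion under the
    policy and the frozen reference model. With eps = 0 this reduces to the
    standard DPO loss.
    """
    # How much more the policy prefers "chosen" over "rejected" than the reference does.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # Weight the "label is correct" term by (1 - eps) and the "label is flipped" term by eps.
    return -(1 - eps) * log_sigmoid(beta * margin) - eps * log_sigmoid(-beta * margin)
```

Setting `eps` above zero treats a fraction of the preference labels as possibly flipped, which keeps the loss from pushing the margin toward infinity on noisy pairs.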