Models trained for the paper "WPO: Enhancing RLHF with Weighted Preference Optimization".
Wenxuan Zhou
wzhouad
Collections (1)
Models (6)
wzhouad/Llama3-Instruct-8B-WPO-HB-v2 • Text Generation • 9 • 1
wzhouad/zephyr-7B-WPO-HB • Text Generation • 2
wzhouad/zephyr-7B-WPO-FP • Text Generation • 2
wzhouad/Llama3-Instruct-8B-WPO-HB • Text Generation • 2
wzhouad/Llama3-Instruct-8B-WPO-FP • Text Generation • 6
wzhouad/prix-lm • Text Generation • 3