SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe Paper • 2410.05248 • Published Oct 7 • 8
WPO Collection Models and datasets in paper "WPO: Enhancing RLHF with Weighted Preference Optimization". • 11 items • Updated Aug 22 • 5
Standard-format-preference-dataset Collection We collect the open-source datasets and process them into the standard format. • 14 items • Updated May 8 • 23