ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 58
Chronos Models Collection Chronos: Pretrained (language) models for time series forecasting based on the T5 architecture. • 6 items • Updated Mar 18 • 25
datasets-SPIN Collection Generated synthetic data used to finetune SPIN. • 8 items • Updated Feb 9 • 10
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023 • 11
NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval Paper • 2310.14282 • Published Oct 22, 2023 • 5
Diffusion Model Alignment Using Direct Preference Optimization Paper • 2311.12908 • Published Nov 21, 2023 • 47
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module Paper • 2311.05556 • Published Nov 9, 2023 • 75
Reward models on the hub Collection UNMAINTAINED: See RewardBench... A place to collect reward models, an often not released artifact of RLHF. • 18 items • Updated Apr 13 • 24