mDPO: Conditional Preference Optimization for Multimodal Large Language Models • arXiv:2406.11839 • published 19 days ago • 36 upvotes
Pandora: Towards General World Model with Natural Language Actions and Video States • arXiv:2406.09455 • published 24 days ago • 12 upvotes
WPO: Enhancing RLHF with Weighted Preference Optimization • arXiv:2406.11827 • published 19 days ago • 13 upvotes
In-Context Editing: Learning Knowledge from Self-Induced Distributions • arXiv:2406.11194 • published 19 days ago • 15 upvotes
Deep Bayesian Active Learning for Preference Modeling in Large Language Models • arXiv:2406.10023 • published 22 days ago • 2 upvotes
RVT-2: Learning Precise Manipulation from Few Demonstrations • arXiv:2406.08545 • published 24 days ago • 7 upvotes
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling • arXiv:2406.07522 • published 25 days ago • 35 upvotes
MotionClone: Training-Free Motion Cloning for Controllable Video Generation • arXiv:2406.05338 • published 28 days ago • 39 upvotes
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning • arXiv:2406.06469 • published 26 days ago • 22 upvotes
RePLan: Robotic Replanning with Perception and Language Models • arXiv:2401.04157 • published Jan 8 • 3 upvotes
Generative Expressive Robot Behaviors using Large Language Models • arXiv:2401.14673 • published Jan 26 • 4 upvotes
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms • arXiv:2406.02900 • published Jun 5 • 10 upvotes
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs • arXiv:2406.02886 • published Jun 5 • 7 upvotes
MotionLLM: Understanding Human Behaviors from Human Motions and Videos • arXiv:2405.20340 • published May 30 • 19 upvotes
Offline Regularised Reinforcement Learning for Large Language Models Alignment • arXiv:2405.19107 • published May 29 • 12 upvotes
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF • arXiv:2405.19320 • published May 29 • 9 upvotes
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework • arXiv:2405.11143 • published May 20 • 33 upvotes
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction • arXiv:2405.10315 • published May 16 • 9 upvotes
Self-Play Preference Optimization for Language Model Alignment • arXiv:2405.00675 • published May 1 • 20 upvotes
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment • arXiv:2404.12318 • published Apr 18 • 14 upvotes
UniFL: Improve Stable Diffusion via Unified Feedback Learning • arXiv:2404.05595 • published Apr 8 • 22 upvotes
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences • arXiv:2404.03715 • published Apr 4 • 58 upvotes
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation • arXiv:2404.03673 • published Mar 25 • 14 upvotes