Aligning Multimodal LLM with Human Preference: A Survey • Paper • 2503.14504 • Published 7 days ago • 20
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning • Paper • 2503.07572 • Published 15 days ago • 40
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment • Paper • 2502.10391 • Published Feb 14 • 32