-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 67 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 125 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85
Collections
Discover the best community collections!
Collections including paper arxiv:2407.04842
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 21 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 79 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 143 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Paper • 2407.14507 • Published • 46 -
New Desiderata for Direct Preference Optimization
Paper • 2407.09072 • Published • 9 -
Self-Recognition in Language Models
Paper • 2407.06946 • Published • 24 -
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Paper • 2407.04842 • Published • 52
-
Video-to-Audio Generation with Hidden Alignment
Paper • 2407.07464 • Published • 16 -
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Paper • 2407.04842 • Published • 52 -
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Paper • 2407.04051 • Published • 35
-
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Paper • 2407.04842 • Published • 52 -
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Paper • 2407.00468 • Published • 34 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 155
-
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Paper • 2407.04842 • Published • 52 -
yichaodu/DiffusionDPO-alignment-gemini-1.5
Text-to-Image • Updated • 11 -
yichaodu/DiffusionDPO-safety-internvl-1.5
Text-to-Image • Updated • 6 -
yichaodu/DiffusionDPO-alignment-gpt-4o
Text-to-Image • Updated • 12
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 26 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 12 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 45 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 28
-
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
Paper • 2404.05717 • Published • 24 -
ByteEdit: Boost, Comply and Accelerate Generative Image Editing
Paper • 2404.04860 • Published • 24 -
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Paper • 2407.04842 • Published • 52