HelpSteer2: Open-source dataset for training top-performing reward models • Paper • 2406.08673 • Published 2 days ago
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts • Paper • 2406.09162 • Published 1 day ago
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation • Paper • 2406.09305 • Published 1 day ago
LRM-Zero: Training Large Reconstruction Models with Synthesized Data • Paper • 2406.09371 • Published 1 day ago
DiTFastAttn: Attention Compression for Diffusion Transformer Models • Paper • 2406.08552 • Published 3 days ago
Interpreting the Weight Space of Customized Diffusion Models • Paper • 2406.09413 • Published 1 day ago
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models • Paper • 2406.09416 • Published 1 day ago
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels • Paper • 2406.09415 • Published 1 day ago
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models • Paper • 2406.08487 • Published 3 days ago
Discovering Preference Optimization Algorithms with and for Large Language Models • Paper • 2406.08414 • Published 3 days ago
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation • Paper • 2406.07686 • Published 3 days ago
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation • Paper • 2406.08392 • Published 3 days ago
Hierarchical Patch Diffusion Models for High-Resolution Video Generation • Paper • 2406.07792 • Published 3 days ago
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs • Paper • 2406.07476 • Published 4 days ago
What If We Recaption Billions of Web Images with LLaMA-3? • Paper • 2406.08478 • Published 3 days ago
PowerInfer-2: Fast Large Language Model Inference on a Smartphone • Paper • 2406.06282 • Published 5 days ago
MotionClone: Training-Free Motion Cloning for Controllable Video Generation • Paper • 2406.05338 • Published 7 days ago
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language • Paper • 2406.05629 • Published 6 days ago
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound • Paper • 2406.06612 • Published 8 days ago
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising • Paper • 2406.06911 • Published 4 days ago
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B • Paper • 2406.07394 • Published 4 days ago
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models • Paper • 2406.06563 • Published 12 days ago
The Prompt Report: A Systematic Survey of Prompting Techniques • Paper • 2406.06608 • Published 9 days ago
An Image is Worth 32 Tokens for Reconstruction and Generation • Paper • 2406.07550 • Published 4 days ago
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning • Paper • 2406.06469 • Published 5 days ago
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis • Paper • 2406.06216 • Published 5 days ago
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization • Paper • 2406.05981 • Published 5 days ago
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers • Paper • 2406.05370 • Published 7 days ago
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference • Paper • 2406.06424 • Published 5 days ago
MLCM: Multistep Consistency Distillation of Latent Diffusion Model • Paper • 2406.05768 • Published 6 days ago
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation • Paper • 2406.06525 • Published 5 days ago
Mixture-of-Agents Enhances Large Language Model Capabilities • Paper • 2406.04692 • Published 8 days ago
GenAI Arena: An Open Evaluation Platform for Generative Models • Paper • 2406.04485 • Published 8 days ago
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild • Paper • 2406.04770 • Published 8 days ago
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning • Paper • 2406.04520 • Published 8 days ago
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions • Paper • 2406.04325 • Published 9 days ago
VideoTetris: Towards Compositional Text-to-Video Generation • Paper • 2406.04277 • Published 9 days ago
On Architectural Compression of Text-to-Image Diffusion Models • Paper • 2305.15798 • Published May 25, 2023
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model • Paper • 2406.04333 • Published 9 days ago
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step • Paper • 2406.04314 • Published 9 days ago
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration • Paper • 2406.01014 • Published 12 days ago
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion • Paper • 2406.03184 • Published 10 days ago
Block Transformer: Global-to-Local Language Modeling for Fast Inference • Paper • 2406.02657 • Published 11 days ago