CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published 4 days ago • 29
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation Paper • 2405.07065 • Published 9 days ago • 14
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15 • 27
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11 • 28
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 40
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 46
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10 • 22
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 21
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models Paper • 2404.03118 • Published Apr 3 • 19
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29 • 14
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion Paper • 2403.17237 • Published Mar 25 • 8
Garment3DGen: 3D Garment Stylization and Texture Generation Paper • 2403.18816 • Published Mar 27 • 19
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19 • 13
Gemma: Open Models Based on Gemini Research and Technology Paper • 2403.08295 • Published Mar 13 • 43
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 92
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 172
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches Paper • 2403.02709 • Published Mar 5 • 6
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 40
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 566
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation Paper • 2402.14795 • Published Feb 22 • 4
DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation Paper • 2402.11929 • Published Feb 19 • 9
Learning to Learn Faster from Human Feedback with Language Model Predictive Control Paper • 2402.11450 • Published Feb 18 • 20
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots Paper • 2402.10329 • Published Feb 15 • 13
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing Paper • 2402.10294 • Published Feb 15 • 19
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization Paper • 2402.09812 • Published Feb 15 • 11
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation Paper • 2402.10210 • Published Feb 15 • 28
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models Paper • 2402.08714 • Published Feb 13 • 10
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects Paper • 2402.09052 • Published Feb 14 • 16
Animated Stickers: Bringing Stickers to Life with Video Diffusion Paper • 2402.06088 • Published Feb 8 • 9
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset Paper • 2402.05937 • Published Feb 8 • 8
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue Paper • 2402.05930 • Published Feb 8 • 35
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions Paper • 2402.03040 • Published Feb 5 • 16
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models Paper • 2402.01118 • Published Feb 2 • 28
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback Paper • 2402.01391 • Published Feb 2 • 41
ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields Paper • 2401.17895 • Published Jan 31 • 15
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 17
Adaptive Mobile Manipulation for Articulated Objects In the Open World Paper • 2401.14403 • Published Jan 25 • 8
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion Paper • 2401.14066 • Published Jan 25 • 7
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 82
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents Paper • 2401.12963 • Published Jan 23 • 11