Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23 • 27
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published May 16 • 45
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels Paper • 2405.07526 • Published May 13 • 18
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation Paper • 2405.07065 • Published May 11 • 16
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published Apr 30 • 13
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models Paper • 2404.17672 • Published Apr 26 • 18
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25 • 16
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 55
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25 • 35
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections Paper • 2404.16845 • Published Feb 14 • 6
Interactive3D: Create What You Want by Interactive 3D Generation Paper • 2404.16510 • Published Apr 25 • 18
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22 • 21