-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 12 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 87 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 31
Collections
Discover the best community collections!
Collections including paper arxiv:2409.06666
-
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 54 -
FlowMind: Automatic Workflow Generation with LLMs
Paper • 2404.13050 • Published • 33 -
How Far Can We Go with Practical Function-Level Program Repair?
Paper • 2404.12833 • Published • 6 -
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Paper • 2404.18796 • Published • 68
-
parler-tts/parler_tts_mini_v0.1
Text-to-Speech • Updated • 22.5k • 346 -
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Paper • 2405.08317 • Published • 9 -
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Paper • 2405.18669 • Published • 11 -
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Paper • 2406.02430 • Published • 30
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47