WebDreamer Collection Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents • 6 items • Updated 4 days ago • 4
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ Paper • 2405.15306 • Published May 24, 2024 • 7
DeTikZify Collection Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ • 12 items • Updated 24 days ago • 22
AudioX: Diffusion Transformer for Anything-to-Audio Generation Paper • 2503.10522 • Published about 1 month ago • 22
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 30 days ago • 92
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published Mar 3 • 83
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 3 items • Updated 17 days ago • 85