Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset · 3 authors
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control · 4 authors
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences · 9 authors
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring · 6 authors
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding · 10 authors