QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation Paper • 2502.05178 • Published Feb 7 • 10
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024 • 32