An Image is Worth 32 Tokens for Reconstruction and Generation • Paper 2406.07550 • Published 26 days ago
VoCo-LLaMA: Towards Vision Compression with Large Language Models • Paper 2406.12275 • Published 20 days ago