AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding • Paper • 2502.01341 • Published Feb 3 • 38
HuggingFaceTB/SmolLM2-1.7B-Instruct • Text Generation • Updated about 1 month ago • 386k • 590