SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 16 days ago • 170
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated 15 days ago • 622k • 1.32k
LLM2CLIP Collection LLM2CLIP makes SOTA pretrained CLIP models even more SOTA. • 11 items • Updated 6 days ago • 60