Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
merveΒ 
posted an update 9 days ago
Post
1633
Microsoft released LLM2CLIP: a CLIP model with longer context window for complex text inputs 🀯
All models with Apache 2.0 license here microsoft/llm2clip-672323a266173cfa40b32d4c

TLDR; they replaced CLIP's text encoder with various LLMs fine-tuned on captioning, better top-k accuracy on retrieval.
This will enable better image-text retrieval, better zero-shot image classification, better vision language models πŸ”₯
Read the paper to learn more: LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation (2411.04997)
deleted
This comment has been hidden
In this post