Merve Noyan

merve

AI & ML interests

VLMs, vision & co

merve's activity

posted an update about 11 hours ago
The authors of ColPali trained a retrieval model based on SmolVLM 🤠 vidore/colsmolvlm-alpha
TL;DR:

- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks

- ColSmolVLM is more memory efficient than ColQwen2 💗
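ColPali-style retrievers score a text query against a page image using ColBERT-style late interaction. Each query-token embedding is matched to its most similar page-patch embedding, and those maxima are summed ("MaxSim"). Below is a plain-Python sketch of that scoring step. The toy vectors stand in for real ColSmolVLM outputs, and the function names are hypothetical: this is not the colpali-engine API.

```python
# Sketch of late-interaction (MaxSim) scoring as used by ColPali-family retrievers.
# Toy 2-d vectors replace real multi-vector embeddings; names are illustrative only.

def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_vecs: list[list[float]], page_vecs: list[list[float]]) -> float:
    """For every query token, take its best-matching page patch; sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

def rank_pages(query_vecs: list[list[float]], pages: list[list[list[float]]]) -> list[int]:
    """Page indices sorted from most to least relevant."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)

query = [[1.0, 0.0], [0.0, 1.0]]       # two query-token embeddings
pages = [
    [[0.9, 0.1], [0.2, 0.8]],          # page 0: matches both query tokens well
    [[0.5, 0.5], [0.1, 0.0]],          # page 1: weaker match
]
print(maxsim_score(query, pages[0]))
print(rank_pages(query, pages))
```

Because scoring is a max over patches followed by a sum over query tokens, page embeddings can be precomputed once and reused for every query.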
updated a Space about 14 hours ago
posted an update 1 day ago
Small yet mighty! 💫

We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠
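As a rough sense of scale for a ~2B-parameter model, the memory needed just to hold the weights is the parameter count times the bytes per parameter. This sketch is a back-of-the-envelope estimate only. It assumes exactly 2e9 parameters and ignores activations, the KV cache and image tokens, so actual usage will be higher.

```python
# Weights-only memory estimate: parameters x bytes per parameter.
# Assumption: 2e9 parameters; activations, KV cache and image tokens are ignored.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory to store the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(2e9, dtype):.2f} GiB")
```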

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
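TRL handles DPO training end to end. As a sketch of what is being optimized, the standard sigmoid form of the DPO loss for a single preference pair can be written in plain Python. The function names here are hypothetical and this is not TRL's implementation. Inputs are summed log-probabilities of the chosen and rejected responses under the trained policy and a frozen reference model, and `beta` controls how far the policy may drift from the reference.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) pair of responses."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)) == softplus(-margin)
    return softplus(-margin)

# Policy identical to the reference: loss equals log(2)
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy moved probability toward the chosen answer: loss decreases
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```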
New activity in HuggingFaceTB/SmolVLM-Instruct 1 day ago

Revert chat template

#4 opened 1 day ago by merve
New activity in HuggingFaceTB/SmolVLM 2 days ago

Upload rococo.jpg

#2 opened 2 days ago by merve

Upload rococo.jpg

#1 opened 2 days ago by merve
New activity in HuggingFaceTB/SmolVLM-Base 2 days ago

Add eos token

#2 opened 2 days ago by merve
New activity in HuggingFaceTB/SmolVLM-Base 3 days ago

Added chat_template

#1 opened 3 days ago by merve
New activity in HuggingFaceTB/SmolVLM-Instruct 5 days ago

Misc improvements

#1 opened 5 days ago by merve
posted an update 6 days ago
What a week! A recap of everything you missed ❄️
merve/nov-22-releases-673fbbcfc1c97c4f411def07
Multimodal ✨
> Mistral AI released Pixtral 124B, a gigantic open vision language model
> PKU released Llava-CoT (formerly known as Llava-o1), a multimodal reproduction of the o1 model
> OpenGVLab released MMPR: a new multimodal reasoning dataset
> Jina released Jina-CLIP-v2, 0.98B multilingual multimodal embeddings
> Apple released new SotA vision encoders AIMv2
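Multimodal embedding models such as Jina-CLIP-v2 map text and images into a shared vector space. Cross-modal search then typically reduces to ranking candidates by cosine similarity. The sketch below is in plain Python. The toy 3-d vectors stand in for real model outputs, and the code does not use Jina's API.

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_match(query_vec: list[float], candidate_vecs: list[list[float]]) -> int:
    """Index of the candidate embedding most similar to the query."""
    return max(range(len(candidate_vecs)),
               key=lambda i: cosine_similarity(query_vec, candidate_vecs[i]))

# Toy vectors: one text-query embedding and two image embeddings
text_query = [0.1, 0.9, 0.0]
image_embeddings = [
    [0.0, 1.0, 0.0],  # image 0: close to the query
    [1.0, 0.0, 0.0],  # image 1: unrelated
]
print(best_match(text_query, image_embeddings))
```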

LLMs 🦙
> AllenAI dropped a huge release of models, datasets and scripts for Tülu, a family of models based on Llama 3.1 aligned with SFT, DPO and a new technique they developed called RLVR
> Jina has released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: a synthetic dataset used to align SmolLM2 with supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic instruction dataset of 1M synthetic instruction pairs
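RLVR (reinforcement learning with verifiable rewards) does not use a learned reward model. It rewards the policy only when a programmatic check confirms the answer is correct, for example on math problems with known solutions. Below is a minimal sketch of such a binary reward. The extraction rule (take the last number in the completion) and the function names are hypothetical, and this is not AllenAI's code.

```python
import re
from typing import Optional

def extract_final_answer(completion: str) -> Optional[str]:
    """Hypothetical rule: treat the last number in the completion as the answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the extracted answer matches the reference answer, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(ground_truth) else 0.0

print(verifiable_reward("6 * 7 = 42, so the answer is 42", "42"))
print(verifiable_reward("I believe it is 41", "42"))
```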

Image Generation 🖼️
> Black Forest Labs released FLUX.1 Tools: four new models for different image modifications and two LoRAs for image conditioning that better steer generations

Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them 📚
$ pip install observers
posted an update 6 days ago
Apple released AIMv2 🍏, a family of state-of-the-art open-set vision encoders
apple/aimv2-6720fe1558d94c7805f7688c
> like CLIP, but with an added decoder and an autoregressive training objective 🤯
> 19 open models in 300M, 600M, 1.2B and 2.7B sizes, at resolutions of 224, 336 and 448
> Load and use with 🤗 transformers
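To see what the three input resolutions mean for sequence length, recall that a ViT-style encoder splits an image into non-overlapping square patches. Each patch becomes one token. The sketch assumes a 14×14 patch size, which is an assumption here, so check each checkpoint's config before relying on the numbers.

```python
def num_patch_tokens(resolution: int, patch_size: int = 14) -> int:
    """Number of non-overlapping square patches for a square input image.
    Assumption: patch_size=14; verify against the actual checkpoint config."""
    if resolution % patch_size:
        raise ValueError("resolution must be divisible by patch_size")
    side = resolution // patch_size
    return side * side

for res in (224, 336, 448):
    print(res, num_patch_tokens(res))
```

Doubling the resolution from 224 to 448 quadruples the token count, which is the main cost of the higher-resolution checkpoints.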