Merve Noyan

merve

https://github.com/merveenoyan/smol-vision

AI & ML interests

VLMs, vision & co

Recent Activity

updated a collection 8 days ago

March 28 Releases

merve's activity

updated a collection 8 days ago

March 28 Releases

Collection

11 items • Updated 8 days ago • 3

updated a dataset 8 days ago

merve/vlm_test_images

Viewer • Updated 8 days ago • 9 • 602

New activity in google/shieldgemma-2-4b-it 9 days ago

how do i interpret the results

#2 opened 16 days ago by cuiyi0326

Add notebook and explanations on output

#4 opened 9 days ago by merve

updated a Space 9 days ago

ShieldGemma2 VLM

📉

Demo for ShieldGemma 2, multimodal safety model

published a Space 9 days ago

ShieldGemma2 VLM

📉

Demo for ShieldGemma 2, multimodal safety model

New activity in google/shieldgemma-2-4b-it 10 days ago

Model Fintune

#3 opened 11 days ago by BITDDD

ImportError: cannot import name 'ShieldGemmaForImageClassification' from 'transformers'

#1 opened 23 days ago by feabries

liked a model 11 days ago

Qwen/Qwen2.5-VL-32B-Instruct

Image-Text-to-Text • Updated 10 days ago • 196k • 315

posted an update 14 days ago

So many open releases on Hugging Face this past week 🤯 recapping them all here ⤵️ merve/march-21-releases-67dbe10e185f199e656140ae

👀 Multimodal
> Mistral AI released a 24B vision LM, both base and instruction FT versions, sota 🔥 (OS)
> with IBM we released SmolDocling, a sota 256M document parser with Apache 2.0 license (OS)
> SpatialLM is a new vision LM that outputs 3D bounding boxes, comes with 0.5B (QwenVL based) and 1B (Llama based) variants
> SkyWork released SkyWork-R1V-38B, new vision reasoning model (OS)

💬 LLMs
> NVIDIA released new Nemotron models in 49B and 8B with their post-training dataset
> LG released EXAONE, new reasoning models in 2.4B, 7.8B and 32B
> Dataset: Glaive AI released a new reasoning dataset of 22M+ examples
> Dataset: NVIDIA released new helpfulness dataset HelpSteer3
> Dataset: OpenManusRL is a new agent dataset based on ReAct framework (OS)
> Open-R1 team released OlympicCoder, new competitive coder model in 7B and 32B
> Dataset: GeneralThought-430K is a new reasoning dataset (OS)

🖼️ Image Generation/Computer Vision
> Roboflow released RF-DETR, new real-time sota object detector (OS) 🔥
> YOLOE is a new real-time zero-shot object detector with text and visual prompts 🥹
> Stability AI released Stable Virtual Camera, a new novel view synthesis model
> Tencent released Hunyuan3D-2mini, new small and fast 3D asset generation model
> ByteDance released InfiniteYou, new realistic photo generation model
> StarVector is a new 8B model that generates SVG from images
> FlexWorld is a new model that expands 3D views (OS)

🎤 Audio
> Sesame released CSM-1B, a new speech generation model (OS)

🤖 Robotics
> NVIDIA released GR00T, new robotics model for generalized reasoning and skills, along with the dataset

*Releases marked (OS) have an Apache 2.0 or MIT license