Exciting News in AI: Jina AI Releases jina-clip-v2!
The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:
📌 Technical Highlights:
- Dual-encoder architecture combining a 561M-parameter Jina XLM-RoBERTa text encoder and a 304M-parameter EVA02-L14 vision encoder (usage sketch after this list)
- Supports 89 languages with an 8,192-token context length
- Processes images up to 512×512 pixels with a 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage
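For anyone who wants to poke at the dual encoder directly, here is a minimal usage sketch. It assumes jina-clip-v2 exposes the same `encode_text` / `encode_image` convenience methods that jina-clip-v1 provides via `trust_remote_code`; the image URL is a placeholder.

```python
from transformers import AutoModel

# Assumption: jina-clip-v2 keeps jina-clip-v1's encode_text / encode_image
# helpers when loaded with trust_remote_code=True.
model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

texts = ["a photo of a mountain lake", "ein Foto von einem Bergsee"]  # EN + DE
images = ["https://example.com/lake.jpg"]  # hypothetical URL

text_emb = model.encode_text(texts)     # text tower: Jina XLM-RoBERTa
image_emb = model.encode_image(images)  # vision tower: EVA02-L14

# Both towers project into one shared space, so a dot product of the
# normalized vectors gives text-image similarity.
```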
⚡️ Under the Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions (see the sketch after this list)
- Trained on a massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
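The bidirectional InfoNCE objective is easy to state in code. The following is an illustrative sketch of the standard symmetric CLIP-style loss, not Jina AI's exact training code; the temperature value is a common default, not a published hyperparameter.

```python
import torch
import torch.nn.functional as F

def bidirectional_info_nce(text_emb: torch.Tensor,
                           image_emb: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired (text, image) embeddings.

    Row i of each tensor is assumed to be a matching text-image pair;
    all other rows in the batch act as in-batch negatives.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.T / temperature        # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_t2i = F.cross_entropy(logits, targets)          # text -> image
    loss_i2t = F.cross_entropy(logits.T, targets)        # image -> text
    return (loss_t2i + loss_i2t) / 2
```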
📈 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on the CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs. 1024D; see the truncation sketch after this list)
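The Matryoshka trick behind that last bullet is simply truncate-and-renormalize: training packs the most informative features into the leading dimensions, so a prefix of the vector remains a usable embedding. A minimal sketch:

```python
import numpy as np

def truncate_matryoshka(emb: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length.

    With Matryoshka-trained embeddings, the 256D prefix retains most of
    the retrieval quality of the full 1024D vector (per the numbers above),
    cutting vector storage by 75%.
    """
    truncated = emb[..., :dim]
    return truncated / np.linalg.norm(truncated, axis=-1, keepdims=True)
```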
🎯 Key Innovation: The model solves the long-standing challenge of unifying text-only and multimodal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems, as sketched below!
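As a concrete example of such a system, here is a hedged end-to-end sketch of cross-lingual visual search: index the gallery images once, then query with text in any supported language. It reuses the assumed `encode_text` / `encode_image` interface from the first snippet, and the gallery URLs are placeholders.

```python
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)

gallery = ["https://example.com/lake.jpg",   # hypothetical URLs
           "https://example.com/city.jpg"]
index = np.asarray(model.encode_image(gallery))          # built once, (N, 1024)
index /= np.linalg.norm(index, axis=-1, keepdims=True)

# Polish query ("photo of a mountain lake") against an uncaptioned gallery.
query = np.asarray(model.encode_text(["zdjęcie górskiego jeziora"]))
query /= np.linalg.norm(query, axis=-1, keepdims=True)

scores = (query @ index.T)[0]                 # cosine similarity per image
print(gallery[int(np.argmax(scores))])        # best match across languages
```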
Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
I recently wrote a survey of deep reinforcement learning. The paper is a compact guide to some of the key concepts in reinforcement learning. Find the paper below:
OmniVision-968M: a new local VLM for edge devices, fast & small but performant 💨
a new vision language model with 9× fewer image tokens, super efficient
aligned with DPO to reduce hallucinations ⚡️
Apache 2.0 license 🔥
Models
💻 Coding: The Qwen team released two Qwen2.5-Coder checkpoints, at 32B and 7B. Infly released OpenCoder: 1.5B and 8B coding models with instruction-SFT'd versions and their datasets!
🖼️ Image/Video Gen: Alibaba's vision lab released In-Context LoRA: 10 LoRA models on different themes based on Flux. Also, Mochi, the SOTA video generation model with an Apache 2.0 license, now comes natively supported in diffusers.
🖼️ VLMs/Multimodal: NexaAIDev released OmniVision-968M, a new vision language model aligned with DPO to reduce hallucinations; it also comes with GGUF checkpoints. Microsoft released LLM2CLIP, a new CLIP-like model with a longer context window, allowing complex text inputs and better search.
🎮 AGI?: Etched released Oasis 500M, a diffusion-based open-world model that takes keyboard input and outputs gameplay 🤯
Datasets
Common Corpus: a text dataset with 2T tokens under permissive licenses for EN/FR, spanning sources such as code, science, finance, and culture.