Umitcan Sahin

ucsahin

AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Recent Activity

upvoted a collection 4 days ago
DeepSeek-V3
upvoted a collection 4 days ago
DeepSeek-VL2
reacted to singhsidhukuldeep's post with 🔥 10 days ago
Exciting News in AI: JinaAI Releases JINA-CLIP-v2! The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal: 🚀 Technical Highlights: - Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder - Supports 89 languages with 8,192 token context length - Processes images up to 512×512 pixels with 14×14 patch size - Implements FlashAttention2 for text and xFormers for vision processing - Uses Matryoshka Representation Learning for efficient vector storage ⚡️ Under The Hood: - Multi-stage training process with progressive resolution scaling (224→384→512) - Contrastive learning using InfoNCE loss in both directions - Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs - Incorporates specialized datasets for document understanding, scientific graphs, and infographics - Uses hard negative mining with 7 negatives per positive sample 📊 Performance: - Outperforms previous models on visual document retrieval (52.65% nDCG@5) - Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark - Strong multilingual performance across 30 languages - Maintains performance even with 75% dimension reduction (256D vs 1024D) 🎯 Key Innovation: The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems! Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
View all activity

Organizations

None yet

ucsahin's activity

New activity in ucsahin/TR-VLM-DPO-Dataset about 1 month ago
New activity in ucsahin/Turkish-VLM-Mix-Benchmark 3 months ago
New activity in Metin/WikiRAG-TR 5 months ago
New activity in ucsahin/TR-Extractive-QA-5K 6 months ago
New activity in ucsahin/Florence-2-large-TableDetection 6 months ago

TypeError need help

2
#1 opened 6 months ago by
ss22s