VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published about 17 hours ago • 9
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models Paper • 2503.09669 • Published 1 day ago • 25
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing Paper • 2503.10613 • Published about 16 hours ago • 32
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary Paper • 2503.09402 • Published 2 days ago • 6
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published 8 days ago • 60
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Article • Published 10 days ago • 65
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 10 days ago • 63
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models Paper • 2502.16033 • Published 20 days ago • 16
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 22 days ago • 129
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 26 days ago • 142
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 203