taesiri PRO

https://taesiri.ai/

AI & ML interests

AGI ... one linear layer at a time

Recent Activity

updated a dataset 1 minute ago

taesiri/SteamScreenshots-Bugs

updated a dataset 3 minutes ago

taesiri/SteamScreenshots-Bugs

updated a dataset 4 minutes ago

taesiri/SteamScreenshots-Bugs

taesiri's activity

upvoted 2 papers about 13 hours ago

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published about 24 hours ago • 22

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

Paper • 2503.10596 • Published 1 day ago • 15

upvoted 5 papers 1 day ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published 4 days ago • 89

When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning

Paper • 2503.07588 • Published 4 days ago • 6

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Paper • 2503.08525 • Published 3 days ago • 13

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published 2 days ago • 46

Motion Anything: Any to Motion Generation

Paper • 2503.06955 • Published 4 days ago • 15

upvoted a collection 2 days ago

Gemma 3 Release

Collection

9 items • Updated about 20 hours ago • 235

upvoted 2 papers 2 days ago

Referring to Any Person

Paper • 2503.08507 • Published 3 days ago • 5

AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models

Paper • 2503.08417 • Published 3 days ago • 6

upvoted 5 papers 3 days ago

Gemini Embedding: Generalizable Embeddings from Gemini

Paper • 2503.07891 • Published 4 days ago • 25

Video Action Differencing

Paper • 2503.07860 • Published 4 days ago • 28

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Paper • 2503.08625 • Published 3 days ago • 24

UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

Paper • 2503.08120 • Published 3 days ago • 27

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published 4 days ago • 31

upvoted an article 3 days ago

Open R1: Update #3

Article • and 9 others • 3 days ago • 213

upvoted 2 papers 3 days ago

WritingBench: A Comprehensive Benchmark for Generative Writing

Paper • 2503.05244 • Published 7 days ago • 15

Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Paper • 2503.03601 • Published 9 days ago • 207

upvoted 2 papers 4 days ago

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

Paper • 2503.06749 • Published 5 days ago • 21

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

Paper • 2503.07365 • Published 4 days ago • 53