Yash Thube

thubZ9

https://thubzai.github.io/

AI & ML interests

Multimodal learning • CV • RL

Recent Activity

updated a collection 5 days ago

My reading list!

liked a model 11 days ago

deepseek-ai/DeepSeek-V3-0324

thubZ9's activity

upvoted a paper about 7 hours ago

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published 5 days ago • 107

upvoted a paper 14 days ago

One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

Paper • 2503.13358 • Published 19 days ago • 91

upvoted a collection 21 days ago

C4AI Aya Vision

Collection

Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated Mar 4 • 68

upvoted a collection 23 days ago

Gemma 3 Release

Collection

17 items • Updated 2 days ago • 310

upvoted an article 23 days ago

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

Article • Mar 4 • 72

upvoted a paper 26 days ago

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published 29 days ago • 113

upvoted a paper 28 days ago

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published about 1 month ago • 91

upvoted 7 papers about 1 month ago

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 74

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 72

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 139

Continuous Diffusion Model for Language Modeling

Paper • 2502.11564 • Published Feb 17 • 52

upvoted 6 papers about 2 months ago

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

Paper • 2502.13143 • Published Feb 18 • 29

Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14 • 109

Region-Adaptive Sampling for Diffusion Transformers

Paper • 2502.10389 • Published Feb 14 • 52

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published Feb 13 • 147

mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data

Paper • 2502.08468 • Published Feb 12 • 13

Analyze Feature Flow to Enhance Interpretation and Steering in Language Models

Paper • 2502.03032 • Published Feb 5 • 59