InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 1 day ago • 167
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published 5 days ago • 34
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 6 days ago • 66
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 8 days ago • 77
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 8 days ago • 158
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 5 days ago • 20
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond Paper • 2503.21614 • Published 19 days ago • 39
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published 15 days ago • 15
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published 14 days ago • 60
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 12 days ago • 30
Where do Large Vision-Language Models Look at when Answering Questions? Paper • 2503.13891 • Published 29 days ago • 8
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias Paper • 2503.13834 • Published 29 days ago • 5
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published 30 days ago • 27
Llama 3.1 Collection This collection hosts the Transformers-format and original repos of the Llama 3.1, Llama Guard 3, and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 661