Jie Shao

hehesang

http://www.lamda.nju.edu.cn/shaoj/

hehesangsj

AI & ML interests

computer vision, ai for science

hehesang's activity

authored a paper 2 days ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 4 days ago • 214

upvoted a paper 3 days ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 4 days ago • 214

upvoted a paper 14 days ago

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published 15 days ago • 67

upvoted a paper 22 days ago

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published 24 days ago • 50

upvoted a paper about 1 month ago

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13 • 34

New activity in google/siglip2-giant-opt-patch16-384 about 2 months ago

AutoModel.from_pretrained error in loading state_dict

#3 opened about 2 months ago by Srymaker

upvoted a paper 4 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 38

authored a paper 4 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 38

liked a model 4 months ago

OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • Updated 24 days ago • 10k • 191

commented on a paper 5 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 81

upvoted a paper 5 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 81

upvoted a paper 10 months ago

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11, 2024 • 55

liked a model 11 months ago

OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B

Visual Question Answering • Updated Aug 24, 2024 • 64 • 9