Wenhao Chai

Reself

http://rese1f.github.io

re5e1f

rese1f

AI & ML interests

computer vision, artificial intelligence

Reself's activity

upvoted a paper 2 months ago

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22 • 17

upvoted a collection 3 months ago

Meta Llama 3

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 618

upvoted 2 papers 3 months ago

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11 • 41

sDPO: Don't Use Your Data All at Once

Paper • 2403.19270 • Published Mar 28 • 32

upvoted a paper 5 months ago

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

Paper • 2402.07865 • Published Feb 12 • 11

upvoted a collection 5 months ago

Qwen1.5

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 27 days ago • 199

upvoted 4 papers 7 months ago

Controllable Human-Object Interaction Synthesis

Paper • 2312.03913 • Published Dec 6, 2023 • 22

Dolphins: Multimodal Language Model for Driving

Paper • 2312.00438 • Published Dec 1, 2023 • 12

CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation

Paper • 2311.18775 • Published Nov 30, 2023 • 6

FreeU: Free Lunch in Diffusion U-Net

Paper • 2309.11497 • Published Sep 20, 2023 • 63

upvoted 5 papers 10 months ago

Doppelgangers: Learning to Disambiguate Images of Similar Structures

Paper • 2309.02420 • Published Sep 5, 2023 • 9

Emergence of Segmentation with Minimalistic White-Box Transformers

Paper • 2308.16271 • Published Aug 30, 2023 • 13

Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

Paper • 2303.16456 • Published Mar 29, 2023 • 1

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

Paper • 2308.09592 • Published Aug 18, 2023 • 2

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Paper • 2307.16449 • Published Jul 31, 2023 • 14

upvoted 2 papers 11 months ago

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Paper • 2308.01907 • Published Aug 3, 2023 • 10

To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation

Paper • 2307.15063 • Published Jul 27, 2023 • 15

upvoted a paper 12 months ago

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

Paper • 2307.07487 • Published Jul 14, 2023 • 19