Zhaokai Wang

wzk1015

https://www.wzk.plus

AI & ML interests

Computer Vision, Music Generation, Multimodal Large Language Models

Recent Activity

commented on a paper 8 days ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

upvoted a paper 9 days ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

commented on a paper 9 days ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

wzk1015's activity

upvoted 2 papers 9 days ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published 13 days ago • 35

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Paper • 2412.09428 • Published 13 days ago • 7

upvoted a paper 16 days ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published 19 days ago • 121

upvoted a paper about 1 month ago

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Paper • 2410.08202 • Published Oct 10 • 4

upvoted a collection about 1 month ago

InternVL2.5

Better than InternVL 2.0 • 18 items • Updated 4 days ago • 77

upvoted a collection 3 months ago

Mono-InternVL

A Pioneering Monolithic MLLM • 2 items • Updated 4 days ago • 6

upvoted a paper 5 months ago

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11 • 19

upvoted a collection 7 months ago

InternVL1.0

Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks • 16 items • Updated 4 days ago • 18