Zeyi Sun

Zery

https://github.com/SunzeY

Recent Activity

upvoted a paper 4 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

authored a paper 6 days ago

RelightVid: Temporal-Consistent Diffusion Model for Video Relighting

liked a Space 6 days ago

microsoft/OmniParser-v2

Organizations

None yet

Zery's activity

upvoted a paper 4 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published 5 days ago • 61

authored a paper 6 days ago

RelightVid: Temporal-Consistent Diffusion Model for Video Relighting

Paper • 2501.16330 • Published Jan 27

liked a Space 6 days ago

358

OmniParser V2

🏢

OmniParser, turn your LLM into GUI agent

liked a model 7 days ago

microsoft/OmniParser-v2.0

Image-Text-to-Text • Updated 12 days ago • 7.21k • 1.04k

upvoted a paper 10 days ago

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Paper • 2502.13128 • Published 12 days ago • 36

liked a dataset 15 days ago

OS-Copilot/OS-Atlas-data

Updated Dec 4, 2024 • 20.8k • 16

liked a dataset 18 days ago

osunlp/Mind2Web

Viewer • Updated Jul 19, 2023 • 253 • 606 • 100

liked a model 19 days ago

ysmikey/Layerpano3D-FLUX-Panorama-LoRA

Text-to-Image • Updated 22 days ago • 3

upvoted a paper 20 days ago

VideoRoPE: What Makes for Good Video Rotary Position Embedding?

Paper • 2502.05173 • Published 23 days ago • 61

upvoted a paper about 1 month ago

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Paper • 2501.12368 • Published Jan 21 • 42

liked a model about 1 month ago

physical-intelligence/fast

Robotics • Updated Jan 16 • 82

liked a Space about 1 month ago

2.85k

IC Light V2

📈

Run code from environment variable

upvoted 2 papers about 2 months ago

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Paper • 2501.03218 • Published Jan 6 • 35

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

Paper • 2501.03226 • Published Jan 6 • 41

updated a collection 2 months ago

Alpha-CLIP

Collection

5 items • Updated Dec 18, 2024 • 3

liked 2 datasets 2 months ago

Zery/Alpha-GRIT

Updated Jul 18, 2024 • 33 • 2

KwaiVGI/360Motion-Dataset

Viewer • Updated Jan 22 • 52 • 1.29k • 27

upvoted 3 papers 3 months ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 94

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Paper • 2412.07674 • Published Dec 10, 2024 • 20

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Paper • 2412.03069 • Published Dec 4, 2024 • 31