Guanzhou Ke

guanzhouk

Guanzhou-Ke

AI & ML interests

Multi-modal learning

Organizations

None yet

guanzhouk's activity

upvoted a paper 8 days ago

Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

Paper • 2504.08685 • Published 13 days ago • 121

upvoted a paper 14 days ago

An Empirical Study of GPT-4o Image Generation Capabilities

Paper • 2504.05979 • Published 16 days ago • 61

upvoted a paper 15 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 16 days ago • 170

upvoted 2 papers 16 days ago

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published 16 days ago • 98

MegaMath: Pushing the Limits of Open Math Corpora

Paper • 2504.02807 • Published 20 days ago • 30

upvoted a paper 17 days ago

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published 23 days ago • 256

upvoted a paper 23 days ago

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Paper • 2503.21614 • Published 28 days ago • 39

upvoted a paper 25 days ago

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Paper • 2503.21460 • Published 28 days ago • 76

upvoted a paper 27 days ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published 29 days ago • 141

upvoted a paper 29 days ago

Mind with Eyes: from Language Reasoning to Multimodal Reasoning

Paper • 2503.18071 • Published Mar 23 • 3

upvoted a paper 30 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 143

upvoted 3 papers about 1 month ago

Large-scale Pre-training for Grounded Video Caption Generation

Paper • 2503.10781 • Published Mar 13 • 17

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12 • 71

Gemini Embedding: Generalizable Embeddings from Gemini

Paper • 2503.07891 • Published Mar 10 • 37

upvoted 3 papers about 2 months ago

upvoted a paper 2 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 182

liked a dataset 2 months ago

lmms-lab/LLaVA-Video-178K

Viewer • Updated Oct 11, 2024 • 1.63M • 15.8k • 137

upvoted a paper 2 months ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 155