Zhengyan Zhang

ZhengyanZhang

ZhengyanZhang's activity

authored a paper about 1 month ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 150

upvoted a paper 4 months ago

Densing Law of LLMs

Paper • 2412.04315 • Published Dec 5, 2024 • 19

upvoted a paper 5 months ago

Sparsing Law: Towards Large Language Models with Greater Activation Sparsity

Paper • 2411.02335 • Published Nov 4, 2024 • 11

upvoted 2 papers 7 months ago

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published Sep 6, 2024 • 46

Configurable Foundation Models: Building LLMs from a Modular Perspective

Paper • 2409.02877 • Published Sep 4, 2024 • 29

authored a paper 7 months ago

Configurable Foundation Models: Building LLMs from a Modular Perspective

Paper • 2409.02877 • Published Sep 4, 2024 • 29

upvoted a paper 8 months ago

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15, 2024 • 57

upvoted a paper 9 months ago

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Paper • 2406.14491 • Published Jun 20, 2024 • 92

upvoted 2 papers 10 months ago

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

Paper • 2406.11431 • Published Jun 17, 2024 • 4

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

Paper • 2406.11931 • Published Jun 17, 2024 • 63

authored a paper 10 months ago

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Paper • 2406.05955 • Published Jun 10, 2024 • 27

upvoted 2 papers 10 months ago

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Paper • 2406.06282 • Published Jun 10, 2024 • 38

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Paper • 2406.05955 • Published Jun 10, 2024 • 27

upvoted 3 papers about 1 year ago

Advancing LLM Reasoning Generalists with Preference Trees

Paper • 2404.02078 • Published Apr 2, 2024 • 45

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Paper • 2403.09347 • Published Mar 14, 2024 • 22

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 46

upvoted a paper over 1 year ago

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

Paper • 2312.12456 • Published Dec 16, 2023 • 43

updated 3 models over 3 years ago