Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219, Apr 22, 2024)
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect (arXiv:2403.03853, Mar 6, 2024)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507, Mar 6, 2024)
Wukong: Towards a Scaling Law for Large-Scale Recommendation (arXiv:2403.02545, Mar 4, 2024)
Design2Code: How Far Are We From Automating Front-End Engineering? (arXiv:2403.03163, Mar 5, 2024)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427, Feb 29, 2024)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764, Feb 27, 2024)