Hugging Face
Yifei Zuo

YifeiZuo
https://yifei-zuo.github.io/
  • YifeiZuoX
  • Yifei-Zuo

AI & ML interests

None yet

Recent Activity

authored a paper about 12 hours ago
Parallax: Parameterized Local Linear Attention for Language Modeling
upvoted a paper about 17 hours ago
Parallax: Parameterized Local Linear Attention for Language Modeling
submitted a paper about 18 hours ago
Parallax: Parameterized Local Linear Attention for Language Modeling

Organizations

None yet

YifeiZuo's collections (1)

Attention 0.6B AdamW-WSD training trajectory
Checkpoints saved every 500 steps (40 checkpoints in total) from the 0.6B Qwen3 softmax-attention baseline, trained with AdamW and a WSD (warmup-stable-decay) learning-rate schedule on 80B tokens.
  • YifeiZuo/attention-0.6b-adamw-wsd-step500

    0.6B • Updated 20 days ago
  • YifeiZuo/attention-0.6b-adamw-wsd-step1000

    0.6B • Updated 20 days ago
  • YifeiZuo/attention-0.6b-adamw-wsd-step1500

    0.6B • Updated 20 days ago
  • YifeiZuo/attention-0.6b-adamw-wsd-step2000

    0.6B • Updated 20 days ago
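The description implies some simple arithmetic about the run, sketched below under stated assumptions. It assumes that the 40 checkpoints are evenly spaced from step 500 to the final step, so the run would end at step 20,000 and average 4M tokens per step. It also assumes that every repo follows the naming pattern of the four listed here. The WSD schedule is shown in its generic form (linear warmup, constant plateau, linear decay). The peak learning rate, warmup fraction, and decay fraction are hypothetical placeholders, not values reported for this run.

```python
# Sketch only: CKPT_INTERVAL and NUM_CKPTS come from the collection description;
# the even spacing and all wsd_lr defaults are assumptions, not reported settings.
CKPT_INTERVAL = 500
NUM_CKPTS = 40
TOTAL_TOKENS = 80e9  # 80B training tokens, as stated in the description


def checkpoint_steps(interval: int = CKPT_INTERVAL, n: int = NUM_CKPTS) -> list[int]:
    """Steps at which checkpoints are assumed to be saved: 500, 1000, ..., 20000."""
    return [interval * (i + 1) for i in range(n)]


def repo_id(step: int) -> str:
    """Repo name, assuming every checkpoint follows the pattern of the listed ones."""
    return f"YifeiZuo/attention-0.6b-adamw-wsd-step{step}"


def wsd_lr(step: int, total_steps: int, peak_lr: float = 3e-4,
           warmup_frac: float = 0.01, decay_frac: float = 0.2,
           min_lr: float = 0.0) -> float:
    """Generic warmup-stable-decay schedule (hypothetical hyperparameters).

    Linear warmup to peak_lr, constant plateau, then linear decay to min_lr
    over the final decay_frac of training.
    """
    warmup = int(total_steps * warmup_frac)
    decay_start = total_steps - int(total_steps * decay_frac)
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    if step < decay_start:
        return peak_lr
    progress = (step - decay_start) / max(1, total_steps - decay_start)
    return peak_lr + (min_lr - peak_lr) * progress


steps = checkpoint_steps()
total_steps = steps[-1]                     # final checkpoint step under the even-spacing assumption
tokens_per_step = TOTAL_TOKENS / total_steps

print(total_steps, tokens_per_step, repo_id(steps[0]))
```

A design note on the sketch: one benefit commonly cited for WSD is that the plateau phase does not depend on a fixed total step count. Intermediate checkpoints taken before the decay phase therefore reflect a constant learning rate, which is one reason a dense checkpoint trajectory like this one can be useful to study.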