Fangyuan Yu PRO

Ksgk-fy

fangyuan-ksgk

AI & ML interests

AGI

Recent Activity

updated a collection 1 day ago

Representation & Optimization

Ksgk-fy's activity

upvoted a paper 1 day ago

Value Residual Learning For Alleviating Attention Concentration In Transformers

Paper • 2410.17897 • Published Oct 23, 2024 • 9

upvoted 3 papers 2 days ago

Flex Attention: A Programming Model for Generating Optimized Attention Kernels

Paper • 2412.05496 • Published Dec 7, 2024 • 1

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Paper • 2504.00906 • Published 3 days ago • 18

ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published 4 days ago • 1

upvoted a collection 2 days ago

Representation & Optimization

Collection

Understanding about representation sheds light on optimization • 7 items • Updated 1 day ago • 1

upvoted 2 papers 3 days ago

Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1

Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published 3 days ago • 1

upvoted a paper 6 days ago

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1

upvoted a paper 8 days ago

Layer by Layer: Uncovering Hidden Representations in Language Models

Paper • 2502.02013 • Published Feb 4 • 1

upvoted 3 papers 9 days ago

CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners

Paper • 2503.16356 • Published 15 days ago • 15

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Paper • 2503.18878 • Published 11 days ago • 110

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published 30 days ago • 90

upvoted a paper 12 days ago

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published 15 days ago • 34

upvoted a paper 20 days ago

Denoising Hamiltonian Network for Physical Reasoning

Paper • 2503.07596 • Published 25 days ago • 1

upvoted a collection about 1 month ago

Image / Video Gen

Collection

Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion • 36 items • Updated Mar 1 • 9

upvoted 5 papers about 1 month ago

Scaling LLM Pre-training with Vocabulary Curriculum

Paper • 2502.17910 • Published Feb 25 • 1

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 72

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published Feb 19 • 68

You Do Not Fully Utilize Transformer's Representation Capacity

Paper • 2502.09245 • Published Feb 13 • 34

Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Paper • 2502.13063 • Published Feb 18 • 68