Yuchen Cheng

rudeigerc

https://rudeigerc.dev

AI & ML interests

MLSys

Recent Activity

liked a model 5 days ago

google/gemma-3-27b-it

liked a model 5 days ago

microsoft/Phi-4-multimodal-instruct

liked a model 5 days ago

Qwen/QwQ-32B

Organizations

None yet

rudeigerc's activity

upvoted a paper 22 days ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published 29 days ago • 144

upvoted 2 papers 7 months ago

NanoFlow: Towards Optimal Large Language Model Serving Throughput

Paper • 2408.12757 • Published Aug 22, 2024 • 18

Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8, 2024 • 159

upvoted 2 papers 8 months ago

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 113

Inference Performance Optimization for Large Language Models on CPUs

Paper • 2407.07304 • Published Jul 10, 2024 • 52

upvoted a paper 9 months ago

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Paper • 2406.06282 • Published Jun 10, 2024 • 38

upvoted 3 papers 10 months ago

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21, 2024 • 32

Layer-Condensed KV Cache for Efficient Inference of Large Language Models

Paper • 2405.10637 • Published May 17, 2024 • 23

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 50

upvoted 7 papers 11 months ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30, 2024 • 77

KAN: Kolmogorov-Arnold Networks

Paper • 2404.19756 • Published Apr 30, 2024 • 111

Multi-Head Mixture-of-Experts

Paper • 2404.15045 • Published Apr 23, 2024 • 60

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22, 2024 • 127

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 256

RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9, 2024 • 35

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10, 2024 • 107

upvoted 3 papers 12 months ago

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 104

Jamba: A Hybrid Transformer-Mamba Language Model

Paper • 2403.19887 • Published Mar 28, 2024 • 108

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Paper • 2403.15447 • Published Mar 18, 2024 • 16

upvoted a paper about 1 year ago

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 126