Yilong Zhao

ylzhao

https://happierpig.github.io/

happierpig

AI & ML interests

None yet

Organizations

None yet

ylzhao's activity

upvoted a paper about 2 months ago

S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published Feb 20 • 63

upvoted 2 papers 2 months ago

The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

Paper • 2502.08235 • Published Feb 12 • 57

Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile

Paper • 2502.06155 • Published Feb 10 • 9

upvoted a paper 3 months ago

Efficiently Serving LLM Reasoning Programs with Certaindex

Paper • 2412.20993 • Published Dec 30, 2024 • 38

upvoted 2 papers 6 months ago

MIO: A Foundation Model on Multimodal Tokens

Paper • 2409.17692 • Published Sep 26, 2024 • 54

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Paper • 2406.10774 • Published Jun 16, 2024 • 3

upvoted a collection 7 months ago

Llama 3.2

Collection

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 598

authored a paper 8 months ago

NanoFlow: Towards Optimal Large Language Model Serving Throughput

Paper • 2408.12757 • Published Aug 22, 2024 • 18

upvoted a paper 8 months ago

NanoFlow: Towards Optimal Large Language Model Serving Throughput

Paper • 2408.12757 • Published Aug 22, 2024 • 18

upvoted a paper about 1 year ago

Hydragen: High-Throughput LLM Inference with Shared Prefixes

Paper • 2402.05099 • Published Feb 7, 2024 • 20

authored a paper over 1 year ago

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Paper • 2310.19102 • Published Oct 29, 2023 • 11

upvoted 2 papers over 1 year ago

FlashDecoding++: Faster Large Language Model Inference on GPUs

Paper • 2311.01282 • Published Nov 2, 2023 • 37

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

Paper • 2311.02103 • Published Nov 1, 2023 • 21