- Speculative Streaming: Fast LLM Inference without Auxiliary Models
  Paper • 2402.11131 • Published • 41
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 4
- Recurrent Drafter for Fast Speculative Decoding in Large Language Models
  Paper • 2403.09919 • Published • 19
- On Speculative Decoding for Multimodal Large Language Models
  Paper • 2404.08856 • Published • 11
Linkun
hugg1ngfac3
Collections
4