OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models Paper • 2307.03084 • Published Jul 5, 2023
OpenPrompt: An Open-source Framework for Prompt-learning Paper • 2111.01998 • Published Nov 3, 2021
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting Paper • 2402.13720 • Published Feb 21, 2024
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling Paper • 2502.14856 • Published 17 days ago
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs Paper • 2502.12085 • Published 20 days ago
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published Sep 4, 2024
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models Paper • 2406.15718 • Published Jun 22, 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies Paper • 2404.06395 • Published Apr 9, 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14, 2024