ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding • arXiv:2402.13485 • Published Feb 21, 2024
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference • arXiv:2504.05897 • Published Apr 2025
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference • arXiv:2501.06807 • Published Jan 12, 2025
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference • arXiv:2408.10284 • Published Aug 19, 2024
MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention • arXiv:2211.13955 • Published Nov 25, 2022
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts • arXiv:2306.04845 • Published Jun 8, 2023
AlphaNet: Improved Training of Supernets with Alpha-Divergence • arXiv:2102.07954 • Published Feb 16, 2021
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling • arXiv:2011.09011 • Published Nov 18, 2020
Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks • arXiv:2002.04116 • Published Feb 10, 2020