Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models • Paper • 2504.03624 • Published 20 days ago
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning • Paper • 2504.11409 • Published 9 days ago
LLM-Pruner: On the Structural Pruning of Large Language Models • Paper • 2305.11627 • Published May 19, 2023
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient • Paper • 2411.17787 • Published Nov 26, 2024
MaskLLM • Collection • MaskLLM: Learnable Semi-structured Sparsity for Large Language Models (NeurIPS'24 Spotlight) • 3 items • Updated Dec 15, 2024