Mamba2 Distilled Model
Version: 1.0
Architecture: MOHAWK LMHead
Overview
This model is a distilled version of SmolLM2-1.7B, converted to the SSM-based Mamba2 architecture with the MOHAWK method: the MLP layers are kept as-is and the attention layers are replaced with Mamba2 layers. It was developed for the paper On Pruning State-Space LLMs (Ghattas et al., 2025).
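Below is a minimal usage sketch. It assumes the checkpoint can be loaded through transformers' AutoModelForCausalLM with trust_remote_code=True; if the repository ships its own loading utilities, follow those instead.

```python
# Minimal usage sketch (assumption: the checkpoint loads via
# transformers' AutoModelForCausalLM with trust_remote_code=True).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "schwartz-lab/Smol2-Mamba-1.9B"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "State-space models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```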
Evaluation
The model has been benchmarked on several tasks:
| Task | Metric | Value | Stderr |
|---|---|---|---|
| ARC Challenge | acc | 0.4164 | ±0.0144 |
| ARC Easy | acc | 0.7492 | ±0.0089 |
| Hellaswag | acc | 0.4988 | ±0.0050 |
| Lambada (OpenAI) | acc | 0.5707 | ±0.0069 |
| Lambada (OpenAI) | perplexity | 7.0794 | ±0.1761 |
| PIQA | acc | 0.7661 | ±0.0099 |
| Winogrande | acc | 0.6283 | ±0.0136 |
Note:
- For accuracy metrics, higher values are better.
- For perplexity, lower values are better.
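The numbers above can be checked with EleutherAI's lm-evaluation-harness. The sketch below uses the harness's v0.4+ Python API and standard task names; exact scores may vary slightly with harness version and hardware, and the model/backend arguments are assumptions about how this checkpoint is loaded.

```python
# Sketch of re-running the benchmarks with lm-evaluation-harness (v0.4+).
# Assumes the checkpoint works with the "hf" backend and trust_remote_code=True.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=schwartz-lab/Smol2-Mamba-1.9B,trust_remote_code=True,dtype=bfloat16",
    tasks=[
        "arc_challenge",
        "arc_easy",
        "hellaswag",
        "lambada_openai",
        "piqa",
        "winogrande",
    ],
    batch_size=8,
)
print(results["results"])
```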
Intended Use
- General NLP Tasks: Suitable for various language understanding and reasoning tasks.
- Research & Prototyping: Ideal for lightweight experiments and efficient production environments.
Citation
If you use this model, please cite:
@misc{ghattas2025pruningstatespacellms,
title={On Pruning State-Space LLMs},
author={Tamer Ghattas and Michael Hassid and Roy Schwartz},
year={2025},
eprint={2502.18886},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.18886},
}
Model Card Last Updated: February 16, 2025
Model repository: schwartz-lab/Smol2-Mamba-1.9B
Base model: HuggingFaceTB/SmolLM2-1.7B