Mamba2 Distilled Model

Model: schwartz-lab/Smol2-Mamba-1.9B
Version: 1.0
Framework: PyTorch
Architecture: MOHAWK LMHead


Overview

This model is a distilled version of SmolLM2-1.7B, converted to the SSM-based Mamba2 architecture with the MOHAWK method: the MLP layers are kept as is, while the attention layers are replaced with Mamba2 layers. It was developed for the paper On Pruning State-Space LLMs (Ghattas et al., 2025).
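A minimal sketch of the resulting hybrid block is shown below. This is not the released implementation: it only illustrates the idea of swapping the attention mixer for a Mamba2 layer while reusing the original MLP, and it assumes the mamba-ssm package (whose Mamba2 module needs a CUDA device in practice) and PyTorch >= 2.4 for nn.RMSNorm.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba2  # pip install mamba-ssm


class HybridMamba2Block(nn.Module):
    """Illustrative pre-norm decoder block: Mamba2 mixer + the kept MLP."""

    def __init__(self, d_model: int, mlp: nn.Module):
        super().__init__()
        self.norm1 = nn.RMSNorm(d_model)
        self.mixer = Mamba2(d_model=d_model)  # stands in for self-attention
        self.norm2 = nn.RMSNorm(d_model)
        self.mlp = mlp  # the original MLP, reused unchanged

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        x = x + self.mixer(self.norm1(x))  # sequence mixing via Mamba2
        x = x + self.mlp(self.norm2(x))    # channel mixing via the kept MLP
        return x
```

MOHAWK itself is a staged distillation procedure that progressively aligns the student's mixing matrices, hidden states, and final outputs with the teacher; see the cited paper for the actual recipe used here.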

Evaluation

The model has been benchmarked on the following tasks; a reproduction sketch is given after the note below.

Task               Metric      Value    Stderr
ARC Challenge      acc         0.4164   ±0.0144
ARC Easy           acc         0.7492   ±0.0089
Hellaswag          acc         0.4988   ±0.0050
Lambada (OpenAI)   acc         0.5707   ±0.0069
Lambada (OpenAI)   perplexity  7.0794   ±0.1761
PIQA               acc         0.7661   ±0.0099
Winogrande         acc         0.6283   ±0.0136

Note:

  • For accuracy metrics, higher values are better.
  • For perplexity, lower values are better.
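The task and metric names suggest EleutherAI's lm-evaluation-harness, although the card does not state the exact evaluation setup. The sketch below is therefore hedged: it additionally assumes the checkpoint loads as a Hugging Face causal LM (the repo may require trust_remote_code).

```python
import lm_eval  # EleutherAI lm-evaluation-harness

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=schwartz-lab/Smol2-Mamba-1.9B,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "hellaswag",
           "lambada_openai", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task accuracy / perplexity with stderr
```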

Intended Use

  • General NLP Tasks: Suitable for various language understanding and reasoning tasks (see the loading sketch after this list).
  • Research & Prototyping: Ideal for lightweight experiments and efficient production environments.

Citation

If you use this model, please cite:

@misc{ghattas2025pruningstatespacellms,
  title={On Pruning State-Space LLMs}, 
  author={Tamer Ghattas and Michael Hassid and Roy Schwartz},
  year={2025},
  eprint={2502.18886},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.18886}, 
}

Model Card Last Updated: February 16, 2025
