# mpt-7b-gsm8k-pruned60
**Paper**: [Sparse Finetuning for Inference Acceleration of Large Language Models](
This model was produced from an [MPT-7B base model]( finetuned on the GSM8k dataset for 2 epochs, pruned to 60% sparsity with [SparseGPT]( and retrained for 2 epochs with L2 distillation.
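To make the terms above concrete, here is a minimal, illustrative sketch in plain Python of what "60% sparsity" and an L2 distillation loss mean. It is not the pipeline used to produce this model: `magnitude_prune` is a simple magnitude-based stand-in for SparseGPT (which selects weights differently), `l2_distillation_loss` is a plain mean squared error between hypothetical student and teacher outputs, and all function names and values are assumptions made for illustration.

```python
def sparsity(weights):
    """Return the fraction of entries in a 2-D weight list that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)


def magnitude_prune(weights, target):
    """Zero out roughly the `target` fraction of smallest-magnitude weights.

    Stand-in for SparseGPT, used only to illustrate the sparsity level.
    Ties at the threshold may prune slightly more than `target`.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = round(len(magnitudes) * target)  # number of weights to remove
    if k == 0:
        return [list(row) for row in weights]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def l2_distillation_loss(student, teacher):
    """Mean squared error between student and teacher outputs (illustrative)."""
    assert len(student) == len(teacher)
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Toy 2x5 weight matrix (made-up values).
dense = [[0.1, -0.5, 0.9, 0.05, -0.3],
         [0.7, -0.2, 0.4, -0.8, 0.6]]
pruned = magnitude_prune(dense, target=0.6)
print(sparsity(pruned))  # fraction of zeroed weights after pruning

# Toy student/teacher outputs (made-up values).
print(l2_distillation_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```

In this sketch, pruning to 60% leaves 40% of the weights nonzero, and the distillation loss shrinks as the pruned (student) model's outputs move closer to those of the dense (teacher) model.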
GSM8k zero-shot accuracy with [lm-evaluation-harness]( : 28.8%
All MPT model weights are available on [SparseZoo]( and CPU speedup for generative inference can be reproduced by following the instructions at [DeepSparse](
| Model Links | Compression |
| --------------------------------------------------------------------------------------------------------- | --------------------------------- |
| [neuralmagic/mpt-7b-gsm8k-quant]( | Quantization (W8A8) |
| [neuralmagic/mpt-7b-gsm8k-pruned40-quant]( | Quantization (W8A8) & 40% Pruning |
| [neuralmagic/mpt-7b-gsm8k-pruned50-quant]( | Quantization (W8A8) & 50% Pruning |
| [neuralmagic/mpt-7b-gsm8k-pruned60-quant]( | Quantization (W8A8) & 60% Pruning |
| [neuralmagic/mpt-7b-gsm8k-pruned70-quant]( | Quantization (W8A8) & 70% Pruning |
| [neuralmagic/mpt-7b-gsm8k-pruned75-quant]( | Quantization (W8A8) & 75% Pruning |
| [neuralmagic/mpt-7b-gsm8k-pruned80-quant]( | Quantization (W8A8) & 80% Pruning |
For general questions on these models and sparsification methods, reach out to the engineering team on our [community Slack](