
mpt-7b-gsm8k-pruned60

Paper: Sparse Finetuning for Inference Acceleration of Large Language Models
Code: https://github.com/neuralmagic/deepsparse/tree/main/research/mpt

This model was produced from an MPT-7B base model finetuned on the GSM8k dataset for 2 epochs. The finetuned model was then pruned to 60% sparsity with SparseGPT and retrained for 2 epochs with L2 distillation.
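The two compression ingredients in this recipe can be illustrated in a few lines. The following is a minimal sketch that uses plain Python lists in place of weight tensors. `magnitude_prune`, `sparsity_of`, and `l2_distillation_loss` are hypothetical helpers. The pruning step uses simple magnitude pruning, substituted here only to show what a 60% sparsity target means. SparseGPT, which produced this model, instead selects and updates weights using approximate second-order information. The loss is a plain mean squared error between student and teacher outputs. The paper's choice of which outputs to distill, and any normalization it applies, are not reproduced here.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Illustrative only: SparseGPT (used for this model) selects and updates
    weights with approximate second-order information, not raw magnitude.
    """
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by |w|, smallest first; the first n_prune are zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def sparsity_of(weights: List[float]) -> float:
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def l2_distillation_loss(student: List[float], teacher: List[float]) -> float:
    """Mean squared error between student and teacher outputs (assumed form)."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher outputs must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


if __name__ == "__main__":
    layer = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.02, 0.6, -0.1]
    sparse_layer = magnitude_prune(layer, 0.60)
    print(sparse_layer)               # the 4 largest-magnitude weights survive
    print(sparsity_of(sparse_layer))  # 0.6
    print(l2_distillation_loss([1.0, 2.0], [1.5, 2.0]))  # 0.125
```

During the retraining phase, the pruned (student) model keeps its zero pattern fixed while a loss of this kind pulls its outputs toward those of the dense finetuned (teacher) model.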

GSM8k zero-shot accuracy with lm-evaluation-harness: 28.8%
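The accuracy figure is an exact-match rate on final numeric answers. The following is a simplified sketch of that kind of scoring. GSM8k reference solutions end with a line of the form `#### <answer>`. The regular expressions, the last-number fallback, and the helper names are assumptions of this sketch. lm-evaluation-harness applies its own answer extraction and filters, so this scorer is not expected to reproduce the reported number exactly.

```python
import re
from typing import List, Optional

# GSM8k reference solutions end with a line such as "#### 72".
_REF_RE = re.compile(r"####\s*(-?[0-9.,]+)")
# Fallback for model output: any integer or decimal, commas allowed.
_NUM_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def _normalize(num: str) -> str:
    """Drop thousands separators and a trailing period."""
    return num.replace(",", "").rstrip(".")


def extract_reference(solution: str) -> Optional[str]:
    match = _REF_RE.search(solution)
    return _normalize(match.group(1)) if match else None


def extract_prediction(generation: str) -> Optional[str]:
    # Prefer an explicit "#### N" if the model produced one...
    explicit = extract_reference(generation)
    if explicit is not None:
        return explicit
    # ...otherwise fall back to the last number in the generation.
    numbers = _NUM_RE.findall(generation)
    return _normalize(numbers[-1]) if numbers else None


def exact_match_accuracy(generations: List[str], solutions: List[str]) -> float:
    if not solutions or len(generations) != len(solutions):
        raise ValueError("need one generation per solution")
    correct = 0
    for gen, sol in zip(generations, solutions):
        pred = extract_prediction(gen)
        if pred is not None and pred == extract_reference(sol):
            correct += 1
    return correct / len(solutions)
```

For example, if one of two generations ends with the correct final number and the other does not, `exact_match_accuracy` returns 0.5.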

All MPT model weights are available on SparseZoo. The CPU speedup for generative inference can be reproduced by following the instructions at DeepSparse.
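CPU speedup is a ratio of decode throughputs between a dense baseline and the sparse model. The following is a minimal, framework-agnostic sketch of that measurement. `generate` is a hypothetical stand-in for whichever dense or DeepSparse generation call you benchmark, and the sketch is not the DeepSparse benchmarking tooling. To keep the ratio meaningful, use the same prompt, output length, and hardware for both runs.

```python
import time
from typing import Callable, List


def tokens_per_second(
    generate: Callable[[str], List[str]],
    prompt: str,
    runs: int = 3,
    warmup: int = 1,
) -> float:
    """Average throughput of `generate`, a hypothetical callable that
    returns the list of generated tokens for `prompt`."""
    # Untimed warmup calls keep one-time setup cost out of the measurement.
    for _ in range(warmup):
        generate(prompt)
    total_tokens = 0
    total_time = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        total_time += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_time


def speedup(dense_tps: float, sparse_tps: float) -> float:
    """How many times faster the sparse model decodes than the dense baseline."""
    return sparse_tps / dense_tps
```

The timed runs are averaged after a warmup call. This keeps one-off costs, such as model compilation or cache population, from skewing either side of the ratio.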

| Model | Compression |
|---|---|
| neuralmagic/mpt-7b-gsm8k-quant | Quantization (W8A8) |
| neuralmagic/mpt-7b-gsm8k-pruned40-quant | Quantization (W8A8) & 40% Pruning |
| neuralmagic/mpt-7b-gsm8k-pruned50-quant | Quantization (W8A8) & 50% Pruning |
| neuralmagic/mpt-7b-gsm8k-pruned60-quant | Quantization (W8A8) & 60% Pruning |
| neuralmagic/mpt-7b-gsm8k-pruned70-quant | Quantization (W8A8) & 70% Pruning |
| neuralmagic/mpt-7b-gsm8k-pruned75-quant | Quantization (W8A8) & 75% Pruning |
| neuralmagic/mpt-7b-gsm8k-pruned80-quant | Quantization (W8A8) & 80% Pruning |
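To put the pruning levels above in perspective, the following is a back-of-the-envelope sketch of weight storage. It assumes a 16-bit dense baseline, 1 byte per stored weight under W8A8, and uniform sparsity across all parameters. It also ignores sparse-index metadata and any layers left dense. The result is a rough nominal ratio, not a measured model size. The helper names are hypothetical.

```python
def approx_weight_bytes(n_params: int, sparsity: float, bytes_per_weight: float) -> float:
    """Bytes needed to store only the nonzero weights.

    Assumes every parameter is pruned at the same rate and ignores
    sparse-index/metadata overhead, so this is a rough lower bound.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    return n_params * (1.0 - sparsity) * bytes_per_weight


def nominal_compression(sparsity: float, dense_bytes: float = 2.0, quant_bytes: float = 1.0) -> float:
    """Assumed 16-bit dense size divided by pruned 8-bit size (same caveats)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    return dense_bytes / ((1.0 - sparsity) * quant_bytes)


if __name__ == "__main__":
    # Sparsity levels from the table above (0.0 = quantization only).
    for s in (0.0, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8):
        print(f"{s:.0%} pruned + W8A8: ~{nominal_compression(s):.2f}x smaller weights")
```

Because real sparse formats store index metadata and some layers may stay dense, measured file sizes will be larger than these estimates.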

For general questions on these models and sparsification methods, reach out to the engineering team on our community Slack.
