---
tags:
- distilbert
- sparsity
- pruning
- compression
language: en
datasets: sst2
---

This repository includes all the files needed to stage an Inference Endpoints API with [DeepSparse](https://github.com/neuralmagic/deepsparse), as discussed in this [blog post](https://neuralmagic.com/blog/accelerate-hugging-face-inference-endpoints-with-deepsparse/). The DistilBERT model was sparsified using the [SparseML](https://github.com/neuralmagic/sparseml) library.

# Sparse Transfer 80% VNNI Pruned DistilBERT

This model is the result of pruning DistilBERT to 80% sparsity with VNNI blocking (semi-structured), followed by fine-tuning and quantization on the SST2 dataset. Pruning is performed with the GMP algorithm on the masked language modeling task using the BookCorpus and Wikipedia datasets. The model achieves 90.5% accuracy on the validation set, recovering over 99% of the baseline model's accuracy. See the included [recipe](https://sparsezoo.neuralmagic.com/models/distilbert-sst2_wikipedia_bookcorpus-pruned80.4block_quantized?comparison=distilbert-sst2_wikipedia_bookcorpus-base) for training instructions.
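
For reference, below is a minimal sketch of running this model locally with the DeepSparse `Pipeline` API. The `./model` directory path and the `text-classification` task name are assumptions for illustration; a SparseZoo stub or another local path pointing at the ONNX export from this repo can be substituted.

```python
# Minimal local-inference sketch with DeepSparse (assumes `pip install deepsparse`
# and that the ONNX model + tokenizer/config files from this repo are in ./model).
from deepsparse import Pipeline

# Create a text-classification pipeline backed by the sparse-quantized DistilBERT.
sst2_pipeline = Pipeline.create(
    task="text-classification",
    model_path="./model",  # hypothetical local path; a SparseZoo stub also works
)

# Run sentiment classification on a sample SST2-style input.
print(sst2_pipeline(["The movie was a delight from start to finish."]))
```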