tags:
- vllm
- sparsity
pipeline_tag: text-generation
license: llama3.1
base_model: neuralmagic/Sparse-Llama-3.1-8B-2of4
datasets:
- openai/gsm8k
language:
- en
metrics:
- accuracy
Sparse-Llama-3.1-8B-gsm8k-2of4
Model Overview
- Model Architecture: Llama-3.1-8B
- Input: Text
- Output: Text
- Model Optimizations:
  - Sparsity: 2:4
- Release Date: 11/21/2024
- Version: 1.0
- License(s): llama3.1
- Model Developers: Neural Magic
This model specializes in grade-school math and was obtained by fine-tuning the 2:4 sparse Sparse-Llama-3.1-8B-2of4 on the GSM8k dataset. It achieves 66.9% 0-shot accuracy on the GSM8k test set, compared to 66.3% for the fine-tuned dense model Llama-3.1-8B-gsm8k, which amounts to over 100% accuracy recovery. In contrast, the pretrained Llama-3.1-8B achieves 50.7% 5-shot accuracy and the sparse foundational Sparse-Llama-3.1-8B-2of4 model achieves 56.3% 5-shot accuracy.
Model Optimizations
This model inherits the optimizations of its parent, Sparse-Llama-3.1-8B-2of4. Namely, all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern: in each group of four weights, two are retained and two are pruned (set to zero).
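To make the 2:4 pattern concrete, here is a minimal pure-Python sketch. It is an illustration only: the helper names `prune_2of4` and `is_2of4` are hypothetical, and choosing the two largest-magnitude weights per group is an assumed selection rule, not necessarily the pruning method used to produce the parent model.

```python
def prune_2of4(row):
    """Zero the two smallest-magnitude values in every group of four.

    Magnitude-based selection is an assumption made for illustration.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def is_2of4(row):
    """Return True if every group of four has at most two nonzero values."""
    return len(row) % 4 == 0 and all(
        sum(v != 0 for v in row[s:s + 4]) <= 2 for s in range(0, len(row), 4)
    )


weights = [0.5, -1.0, 0.1, 2.0, 3.0, 0.0, -0.2, 0.4]
sparse = prune_2of4(weights)
print(sparse)           # [0.0, -1.0, 0.0, 2.0, 3.0, 0.0, 0.0, 0.4]
print(is_2of4(sparse))  # True
```

A check like `is_2of4` could be applied row by row to a linear layer's weight matrix to confirm that the pattern holds.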
Deployment with vLLM
This model can be deployed efficiently using the vLLM backend. vLLM also supports OpenAI-compatible serving. See the documentation for more details.
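A minimal sketch of offline generation with vLLM might look like the following. The repository id in `MODEL_ID` is an assumption inferred from the model name and the parent model's organization, the `generate` wrapper is a hypothetical helper, and the sampling settings (greedy decoding, 256 new tokens) are illustrative choices rather than the settings used for the reported results.

```python
# Assumed Hugging Face repository id; adjust if the model lives elsewhere.
MODEL_ID = "neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4"


def generate(prompts, model_id=MODEL_ID, max_tokens=256):
    """Run offline batch generation with vLLM; returns one string per prompt."""
    # Imported lazily so this module loads even where vLLM is not installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)
    # temperature=0.0 selects greedy decoding (an illustrative choice).
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

Calling `generate(["A baker sells 12 rolls per tray. How many rolls are on 7 trays?"])` would load the model and return a list with one completion. The plain-question prompt format here is also an assumption; the format used during fine-tuning may differ.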
Evaluation
This model was evaluated with the lm-evaluation-harness.
Accuracy
GSM8k Benchmark
| Metric | Llama-3.1-8B (5-shot) | Sparse-Llama-3.1-8B-2of4 (5-shot) | Llama-3.1-8B-gsm8k (0-shot) | Sparse-Llama-3.1-8B-gsm8k-2of4 (0-shot) |
| --- | --- | --- | --- | --- |
| Accuracy | 50.7% | 56.3% | 66.3% | 66.9% |
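The accuracy-recovery figure can be reproduced from the two 0-shot numbers reported here. The sketch below assumes recovery is defined as the sparse model's accuracy divided by the dense fine-tuned model's accuracy; `accuracy_recovery` is a hypothetical helper name.

```python
def accuracy_recovery(sparse_acc, dense_acc):
    """Sparse accuracy as a percentage of dense accuracy (assumed definition)."""
    return 100.0 * sparse_acc / dense_acc


# 0-shot GSM8k accuracies reported in this model card.
recovery = accuracy_recovery(sparse_acc=66.9, dense_acc=66.3)
print(f"Accuracy recovery: {recovery:.1f}%")  # Accuracy recovery: 100.9%
```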