---
base_model: mgoin/llama2-7b-gsm8k-pt
inference: false
model_type: llama
prompt_template: |
  Question:{prompt}
  Answer:
quantized_by: mwitiderrick
tags:
- deepsparse
---
## Llama2-7b-gsm8k-pt
This repo contains model files for [llama2-7b-gsm8k-pt](https://huggingface.co/mgoin/llama2-7b-gsm8k-pt) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.
This model was pruned and quantized with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
## Inference
Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
```bash
pip install deepsparse-nightly[llm]
```
Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):
```python
from deepsparse import TextGeneration
prompt = "James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week?"
formatted_prompt = f"Question:{prompt}\nAnswer:"
model = TextGeneration(model_path="hf:nm-testing/llama2-7b-gsm8k-pt-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)
"""
First find the total distance of one sprint: 60 meters * 3 = <<60*3=180>>180 meters
Then multiply the distance of one sprint by the number of sprints per week: 180 meters/sprint * 3 sprints/week = <<180*3=540>>540 meters/week
#### 540
"""
```
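GSM8K-style outputs end their chain of thought with a `#### <answer>` marker, as in the sample generation above. A small helper (hypothetical, not part of this repo) can pull out just the final answer:

```python
def extract_gsm8k_answer(generation: str) -> str:
    """Return the final answer after the GSM8K '####' marker.

    GSM8K solutions end with a line like '#### 540'; everything
    before the marker is chain-of-thought reasoning.
    """
    if "####" not in generation:
        raise ValueError("no '####' answer marker found in generation")
    # Split on the last marker in case '####' also appears in the reasoning.
    return generation.rsplit("####", 1)[1].strip()

sample = (
    "First find the total distance of one sprint: 60 meters * 3 = 180 meters\n"
    "Then multiply by the number of sprints per week: 180 * 3 = 540 meters/week\n"
    "#### 540"
)
print(extract_gsm8k_answer(sample))  # → 540
```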
To obtain the final model, the following process was followed:
- Sparsify the model to 50% sparsity using SparseML
- Fine-tune the sparse model on the GSM8K dataset
- Apply one-shot quantization to the resulting model
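As a conceptual illustration only (this sketch uses simple magnitude pruning and symmetric int8 quantization, not the SparseGPT algorithm or the actual SparseML recipe used for this model), the two compression steps amount to:

```python
import random

def prune_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights so `sparsity` of them are zero."""
    k = int(len(weights) * sparsity)
    # Threshold at the k-th smallest absolute value.
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]
sparse = prune_magnitude(weights)       # step 1: 50% sparsity
q, scale = quantize_int8(sparse)        # step 2: quantize to int8
print(sum(1 for w in sparse if w == 0.0) / len(sparse))  # ≈ 0.5
```

SparseGPT differs from this sketch in that it prunes and quantizes layer by layer using second-order (Hessian-based) information from calibration data, so accuracy is preserved far better than naive magnitude pruning; see the linked paper and SparseML documentation for the real recipe format.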