mwitiderrick committed Create README.md (commit 807a813, parent b7a10ab)

README.md (ADDED)
---
base_model: mgoin/llama2-7b-gsm8k-pt
inference: false
model_type: llama
prompt_template: |
  Question:
  {prompt}
  Answer:
quantized_by: mwitiderrick
tags:
- deepsparse
---
## Llama2-7b-gsm8k-pt

This repo contains model files for [llama2-7b-gsm8k-pt](https://huggingface.co/mgoin/llama2-7b-gsm8k-pt) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.

This model was pruned and quantized with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
## Inference

Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
```bash
pip install deepsparse-nightly[llm]
```
Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):
```python
from deepsparse import TextGeneration

prompt = "James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week?"
formatted_prompt = f"Question:{prompt}\nAnswer:"

model = TextGeneration(model_path="hf:nm-testing/TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)
"""
First find the total distance of one sprint: 60 meters * 3 = <<60*3=180>>180 meters
Then multiply the distance of one sprint by the number of sprints per week: 180 meters/sprint * 3 sprints/week = <<180*3=540>>540 meters/week
#### 540
"""
```
To obtain the final model, the following process was followed:
- Sparsify the model to 50% using SparseML
- Fine-tune the sparse model on the GSM8K dataset
- Perform one-shot quantization of the resulting model