---
base_model: mgoin/llama2-7b-gsm8k-pt
inference: false
model_type: llama
prompt_template: |
  Question
  {prompt}\n
  Answer:
quantized_by: mwitiderrick
tags:
- deepsparse
---
## Llama2-7b-gsm8k-pt
This repo contains model files for [llama2-7b-gsm8k-pt](https://huggingface.co/mgoin/llama2-7b-gsm8k-pt) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.

This model was pruned and quantized with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
## Inference
Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
```bash
pip install deepsparse-nightly[llm]
```
Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):
```python
from deepsparse import TextGeneration

prompt = "James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week?"
# Apply the prompt template the model was fine-tuned with
formatted_prompt = f"Question:{prompt}\nAnswer:"

model = TextGeneration(model_path="hf:nm-testing/TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)
"""
First find the total distance of one sprint: 60 meters * 3 = <<60*3=180>>180 meters
Then multiply the distance of one sprint by the number of sprints per week: 180 meters/sprint * 3 sprints/week = <<180*3=540>>540 meters/week
#### 540
"""
```
To obtain the final model, the following process was followed:
- Sparsify the model to 50% using SparseML
- Fine-tune the sparse model on the GSM8K dataset
- Perform one-shot quantization of the resulting model