mwitiderrick committed Create README.md (commit 807a813, parent b7a10ab)

README.md (ADDED)
---
base_model: mgoin/llama2-7b-gsm8k-pt
inference: false
model_type: llama
prompt_template: |
  Question:
  {prompt}
  Answer:
quantized_by: mwitiderrick
tags:
- deepsparse
---
## Llama2-7b-gsm8k-pt

This repo contains model files for [llama2-7b-gsm8k-pt](https://huggingface.co/mgoin/llama2-7b-gsm8k-pt) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.

This model was pruned and quantized with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
## Inference

Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
```bash
pip install deepsparse-nightly[llm]
```
Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):
```python
from deepsparse import TextGeneration

prompt = "James decides to run 3 sprints 3 times a week. He runs 60 meters each sprint. How many total meters does he run a week?"
formatted_prompt = f"Question:{prompt}\nAnswer:"

model = TextGeneration(model_path="hf:nm-testing/TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=200).generations[0].text)
"""
First find the total distance of one sprint: 60 meters * 3 = <<60*3=180>>180 meters
Then multiply the distance of one sprint by the number of sprints per week: 180 meters/sprint * 3 sprints/week = <<180*3=540>>540 meters/week
#### 540
"""
```
To obtain the final model, the following process was followed:
- Sparsify the model to 50% using SparseML
- Fine-tune the sparse model on the GSM8K dataset
- Perform one-shot quantization of the resulting model