---
base_model: microsoft/Orca-2-13b
inference: false
model_type: llama
prompt_template: |
  <|im_start|>system
  {system_message}<|im_end|>
  <|im_start|>user
  {prompt}<|im_end|>
  <|im_start|>assistant
quantized_by: mgoin
tags:
- deepsparse
---

# Orca 2 13B - DeepSparse

This repo contains model files for [Microsoft's Orca 2 13B](https://huggingface.co/microsoft/Orca-2-13b) optimized for [DeepSparse](https://github.com/neuralmagic/deepsparse), a CPU inference runtime for sparse models.

This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).

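For intuition, the "pruned50" in this model's name refers to 50% unstructured sparsity: half of the weights are zeroed out. A toy sketch of one-shot magnitude pruning in plain Python (SparseGPT itself is more sophisticated, using second-order information and weight updates to preserve accuracy):

```python
# Toy illustration of 50% unstructured sparsity (NOT the SparseGPT
# algorithm): zero out the half of the weights with smallest magnitude.
def prune_50_percent(weights: list[float]) -> list[float]:
    k = len(weights) // 2  # number of weights to zero out
    # indices of the k smallest-magnitude weights
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

print(prune_50_percent([0.9, -0.1, 0.4, -0.05]))  # → [0.9, 0.0, 0.4, 0.0]
```

DeepSparse exploits this kind of sparsity (together with quantization) to skip zeroed weights at inference time on CPUs.
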
## Inference

Install [DeepSparse LLM](https://github.com/neuralmagic/deepsparse) for fast inference on CPUs:
```bash
pip install deepsparse-nightly[llm]
```

Run in a [Python pipeline](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md):
```python
from deepsparse import TextGeneration

system_message = ""
prompt = "Who inspires you the most?"
formatted_prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"

model = TextGeneration(model="hf:mgoin/Orca-2-13b-pruned50-quant-ds")
print(model(formatted_prompt, max_new_tokens=100).generations[0].text)
"""
That's a difficult question as there are many people who inspire me. However, one person who inspires me the most is my mother. She has shown me the importance of hard work, resilience, and perseverance. She has shown me how to overcome obstacles and how to be a strong and independent woman.
"""
```

## Prompt template: ChatML

```
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```

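The template above can also be built programmatically for multi-turn conversations. A minimal sketch (the `format_chatml` helper is illustrative, not part of DeepSparse):

```python
# Illustrative helper (not a DeepSparse API): render a conversation
# in the ChatML layout shown above.
def format_chatml(system_message: str, turns: list[tuple[str, str]]) -> str:
    """`turns` is a list of (role, content) pairs, e.g. [("user", "Hi")].

    The returned string ends with an open assistant block, ready for
    the model to continue generating.
    """
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>"]
    for role, content in turns:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

# Produces the same string as the `formatted_prompt` in the example above.
prompt = format_chatml("", [("user", "Who inspires you the most?")])
```
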
## Sparsification

For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.

```bash
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py microsoft/Orca-2-13b open_platypus --recipe recipe.yaml --save True
python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment
cp deployment/model.onnx deployment/model-orig.onnx
```

Afterwards, run this kv-cache injection on the exported model:
```python
import os

import onnx
from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector

input_file = "deployment/model-orig.onnx"
output_file = "deployment/model.onnx"
model = onnx.load(input_file, load_external_data=False)
model = KeyValueCacheInjector(model_path=os.path.dirname(input_file)).apply(model)
onnx.save(model, output_file)
print(f"Modified model saved to: {output_file}")
```

## Slack

For further support, and discussions on these models and AI in general, join [Neural Magic's Slack server](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).