neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50-quant-ds

Text Generation


mgoin committed on Nov 21, 2023

Commit 816ad03 (1 parent: 9afc8ca)

Create README.md

Files changed (1)

README.md +74 -0

README.md ADDED

	@@ -0,0 +1,74 @@

+---
+base_model: teknium/OpenHermes-2.5-Mistral-7B
+inference: false
+model_type: mistral
+prompt_template: |
+  <|im_start|>system
+  {system_message}<|im_end|>
+  <|im_start|>user
+  {prompt}<|im_end|>
+  <|im_start|>assistant
+sparsified_by: mgoin
+tags:
+- deepsparse
+---
+# OpenHermes 2.5 Mistral 7B - DeepSparse
+This repo contains [DeepSparse](https://github.com/neuralmagic/deepsparse) model files for [Teknium's OpenHermes 2.5 Mistral 7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B).
+This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+## Inference
+Install DeepSparse: `pip install deepsparse-nightly[llm]`
+```python
+from deepsparse import TextGeneration
+system_message = ""
+prompt = "Write a quick sort algorithm in Python"
+formatted_prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
+model = TextGeneration(model="hf:neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50-quant-ds")
+print(model(formatted_prompt, max_new_tokens=500).generations[0].text)
+```
+## Prompt template: ChatML
+```
+<|im_start|>system
+{system_message}<|im_end|>
+<|im_start|>user
+{prompt}<|im_end|>
+<|im_start|>assistant
+```
+## Sparsification
+See the `recipe.yaml` in this repo and follow the instructions below.
+```bash
+git clone https://github.com/neuralmagic/sparseml
+pip install -e "sparseml[transformers]"
+python sparseml/src/sparseml/transformers/sparsification/obcq/obcq.py teknium/OpenHermes-2.5-Mistral-7B open_platypus --recipe recipe.yaml --save True
+python sparseml/src/sparseml/transformers/sparsification/obcq/export.py --task text-generation --model_path obcq_deployment
+cp deployment/model.onnx deployment/model-orig.onnx
+```
+```python
+import os
+import onnx
+from sparseml.exporters.kv_cache_injector import KeyValueCacheInjector
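+input_file = "deployment/model-orig.onnx"  # copy of the exported model made by the `cp` step above
+output_file = "deployment/model.onnx"  # overwritten with the KV-cache-enabled model
+model = onnx.load(input_file, load_external_data=False)  # load the graph only; external weight data is not read
+model = KeyValueCacheInjector(model_path=os.path.dirname(input_file)).apply(model)  # add KV cache inputs/outputs to the graph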
+onnx.save(model, output_file)
+print(f"Modified model saved to: {output_file}")
+```
+## Slack
+For further support and discussion of these models and AI in general, join us at:
+[Neural Magic's Slack server](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)
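
The ChatML prompt template in the README above is assembled by hand in the inference example. A minimal sketch of a reusable helper follows; it is not part of the commit. The function name `format_chatml` is hypothetical, and the output deliberately ends at `<|im_start|>assistant` with no trailing newline, mirroring the string built in the README's inference snippet.

```python
def format_chatml(prompt: str, system_message: str = "") -> str:
    """Build a single-turn ChatML prompt following the README's template.

    `format_chatml` is an illustrative helper, not a DeepSparse API.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant"
    )


# Same inputs as the README's inference example: empty system message.
formatted_prompt = format_chatml("Write a quick sort algorithm in Python")
print(formatted_prompt)
```

The resulting string can be passed to the `TextGeneration` pipeline in place of the hand-written f-string in the inference example.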