mwitiderrick committed on
Commit
972a0af
1 Parent(s): 81f62ea

Update README.md

Files changed (1)
  1. README.md +84 -11
README.md CHANGED
@@ -1,21 +1,94 @@
- This model was quantized and pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).

  ```python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM

- model_id = "nm-testing/OpenHermes-2.5-Mistral-7B-pruned50"
- model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- inputs = tokenizer("Hello my name is", return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=20)
- print(tokenizer.batch_decode(outputs)[0])

  """
- <s> Hello my name is Katie and I am a student at the University of Gloucestershire. I am currently studying

  """

- ```
 
+ ---
+ base_model: teknium/OpenHermes-2.5-Mistral-7B
+ inference: true
+ model_type: mistral
+ quantized_by: mgoin
+ tags:
+ - nm-vllm
+ - sparse
+ ---

+ ## OpenHermes-2.5-Mistral-7B-pruned50
+ This repo contains model files for [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) optimized for [NM-vLLM](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

+ This model was pruned to 50% sparsity with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+
+ ## Inference
+ Install [NM-vLLM](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
+ ```bash
+ pip install nm-vllm[sparse]
+ ```
+ Run in a Python pipeline for local inference:
  ```python
+ from vllm import LLM, SamplingParams
+
+ model = LLM("nm-testing/OpenHermes-2.5-Mistral-7B-pruned50", sparsity="sparse_w16a16")
+ prompt = "How to make banana bread?"
+ formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
+
+ sampling_params = SamplingParams(max_tokens=100)
+ outputs = model.generate(formatted_prompt, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
+ """
+ Here is a simple recipe for making banana bread:

+ Ingredients:
+ - 3 ripe bananas
+ - 2 eggs
+ - 1/2 cup of sugar
+ - 1/2 cup of butter
+ - 2 cups of flour
+ - 1 teaspoon baking powder
+ - 2 teaspoons of baking soda

+ Instructions:
+ 1. Preheat your oven at 350 degree Fahrenant.
  """
+ ```
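+
+ Since the engine is built for throughput, batching prompts is where the gains show up. A minimal sketch, assuming NM-vLLM keeps upstream vLLM's list-of-prompts `generate()` API (the second question is an illustrative placeholder):
+ ```python
+ from vllm import LLM, SamplingParams
+
+ model = LLM("nm-testing/OpenHermes-2.5-Mistral-7B-pruned50", sparsity="sparse_w16a16")
+
+ # Format several ChatML prompts; the engine schedules them concurrently
+ questions = ["How to make banana bread?", "How do I bake chocolate chip cookies?"]
+ prompts = [f"<|im_start|>user\n{q}<|im_end|>\n<|im_start|>assistant" for q in questions]
+
+ outputs = model.generate(prompts, sampling_params=SamplingParams(max_tokens=100))
+ for output in outputs:
+     print(output.outputs[0].text)
+ ```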
+
+ ## Prompt template
+
+ ```
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+ ```
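+
+ Rather than hand-writing the ChatML tags, you can let `transformers` build the prompt. A minimal sketch, assuming this repo's tokenizer bundles the ChatML chat template that OpenHermes-2.5 models typically ship (not verified against this repo):
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("nm-testing/OpenHermes-2.5-Mistral-7B-pruned50")
+
+ # Build the prompt from a message list; assumes a ChatML chat template is present
+ messages = [{"role": "user", "content": "How to make banana bread?"}]
+ formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ print(formatted_prompt)
+ ```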
+
+ ## Sparsification
+ For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
+
+ Install [SparseML](https://github.com/neuralmagic/sparseml):
+ ```bash
+ git clone https://github.com/neuralmagic/sparseml
+ pip install -e "sparseml[transformers]"
+ ```
+
+ Modify the recipe as needed, then run this one-shot compression script to apply SparseGPT:
+ ```python
+ import sparseml.transformers
+
+ original_model_name = "teknium/OpenHermes-2.5-Mistral-7B"
+ calibration_dataset = "open_platypus"
+ output_directory = "output/"
+
+ # Prune every decoder layer to 50% sparsity, updating layers sequentially
+ recipe = """
+ test_stage:
+   obcq_modifiers:
+     SparseGPTModifier:
+       sparsity: 0.5
+       sequential_update: true
+       targets: ['re:model.layers.\d*$']
+ """
+
+ # Apply SparseGPT to the model
+ sparseml.transformers.oneshot(
+     model=original_model_name,
+     dataset=calibration_dataset,
+     recipe=recipe,
+     output_dir=output_directory,
+ )
+ ```
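+
+ As a quick sanity check (not part of the documented workflow), you can confirm the pruning level by counting zeroed weights in the saved checkpoint; `output/` here is the `output_directory` from the script above:
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+
+ # The one-shot run saves a standard Hugging Face checkpoint with zeroed weights
+ model = AutoModelForCausalLM.from_pretrained("output/", torch_dtype=torch.float16)
+
+ # Count zero-valued entries across all Linear weights; roughly 0.5 is expected
+ total = zeros = 0
+ for module in model.modules():
+     if isinstance(module, torch.nn.Linear):
+         total += module.weight.numel()
+         zeros += (module.weight == 0).sum().item()
+ print(f"Linear-weight sparsity: {zeros / total:.2%}")
+ ```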
+
+ ## Slack

+ For further support, and discussion of these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).