---
base_model: NousResearch/Nous-Hermes-2-Yi-34B
inference: true
model_type: llama
quantized_by: mgoin
tags:
- nm-vllm
- sparse
---

## Nous-Hermes-2-Yi-34B-pruned50
This repo contains model files for [Nous Hermes 2 - Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B) optimized for [NM-vLLM](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

This model was pruned to 50% sparsity with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).

## Inference
Install [NM-vLLM](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
```bash
pip install nm-vllm[sparse]
```
Run in a Python pipeline for local inference:
```python
from vllm import LLM, SamplingParams

# Load the 50% pruned model with the sparse kernels enabled
model = LLM("nm-testing/Nous-Hermes-2-Yi-34B-pruned50", sparsity="sparse_w16a16")
prompt = "How to make banana bread?"
formatted_prompt = f"<|im_start|>User:{prompt}\n<|im_start|>assistant:\n"

# temperature=0 gives greedy (deterministic) decoding
sampling_params = SamplingParams(max_tokens=100, temperature=0)
outputs = model.generate(formatted_prompt, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
"""
To make banana bread, you will need the following ingredients:

Ingredients:
- 2 ripe bananas
- 1 cup all-purpose flour
- 1/2 cup sugar
- 1/2 cup butter
- 1 teaspoon baking soda
- 1 teaspoon baking powder
- 1/2 teaspoon salt
- 1/2 cup milk
- 1 teaspoon vanilla extract

Instructions:
1. Preheat the oven to 3
"""
```
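`LLM.generate` also accepts a list of prompts, which lets you batch several requests through the engine at once. The sketch below is a minimal example of this; the second question is illustrative and not part of this card:
```python
from vllm import LLM, SamplingParams

# Illustrative questions; any prompt wrapped in the template below works the same way
questions = [
    "How to make banana bread?",
    "What is speculative decoding?",
]
formatted = [f"<|im_start|>User:{q}\n<|im_start|>assistant:\n" for q in questions]

model = LLM("nm-testing/Nous-Hermes-2-Yi-34B-pruned50", sparsity="sparse_w16a16")
sampling_params = SamplingParams(max_tokens=100, temperature=0)

# generate() returns one RequestOutput per input prompt, in order
for output in model.generate(formatted, sampling_params=sampling_params):
    print(output.outputs[0].text)
```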

## Prompt template

```
<|im_start|>User:{prompt}\n<|im_start|>assistant:\n
```
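If you apply the template in several places, a small helper keeps it in one spot. This is just a convenience sketch; `format_prompt` is not part of this repo or of NM-vLLM:
```python
def format_prompt(user_message: str) -> str:
    """Apply the prompt template from this model card to a raw question."""
    return f"<|im_start|>User:{user_message}\n<|im_start|>assistant:\n"

print(format_prompt("How to make banana bread?"))
```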

## Sparsification
For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.

Install [SparseML](https://github.com/neuralmagic/sparseml):
```bash
git clone https://github.com/neuralmagic/sparseml
pip install -e "sparseml[transformers]"
```

Adjust the recipe as needed, then run this one-shot compression script to apply SparseGPT:
```python
import sparseml.transformers

original_model_name = "NousResearch/Nous-Hermes-2-Yi-34B"
calibration_dataset = "open_platypus"
output_directory = "output/"

recipe = """
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      sequential_update: true
      targets: ['re:model.layers.\d*$']
"""

# Apply SparseGPT to the model
sparseml.transformers.oneshot(
    model=original_model_name,
    dataset=calibration_dataset,
    recipe=recipe,
    output_dir=output_directory,
)
```
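To sanity-check the result, one option is to load the saved model with `transformers` and measure the fraction of zero-valued weights in the targeted decoder layers. This is a hedged sketch, assuming the `output/` directory written by the script above and a machine with enough memory for the fp16 weights:
```python
import torch
from transformers import AutoModelForCausalLM

# Load the one-shot result saved by sparseml.transformers.oneshot above.
# Note: a 34B model in fp16 needs roughly 70 GB of memory to load.
model = AutoModelForCausalLM.from_pretrained("output/", torch_dtype=torch.float16)

zero, total = 0, 0
for name, param in model.named_parameters():
    # The recipe only targeted the decoder layers (re:model.layers.\d*$)
    if "model.layers" in name and param.dim() == 2:
        zero += (param == 0).sum().item()
        total += param.numel()

print(f"Sparsity of targeted weights: {zero / total:.2%}")  # expect ~50%
```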

## Slack

For further support, and for discussion about these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).