mwitiderrick committed on
Commit
04a38de
1 Parent(s): c319b00

Update README.md

  1. README.md +73 -35
README.md CHANGED
@@ -1,45 +1,83 @@
  ---
- base_model: GeneZC/MiniChat-2-3B
  inference: True
  model_type: Llama
  ---
- # Nous-Hermes-2-SOLAR-10.7B
- This repo contains pruned model files for [Nous-Hermes-2-SOLAR-10.7B](https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B).

  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
  ```python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
  prompt = "How to make banana bread?"
- formatted_prompt = f"### User:\n{prompt}\n\n### Assistant:\n"
- model_id = "nm-testing/Nous-Hermes-2-SOLAR-10.7B-pruned50-24"
- model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- inputs = tokenizer(formatted_prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=200)
- print(tokenizer.batch_decode(outputs)[0])
  """
- <s> ### User:
- How to make banana bread?
-
- ### Assistant:
- To make banana bread, you will need the following ingredients:
-
- - 1 cup of flour
- - 1 cup of sugar
- - 3 cups of bananas
- - 1 cup of milk
- - 1 cup of eggs
- - 1 cup of baking powder
-
- Instructions:
-
- 1. Mix the flour and sugar together.
- 2. Add the bananas to the mixture.
- 3. Add the milk and eggs to the mixture.
- 4. Add the baking powder to the mixture.
- 5. Mix all the ingredients together.
- 6. Pour the mixture into a baking pan.
- 7. Bake the mixture for 30 minutes at 350 degrees.<|im_end|>
  """
- ```
  ---
+ base_model: NousResearch/Nous-Hermes-2-SOLAR-10.7B
  inference: True
  model_type: Llama
+ tags:
+ - nm-vllm
+ - sparse
  ---
+ ## Nous-Hermes-2-SOLAR-10.7B-pruned2.4
+ This repo contains model files for [Nous-Hermes-2-SOLAR-10.7B](https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B) optimized for [NM-vLLM](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+
+ ## Inference
+ Install [NM-vLLM](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
+ ```bash
+ pip install nm-vllm[sparse]
+ ```
+ Run in a Python pipeline for local inference:
  ```python
+ from vllm import LLM, SamplingParams
+
+ model = LLM("Nous-Hermes-2-SOLAR-10.7B-pruned2.4", sparsity="sparse_w16a16")
  prompt = "How to make banana bread?"
+ formatted_prompt = f"<|im_start|>User:{prompt}\n<|im_start|>assistant:\n"
+
+ sampling_params = SamplingParams(max_tokens=100)
+ outputs = model.generate(formatted_prompt, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
+ """
+ To make banana bread, you will need to follow these steps:
+
+ 1. Gather the ingredients needed for the recipe: flour, sugar, eggs, baking powder, and banana bread.
+ 2. Preparing the bread by kneading the flour, butter, and sugar in a bowl until it becomes soft.
+ 3. Then, knead the bread with the flour eggs and sugar then bake them in a microwave oven.
+ 4. Once done baking a
  """
+ ```
+
+ ## Prompt template
+
+ ```
+ ### User:\n{prompt}\n\n### Assistant:\n
+ ```
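
The template above can be applied with a small helper. This is a minimal sketch and not part of the commit: `format_prompt` and `PROMPT_TEMPLATE` are hypothetical names, and the template string is the one documented in the README's prompt template section (the same string the removed Transformers example used for `formatted_prompt`).

```python
# Hypothetical helper (not part of the commit): fills the README's
# "### User / ### Assistant" prompt template with a user message.
PROMPT_TEMPLATE = "### User:\n{prompt}\n\n### Assistant:\n"


def format_prompt(prompt: str) -> str:
    """Return the user message wrapped in the documented template."""
    return PROMPT_TEMPLATE.format(prompt=prompt)


if __name__ == "__main__":
    # repr() makes the newline characters visible.
    print(repr(format_prompt("How to make banana bread?")))
```

The returned string could then be passed to `model.generate(...)` in place of a hand-written f-string, which keeps the prompt format in one place.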
+
+ ## Sparsification
+ For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
+
+ Install [SparseML](https://github.com/neuralmagic/sparseml):
+ ```bash
+ git clone https://github.com/neuralmagic/sparseml
+ pip install -e "sparseml[transformers]"
+ ```
+
+ Replace the recipe as you like and run this one-shot compression script to apply SparseGPT:
+ ```python
+ import sparseml.transformers
+
+ original_model_name = "NousResearch/Nous-Hermes-2-SOLAR-10.7B"
+ calibration_dataset = "open_platypus"
+ output_directory = "output/"
+
+ recipe = """
+ test_stage:
+   obcq_modifiers:
+     SparseGPTModifier:
+       sparsity: 0.5
+       sequential_update: true
+       targets: ['re:model.layers.\d*$']
  """
+
+ # Apply SparseGPT to the model
+ sparseml.transformers.oneshot(
+     model=original_model_name,
+     dataset=calibration_dataset,
+     recipe=recipe,
+     output_dir=output_directory,
+ )
+ ```
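
The `targets` entry in the recipe carries a `re:` prefix, which appears to mark the rest of the string as a regular expression over module names. As a rough illustration of what `model.layers.\d*$` would select, the sketch below applies the pattern with Python's `re.match` to a few module names. The names are hypothetical (written in the style of a Llama-type model), and the use of `re.match` is an assumption for illustration rather than something taken from SparseML's source.

```python
import re

# Pattern copied from the recipe's `targets` entry, without the "re:" prefix.
TARGET_PATTERN = r"model.layers.\d*$"

# Hypothetical module names in the style of a Llama-type model.
module_names = [
    "model.embed_tokens",
    "model.layers.0",
    "model.layers.47",
    "model.layers.0.self_attn.q_proj",
    "lm_head",
]


def matching_modules(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the pattern matches from the start of the string."""
    return [name for name in names if re.match(pattern, name)]


selected = matching_modules(TARGET_PATTERN, module_names)
print(selected)
```

Under these assumptions only the top-level decoder blocks (`model.layers.N`) are selected, not their submodules or the embedding and output layers.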
+
+ ## Slack
+
+ For further support and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).