mwitiderrick committed
Commit 4dbabbc
1 Parent(s): f482083

Update README.md

Files changed (1): README.md +79 -37

README.md CHANGED
@@ -1,48 +1,90 @@
  ---
- base_model: GeneZC/MiniChat-2-3B
- inference: True
- model_type: Llama
  ---
- # Nous-Hermes-2-Yi-34B
- This repo contains pruned model files for [Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B).
 
  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
 
  ```python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
  prompt = "How to make banana bread?"
  formatted_prompt = f"<|im_start|>User:{prompt}\n<|im_start|>assistant:\n"
- model_id = "nm-testing/Nous-Hermes-2-Yi-34B-pruned50-24"
- model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- inputs = tokenizer(formatted_prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=200)
- print(tokenizer.batch_decode(outputs)[0])
- """
- <|im_start|> User:How to make banana bread?
- <|im_start|> assistant:
- To make banana bread, you can follow these steps:
-
- Ingredients:
  - 2 ripe bananas
- - 2 cups flour
- - 1/2 cup sugar
- - 1/2 cup butter
- - 1/2 cup milk
- - 1 teaspoon baking powder
- - 1 teaspoon baking soda
- - 1 teaspoon salt
-
- Instructions:
- 1. Preheat the oven to 350°F (175°C).
- 2. In a mixing bowl, mash the bananas and mix them with the flour, sugar, butter, milk, baking powder, baking soda, and salt.
- 3. Mix the ingredients until they form a dough.
- 4. Pour the dough into a baking pan.
- 5. Bake the banana bread for 30 minutes.
- 6. Remove the banana bread from the oven and let it cool.
- 7. Enjoy your banana bread.
-
- Note: You can adjust the ingredients
  """
 
- ```
  ---
+ base_model: NousResearch/Nous-Hermes-2-Yi-34B
+ inference: true
+ model_type: llama
+ quantized_by: mgoin
+ tags:
+ - nm-vllm
+ - sparse
  ---
+
+ ## Nous-Hermes-2-Yi-34B-pruned2.4
+ This repo contains model files for [Nous Hermes 2 - Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B) optimized for [NM-vLLM](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.
 
  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+
+ ## Inference
+ Install [NM-vLLM](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
+ ```bash
+ pip install nm-vllm[sparse]
+ ```
+ Run in a Python pipeline for local inference:
  ```python
+ from vllm import LLM, SamplingParams
+
+ model = LLM("nm-testing/Nous-Hermes-2-Yi-34B-pruned2.4", sparsity="sparse_w16a16")
  prompt = "How to make banana bread?"
  formatted_prompt = f"<|im_start|>User:{prompt}\n<|im_start|>assistant:\n"
 
+ sampling_params = SamplingParams(max_tokens=100, temperature=0)
+ outputs = model.generate(formatted_prompt, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
+ """
+ To make banana bread, follow these steps:
+ 1. Gather the ingredients:
  - 2 ripe bananas
+ - 2 cups of flour
+ - 1 teaspoon of baking powder
+ - 1 teaspoon of salt
+ - 1 teaspoon of sugar
+ - 1 teaspoon of vanilla extract
+ 2. Preheat the oven to 350°F.
+ 3. In a mixing bowl, combine the flour, baking powder, salt, sugar, and vanilla extract.
+ 4.
+ """
+ ```
+
+ ## Prompt template
+
+ ```
+ <|im_start|>User:{prompt}\n<|im_start|>assistant:\n
+ ```
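The template above can be applied with ordinary Python string formatting. A minimal standalone sketch (the helper name `format_prompt` is ours, not part of the model card):

```python
def format_prompt(prompt: str) -> str:
    # ChatML-style template from the "Prompt template" section above
    return f"<|im_start|>User:{prompt}\n<|im_start|>assistant:\n"

formatted = format_prompt("How to make banana bread?")
print(formatted)
```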
+
+ ## Sparsification
+ For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
+
+ Install [SparseML](https://github.com/neuralmagic/sparseml):
+ ```bash
+ git clone https://github.com/neuralmagic/sparseml
+ pip install -e "sparseml[transformers]"
+ ```
+
+ Replace the recipe as you like and run this one-shot compression script to apply SparseGPT:
+ ```python
+ import sparseml.transformers
+
+ original_model_name = "NousResearch/Nous-Hermes-2-Yi-34B"
+ calibration_dataset = "open_platypus"
+ output_directory = "output/"
+
+ recipe = """
+ test_stage:
+   obcq_modifiers:
+     SparseGPTModifier:
+       sparsity: 0.5
+       sequential_update: true
+       targets: ['re:model.layers.\d*$']
  """
 
+ # Apply SparseGPT to the model
+ sparseml.transformers.oneshot(
+     model=original_model_name,
+     dataset=calibration_dataset,
+     recipe=recipe,
+     output_dir=output_directory,
+ )
+ ```
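The recipe embedded in the script is plain YAML, so it can be sanity-checked before launching the lengthy one-shot run. A quick sketch, assuming PyYAML is available (it is not required by the model card itself):

```python
import yaml  # PyYAML, used here only for a quick sanity check

# Same recipe as in the one-shot script above
recipe = r"""
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      sequential_update: true
      targets: ['re:model.layers.\d*$']
"""

parsed = yaml.safe_load(recipe)
modifier = parsed["test_stage"]["obcq_modifiers"]["SparseGPTModifier"]
print(modifier["sparsity"])  # 0.5: half of each targeted layer's weights are pruned
```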
+
+ ## Slack
+
+ For further support, and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).