mwitiderrick committed on
Commit
4e2bc8e
1 Parent(s): 5a96281

Update README.md

Files changed (1)
  1. README.md +68 -14
README.md CHANGED
@@ -2,30 +2,84 @@
  base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  inference: True
  model_type: Llama
  ---
- # TinyLlama-1.1B-Chat-v1.0
- This repo contains pruned model files for [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).

  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
  ```python

- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
  prompt = "How to make banana bread?"
  formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
- model_id = "nm-testing/TinyLlama-1.1B-Chat-v1.0-pruned50-24"
- model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- inputs = tokenizer(formatted_prompt, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=200)
- print(tokenizer.batch_decode(outputs)[0])

  """
- <s> <|im_start|>user
- How to make banana bread?<|im_end|>
- <|im_start|>assistant
  Banana bread is a delicious dessert that is made with bananas. Here is how to make banana bread:

  1. Firstly, you need to cut bananas into small pieces.
  """
- ```
  base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  inference: True
  model_type: Llama
+ tags:
+ - nm-vllm
+ - sparse
  ---
+ ## TinyLlama-1.1B-Chat-v1.0-pruned2.4
+ This repo contains model files for [TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) optimized for [NM-vLLM](https://github.com/neuralmagic/nm-vllm), a high-throughput serving engine for compressed LLMs.

  This model was pruned with [SparseGPT](https://arxiv.org/abs/2301.00774), using [SparseML](https://github.com/neuralmagic/sparseml).
+
+ ## Inference
+ Install [NM-vLLM](https://github.com/neuralmagic/nm-vllm) for fast inference and low memory usage:
+ ```bash
+ pip install "nm-vllm[sparse]"
+ ```
+ Run in a Python pipeline for local inference:
  ```python
+ from vllm import LLM, SamplingParams

+ model = LLM("nm-testing/TinyLlama-1.1B-Chat-v1.0-pruned2.4", sparsity="sparse_w16a16")
  prompt = "How to make banana bread?"
  formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

+ sampling_params = SamplingParams(max_tokens=100, temperature=0, repetition_penalty=1.3)
+ outputs = model.generate(formatted_prompt, sampling_params=sampling_params)
+ print(outputs[0].outputs[0].text)
  """
  Banana bread is a delicious dessert that is made with bananas. Here is how to make banana bread:

  1. Firstly, you need to cut bananas into small pieces.
+ 2. Then, you need to slice the bananas into small pieces
+ """
+ ```
+
+ ## Prompt template
+
+ ```
+ <|im_start|>user
+ {prompt}<|im_end|>
+ <|im_start|>assistant
+
+ ```
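As a small illustration of the prompt template, the sketch below wraps a user message in the same markers that the inference example builds by hand. The helper name `format_prompt` is hypothetical; it is not part of this repo or of NM-vLLM.

```python
def format_prompt(prompt: str) -> str:
    """Wrap a user message in the prompt template used by this model card."""
    # Hypothetical helper: mirrors the f-string from the inference example.
    return f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"


formatted_prompt = format_prompt("How to make banana bread?")
print(formatted_prompt)
```

The returned string ends with a newline after `<|im_start|>assistant`, so the model's completion starts on the assistant turn.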
+
+ ## Sparsification
+ For details on how this model was sparsified, see the `recipe.yaml` in this repo and follow the instructions below.
+
+ Install [SparseML](https://github.com/neuralmagic/sparseml):
+ ```bash
+ git clone https://github.com/neuralmagic/sparseml
+ pip install -e "sparseml[transformers]"
+ ```
+
+ Replace the recipe as you like and run this one-shot compression script to apply SparseGPT:
+ ```python
+ import sparseml.transformers
+
+ original_model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
+ calibration_dataset = "open_platypus"
+ output_directory = "output/"
+
+ recipe = """
+ test_stage:
+   obcq_modifiers:
+     SparseGPTModifier:
+       sparsity: 0.5
+       sequential_update: true
+       mask_structure: '2:4'
+       targets: ['re:model.layers.\d*$']
  """
+
+ # Apply SparseGPT to the model
+ sparseml.transformers.oneshot(
+     model=original_model_name,
+     dataset=calibration_dataset,
+     recipe=recipe,
+     output_dir=output_directory,
+ )
+ ```
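The recipe's `mask_structure: '2:4'` combined with `sparsity: 0.5` describes a pattern in which every contiguous group of four weights keeps at most two nonzero values. The sketch below illustrates only that pattern, using a plain magnitude rule to pick the two survivors. SparseGPT chooses and updates weights using calibration data, so this is not the procedure SparseML runs. The function name `apply_2_4_mask` is hypothetical.

```python
def apply_2_4_mask(weights):
    """Zero the two smallest-magnitude values in each group of four.

    Illustration only: magnitude-based selection, not SparseGPT.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
pruned_row = apply_2_4_mask(row)
print(pruned_row)
```

Each group of four in `pruned_row` contains exactly two zeros, which gives the 50% overall sparsity named in the recipe.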
+
+ ## Slack
+
+ For further support and discussions on these models and AI in general, join [Neural Magic's Slack Community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ).