alvarobartt (HF staff) committed
Commit cdc7e6e
1 Parent(s): 2fab931

Update README.md (#3)


- Update README.md (16847c9db499983c1362cff97f5807d75f07aa34)

Files changed (1)
  1. README.md +12 -56
README.md CHANGED
@@ -35,25 +35,11 @@ In order to use the current quantized model, support is offered for different so
 
 ### 🤗 transformers
 
-In order to run the inference with Llama 3.1 405B Instruct GPTQ in INT4, both `torch` and `autogptq` need to be installed as:
+In order to run the inference with Llama 3.1 405B Instruct GPTQ in INT4, you need to install the following packages:
 
 ```bash
-pip install "torch>=2.2.0,<2.3.0" --upgrade
-pip install auto-gptq --no-build-isolation
-```
-
-Otherwise, running the model may fail, since the AutoGPTQ kernels are built with PyTorch 2.2.1, meaning that those will break with PyTorch 2.3.0.
-
-Then, the latest version of `transformers` need to be installed including the `accelerate` extra, being 4.43.0 or higher, as:
-
-```bash
-pip install "transformers[accelerate]>=4.43.0" --upgrade
-```
-
-Finally, in order to use `autogptq`, `optimum` also needs to be installed:
-
-```bash
-pip install optimum --upgrade
+pip install -q --upgrade transformers accelerate optimum
+pip install -q --no-build-isolation auto-gptq
 ```
 
 To run the inference on top of Llama 3.1 405B Instruct GPTQ in INT4 precision, the GPTQ model can be instantiated as any other causal language modeling model via `AutoModelForCausalLM` and run the inference normally.
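The snippet that context line refers to sits outside the hunk, so it is not shown in this diff. For reference, a minimal sketch of such an `AutoModelForCausalLM` load; the model id (taken from this repository's name) and the prompt are assumptions, not lines from the README:

```python
# Hedged sketch of the inference path the README describes; the model id and
# prompt are illustrative assumptions, not content from the diff above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # non-quantized ops run in fp16
    device_map="auto",          # let accelerate shard the weights across GPUs
)

messages = [{"role": "user", "content": "What is GPTQ quantization?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, do_sample=True, max_new_tokens=256)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```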
@@ -91,30 +77,14 @@ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 
 ### AutoGPTQ
 
-Alternatively, one may want to run that via `AutoGPTQ` even though it's built on top of 🤗 `transformers`, which is the recommended approach instead as described above.
-
-In order to run the inference with Llama 3.1 405B Instruct GPTQ in INT4, both `torch` and `autogptq` need to be installed as:
-
-```bash
-pip install "torch>=2.2.0,<2.3.0" --upgrade
-pip install auto-gptq --no-build-isolation
-```
-
-Otherwise, running the model may fail, since the AutoGPTQ kernels are built with PyTorch 2.2.1, meaning that those will break with PyTorch 2.3.0.
-
-Then, the latest version of `transformers` need to be installed including the `accelerate` extra, being 4.43.0 or higher, as:
-
-```bash
-pip install "transformers[accelerate]>=4.43.0" --upgrade
-```
-
-Finally, in order to use `autogptq`, `optimum` also needs to be installed:
+In order to run the inference with Llama 3.1 405B Instruct GPTQ in INT4, you need to install the following packages:
 
 ```bash
-pip install optimum --upgrade
+pip install -q --upgrade transformers accelerate optimum
+pip install -q --no-build-isolation auto-gptq
 ```
 
-And then run it as follows:
+Alternatively, one may want to run that via `AutoGPTQ` even though it's built on top of 🤗 `transformers`, which is the recommended approach instead as described above.
 
 ```python
 import torch
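The hunk cuts off at the top of the AutoGPTQ Python snippet, so its body is not visible here. As a hedged sketch only, this is what loading the same model through AutoGPTQ's `from_quantized` loader typically looks like; the model id is again an assumption, and this is not necessarily the snippet the README contains:

```python
# Sketch of an AutoGPTQ-based load; not the snippet elided by the hunk above.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# from_quantized loads already-quantized weights instead of quantizing on the fly
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device_map="auto",
    use_safetensors=True,
)

inputs = tokenizer("What is GPTQ quantization?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```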
@@ -148,7 +118,7 @@ outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
 print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 ```
 
-The AutoGPTQ script has been adapted from [AutoGPTQ/examples/quantization/basic_usage.py](https://github.com/AutoGPTQ/AutoGPTQ/blob/main/examples/quantization/basic_usage.py).
+The AutoGPTQ script has been adapted from [`AutoGPTQ/examples/quantization/basic_usage.py`](https://github.com/AutoGPTQ/AutoGPTQ/blob/main/examples/quantization/basic_usage.py).
 
 ### 🤗 Text Generation Inference (TGI)
 
@@ -159,28 +129,14 @@ Coming soon!
 > [!NOTE]
 > In order to quantize Llama 3.1 405B Instruct using AutoGPTQ, you will need to use an instance with at least enough CPU RAM to fit the whole model i.e. ~800GiB, and an NVIDIA GPU with 80GiB of VRAM to quantize it.
 
-In order to quantize Llama 3.1 405B Instruct, first install `torch` and `autoqptq` as follows:
-
-```bash
-pip install "torch>=2.2.0,<2.3.0" --upgrade
-pip install auto-gptq --no-build-isolation
-```
-
-Otherwise the quantization may fail, since the AutoGPTQ kernels are built with PyTorch 2.2.1, meaning that those will break with PyTorch 2.3.0.
-
-Then install the latest version of `transformers` as follows:
-
-```bash
-pip install "transformers>=4.43.0" --upgrade
-```
-
-Finally, in order to use `autogptq`, `optimum` also needs to be installed:
+In order to quantize Llama 3.1 405B Instruct with GPTQ in INT4, you need to install the following packages:
 
 ```bash
-pip install optimum --upgrade
+pip install -q --upgrade transformers accelerate optimum
+pip install -q --no-build-isolation auto-gptq
 ```
 
-And then, run the following script, adapted from [AutoGPTQ/examples/quantization/basic_usage.py](https://github.com/AutoGPTQ/AutoGPTQ/blob/main/examples/quantization/basic_usage.py).
+Then run the following script, adapted from [`AutoGPTQ/examples/quantization/basic_usage.py`](https://github.com/AutoGPTQ/AutoGPTQ/blob/main/examples/quantization/basic_usage.py).
 
 ```python
 import random
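The referenced quantization script is truncated by the hunk. For context on the flow it follows, a condensed sketch of the GPTQ quantization pattern from `basic_usage.py`; the source model id, calibration text, output directory, and hyperparameters below are illustrative assumptions:

```python
# Condensed sketch of the AutoGPTQ quantization flow from basic_usage.py;
# the model id, calibration text, paths, and config values are assumptions.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-405B-Instruct"  # assumed source model

tokenizer = AutoTokenizer.from_pretrained(model_id)

# INT4 GPTQ configuration: 4 bits with a group size of 128, as in basic_usage.py
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# Calibration samples: tokenized text GPTQ uses to minimize quantization error
examples = [
    tokenizer("auto-gptq is an easy-to-use model quantization library.")
]

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)  # runs GPTQ layer by layer over the calibration data
model.save_quantized("Meta-Llama-3.1-405B-Instruct-GPTQ-INT4")  # placeholder dir
```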
 