Update README.md
README.md
````diff
@@ -79,7 +79,7 @@ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
 In order to run inference with Llama 3.1 405B Instruct AWQ in INT4, both `torch` and `autoawq` need to be installed as:
 
 ```bash
-pip install "torch>=2.2.0,<2.3.0" autoawq --upgrade
+pip install "torch>=2.2.0,<2.3.0" torchvision autoawq --upgrade
 ```
 
 Then, the latest version of `transformers` needs to be installed, 4.43.0 or higher, as:
````
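The hunk cuts off just before the `transformers` install command itself, which is not visible in this diff. A plausible sketch, assuming it follows the same `pip` pattern as the command above and pins the minimum version stated in the prose:

```bash
pip install "transformers>=4.43.0" --upgrade
```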
````diff
@@ -107,7 +107,6 @@ model = AutoAWQForCausalLM.from_pretrained(
     torch_dtype=torch.float16,
     low_cpu_mem_usage=True,
     device_map="auto",
-    fuse_layers=True,
 )
 
 inputs = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt", return_dict=True).to('cuda')
````
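Putting the two hunks together, the updated README snippet presumably reads as the sketch below. The imports, the model ID, the example prompt, and the `generate` call are assumptions filled in from context (the hunk headers show the `AutoAWQForCausalLM.from_pretrained(` call and the final `batch_decode` print); the rest is taken verbatim from the diff.

```python
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Assumed model ID -- the diff does not show it
model_id = "hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    # fuse_layers=True was removed by this commit
)

# Assumed example prompt in chat format
prompt = [{"role": "user", "content": "What's Deep Learning?"}]

inputs = tokenizer.apply_chat_template(
    prompt,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to('cuda')

# Assumed generation call; `outputs` feeds the print shown in the first hunk header
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```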