bnjmnmarie committed on
Commit
d994b72
1 Parent(s): b221737

Update README.md

Files changed (1)
  1. README.md +16 -1
README.md CHANGED
@@ -1,3 +1,18 @@
  ---
  license: mit
- ---
  ---
  license: mit
+ ---
+
+ Llama 2 7B quantized in 3-bit with GPTQ.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from optimum.gptq import GPTQQuantizer
+ import torch
+
+ w = 3
+ model_path = "meta-llama/Llama-2-7b-hf"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
+ model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
+ quantizer = GPTQQuantizer(bits=w, dataset="c4", model_seqlen=4096)
+ quantized_model = quantizer.quantize_model(model, tokenizer)
+ ```
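For intuition about what `bits=3` implies, here is a minimal pure-Python sketch of round-to-nearest quantization onto a 3-bit grid. This is a deliberate simplification, not GPTQ itself (GPTQ chooses quantized weights to minimize layer-wise reconstruction error rather than rounding each weight independently); the function name and sample weights are illustrative only.

```python
def quantize_rtn(weights, bits=3):
    """Round each weight to the nearest point on a uniform 2**bits grid,
    then map it back to a float. With bits=3 the whole tensor can only
    take 8 distinct values."""
    levels = 2 ** bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / (levels - 1)  # step between adjacent grid points
    q = [round((w - w_min) / scale) for w in weights]  # integer codes 0..7
    return [w_min + qi * scale for qi in q]  # dequantized floats

weights = [-0.9, -0.31, 0.02, 0.4, 0.77]
deq = quantize_rtn(weights)
print(deq)  # each value lies within scale/2 of the original
```

The per-weight error of this naive scheme is bounded by half the grid step; GPTQ improves on it by updating the not-yet-quantized weights in a layer to compensate for the error each rounding introduces.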