seonglae committed
Commit ff3c993
1 Parent(s): 0f80148

Create README.md

---
inference: false
license: other
tags:
- llama-2
- llama2
- gptq
- auto-gptq
- 13b
- llama
- 4bit
---

# Get Started
This model is quantized with [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), so you need the `auto-gptq` package to load it.
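If the packages are not installed yet, a typical setup looks like this (assumes a CUDA-capable environment; package names only, no versions pinned):

```shell
# auto-gptq loads the quantized weights; transformers provides the tokenizer and pipeline
pip install auto-gptq transformers
```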

```py
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM

model_id = 'seonglae/llama-2-13b-chat-hf-gptq'
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    # pass model_basename only if the quantized weight file has a non-default name
    trust_remote_code=True,
    device='cuda:0',
    use_triton=False,
    use_safetensors=True,
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    temperature=0.5,
    top_p=0.95,
    max_new_tokens=100,
    repetition_penalty=1.15,
)
prompt = "USER: Are you AI?\nASSISTANT:"
pipe(prompt)
```