SearchUnify-ML/xgen-7b-8k-open-instruct-gptq

Text Generation

SearchUnify-ML committed on Jul 5, 2023

Commit 8011cbf (1 parent: d31444e)

Update README.md

Files changed (1)

README.md +52 -1

README.md CHANGED

@@ -16,14 +16,65 @@ It is the result of quantising to 4bit using GPTQ-for-LLaMa.
 The model is open for COMMERCIAL USE.
-## How to use this GPTQ model from Python code
+# How to use this GPTQ model from Python code
 First, make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:
 #### pip install auto-gptq
+<code>
+from transformers import AutoTokenizer, pipeline, logging
+from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
+import argparse
+model_name_or_path = "SearchUnify-ML/xgen-7b-8k-open-instruct-gptq"
+model_basename = "gptq_model-4bit-128g"
+use_triton = False
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
+        model_basename=model_basename,
+        use_safetensors=True,
+        trust_remote_code=False,
+        device="cuda:0",
+        use_triton=use_triton,
+        quantize_config=None)
+# Note: check the prompt template is correct for this model.
+prompt = "Tell me about AI"
+prompt_template=f'''### Instruction: {prompt}
+### Response:'''
+print("\n\n*** Generate:")
+input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
+print(tokenizer.decode(output[0]))
+# Inference can also be done using transformers' pipeline
+# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
+logging.set_verbosity(logging.CRITICAL)
+print("*** Pipeline:")
+pipe = pipeline(
+    "text-generation",
+    model=model,
+    tokenizer=tokenizer,
+    max_new_tokens=1024,
+    temperature=0.3,
+    top_p=0.95,
+    repetition_penalty=1.15
+)
+print(pipe(prompt_template)[0]['generated_text'])
+</code>
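
The prompt handling in the README snippet above can be isolated into two small helpers, which makes the template easy to check without loading the model. This is a minimal sketch, not part of the commit: the names `build_prompt` and `extract_response` are hypothetical, and it assumes the `### Instruction:` / `### Response:` template shown in the diff is the right one for this model (the README itself asks the reader to verify that).

```python
# Hypothetical helpers (not part of the commit) wrapping the prompt template
# used in the README example.

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str) -> str:
    """Format an instruction with the template from the README example."""
    return f"### Instruction: {instruction}\n{RESPONSE_MARKER}"


def extract_response(decoded: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the first response marker.

    If the marker is absent, the stripped input is returned unchanged.
    """
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


if __name__ == "__main__":
    prompt_template = build_prompt("Tell me about AI")
    print(prompt_template)
    # Stand-in for tokenizer.decode(output[0]); no model is loaded here.
    fake_decoded = prompt_template + " AI is a field of computer science."
    print(extract_response(fake_decoded))
```

In the README example, `build_prompt(prompt)` would take the place of the inline f-string, and `extract_response(tokenizer.decode(output[0]))` would drop the echoed prompt from the decoded output.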