---
license: cc
datasets:
- VMware/open-instruct-v1-oasst-dolly-hhrlhf
language:
- en
pipeline_tag: text-generation
---
# SearchUnify-ML/xgen-7b-8k-open-instruct-gptq
These are GPTQ 4-bit model files for [VMware's XGen 7B 8K Open Instruct](https://huggingface.co/VMware/xgen-7b-8k-open-instruct).
They are the result of quantizing the original model to 4-bit using GPTQ-for-LLaMa.
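The published files were produced with GPTQ-for-LLaMa. Purely for illustration, the sketch below shows a roughly equivalent 4-bit, group-size-128 GPTQ quantization using AutoGPTQ's quantization API instead (a different tool than the one actually used); the calibration text and output directory are placeholders, not the actual settings used for this repository.
```
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "VMware/xgen-7b-8k-open-instruct"      # full-precision source model
quantized_dir = "xgen-7b-8k-open-instruct-gptq"      # placeholder output directory

# 4-bit weights with a group size of 128, matching the published files.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=False, trust_remote_code=True)
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config, trust_remote_code=True)

# A real run would use a much larger calibration set; one sentence is only a placeholder.
examples = [tokenizer("Explain the rules of field hockey to a novice.")]
model.quantize(examples)

model.save_quantized(quantized_dir)
tokenizer.save_pretrained(quantized_dir)
```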
# How to use this GPTQ model from Python code
First, make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:
```
pip install auto-gptq
```
Second, install tiktoken, which is required by the model's tokenizer:
```
pip install tiktoken
```
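As a quick check that tiktoken is picked up, the tokenizer can be loaded on its own; this mirrors the tokenizer call used in the full example below, and the sample sentence is only illustrative.
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SearchUnify-ML/xgen-7b-8k-open-instruct-gptq",
                                          use_fast=False, trust_remote_code=True)
print(tokenizer("Hello, world!").input_ids)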
```
Then you can run the following example code:
```
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "SearchUnify-ML/xgen-7b-8k-open-instruct-gptq"
model_basename = "gptq_model-4bit-128g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=False, trust_remote_code=True)

# Load the 4-bit GPTQ weights onto the first GPU.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=False,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton)

# Note: check that the prompt template is correct for this model.
prompt = "Explain the rules of field hockey to a novice."
prompt_template = f'''### Instruction: {prompt}
### Response:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.3, max_new_tokens=512)

# Print only the text generated after the "### Response:" marker.
print(f"\n\n {tokenizer.decode(output[0]).split('### Response:')[1]}")
```
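The `pipeline` import above is not used in that snippet. As a minimal sketch (generation parameters are illustrative), the quantized model and tokenizer can also be wrapped in a standard transformers text-generation pipeline:
```
# Assumes `model`, `tokenizer`, and `prompt_template` from the example above.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.3,
)
print(pipe(prompt_template)[0]["generated_text"])
```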