SearchUnify-ML committed
Commit 8011cbf
1 Parent(s): d31444e

Update README.md

Files changed (1):
  1. README.md +52 -1

README.md CHANGED
@@ -16,14 +16,65 @@ It is the result of quantising to 4bit using GPTQ-for-LLaMa.
The model is open for COMMERCIAL USE.

- ## How to use this GPTQ model from Python code
+ # How to use this GPTQ model from Python code

First, make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:

#### pip install auto-gptq

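Before loading the model, it is worth confirming that the install picked up a CUDA-enabled build (a minimal sanity check, not specific to this model; it assumes a CUDA-capable GPU):

```python
import torch
import auto_gptq  # raises ImportError if the AutoGPTQ wheel failed to install

# The example below loads the model on "cuda:0", so this should print True.
print(torch.cuda.is_available())
```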
+ ```python
+ from transformers import AutoTokenizer, pipeline, logging
+ from auto_gptq import AutoGPTQForCausalLM
+
+ model_name_or_path = "SearchUnify-ML/xgen-7b-8k-open-instruct-gptq"
+ model_basename = "gptq_model-4bit-128g"
+
+ use_triton = False
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+
+ # Load the 4-bit GPTQ weights from the Hub onto the first GPU.
+ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
+                                            model_basename=model_basename,
+                                            use_safetensors=True,
+                                            trust_remote_code=False,
+                                            device="cuda:0",
+                                            use_triton=use_triton,
+                                            quantize_config=None)
+
+ # Note: check that this prompt template is correct for this model.
+ prompt = "Tell me about AI"
+ prompt_template = f'''### Instruction: {prompt}
+ ### Response:'''
+
+ print("\n\n*** Generate:")
+
+ input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+ # do_sample=True so that temperature actually takes effect.
+ output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
+ print(tokenizer.decode(output[0]))
+
+ # Inference can also be done using transformers' pipeline.
+
+ # Prevent printing spurious transformers errors when using pipeline with AutoGPTQ.
+ logging.set_verbosity(logging.CRITICAL)
+
+ print("*** Pipeline:")
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+     max_new_tokens=1024,
+     do_sample=True,
+     temperature=0.3,
+     top_p=0.95,
+     repetition_penalty=1.15
+ )
+
+ print(pipe(prompt_template)[0]['generated_text'])
+ ```
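
Note that `generate` returns the prompt tokens along with the completion, so the decoded text above includes the instruction template. A minimal sketch for extracting just the reply (it assumes the `### Response:` marker survives verbatim in the output):

```python
# Drop the echoed prompt by splitting on the response marker.
decoded = tokenizer.decode(output[0], skip_special_tokens=True)
response = decoded.split("### Response:")[-1].strip()
print(response)
```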