GodRain committed
Commit 5e6a82d
1 Parent(s): 6dd282b

Update README.md

Files changed (1): README.md (+56, -0)

README.md CHANGED

---
license: bigcode-openrail-m
datasets:
- WizardLM/WizardLM_evol_instruct_70k
language:
- en
---

Here is an example showing how to use a model quantized with auto_gptq (the `evaluate` helper is defined below):
```python
_3BITS_MODEL_PATH_V1_ = 'GodRain/WizardCoder-15B-V1.1-3bit'

# pip install auto_gptq
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(_3BITS_MODEL_PATH_V1_)
# Load the 3-bit GPTQ checkpoint and place it on the GPU
model = AutoGPTQForCausalLM.from_quantized(_3BITS_MODEL_PATH_V1_, device="cuda:0")

out = evaluate("Hello, tell me a story about the sun", model=model, tokenizer=tokenizer)
print(out[0].strip())
```

The `evaluate` helper used above builds the prompt, runs generation, and decodes the output:

```python
import torch
from transformers import GenerationConfig


def evaluate(
    batch_data,
    tokenizer,
    model,
    temperature=1,
    top_p=0.9,
    top_k=40,
    num_beams=1,
    max_new_tokens=2048,
    **kwargs,
):
    # Wrap the raw instruction in the model's prompt template
    prompts = generate_prompt(batch_data)
    inputs = tokenizer(prompts, return_tensors="pt", max_length=256, truncation=True)
    input_ids = inputs["input_ids"].to(model.device)
    generation_config = GenerationConfig(
        do_sample=True,  # enable sampling so temperature/top_p/top_k take effect
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        num_beams=num_beams,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        **kwargs,
    )
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
        )
    s = generation_output.sequences
    # Strip special tokens and return the decoded strings
    output = tokenizer.batch_decode(s, skip_special_tokens=True)
    return output
```
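
The snippet above also relies on a `generate_prompt` helper that is not shown in this README. Below is a minimal sketch of what it could look like, assuming the Alpaca-style instruction template that WizardCoder / evol-instruct models are commonly trained with; adjust the template if your checkpoint expects a different format.

```python
def generate_prompt(instruction):
    # Hypothetical helper (not part of the original README): wraps the raw
    # instruction in the Alpaca-style template used by WizardCoder-family models.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:"
    )
```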