dnhkng committed
Commit d7cc043
1 Parent(s): 39a7882

Update README.md

Files changed (1): README.md +41 -1
README.md CHANGED
@@ -2,4 +2,44 @@
 license: mit
 ---
 
-This is a new kind of model optimization. A paper is currently be written on the technique.
+This is a new kind of model optimization.
+
+A paper is currently being written on the technique.
+
+## Quickstart
+
+The following code snippet shows how to load the tokenizer and model, and how to generate a response using `apply_chat_template`.
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+device = "cuda"  # the device to move the tokenized inputs onto
+
+model = AutoModelForCausalLM.from_pretrained(
+    "dnhkng/Large-bnb-4bit",
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained("dnhkng/Large-bnb-4bit")
+
+prompt = "Give me a short introduction to large language models."
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": prompt}
+]
+# Render the chat messages into a single prompt string in the model's chat format.
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(device)
+
+generated_ids = model.generate(
+    model_inputs.input_ids,
+    max_new_tokens=512
+)
+# Strip the prompt tokens so only the newly generated tokens are decoded.
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
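
The checkpoint name ends in `-bnb-4bit`, which by common Hugging Face naming convention suggests a bitsandbytes pre-quantized 4-bit model; if so, loading it also requires the `bitsandbytes` and `accelerate` packages alongside `transformers` (an assumption based on the name, not stated in the README). A minimal sketch of surfacing the result, continuing from the snippet above (the `print` call is not part of the original):

```python
# Continues from the quickstart snippet; `response` holds the decoded reply.
# Assumed extra dependencies for a bnb-4bit checkpoint:
#   pip install transformers accelerate bitsandbytes
print(response)
```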