kiranr committed on
Commit
be3e83c
1 Parent(s): d8c7aa3

Create README.md

  1. README.md +37 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ # internlm2-chat-20b-llama
+
+ The [`internlm/internlm2-20b`](https://huggingface.co/internlm/internlm2-20b) weights, reformatted to match the standard Llama modeling code.
+ The model can be loaded directly, but the tokenizer requires `trust_remote_code=True`.
+
+ # Usage
+ ```py
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_name = "kiranr/internlm2-chat-20b-llama"
+
+ # The tokenizer relies on custom code in the repo, so trust_remote_code is required.
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+
+ # The weights load with the stock Llama classes.
+ # attn_implementation="flash_attention_2" needs the flash-attn package installed.
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.float16,
+     device_map="auto",
+     attn_implementation="flash_attention_2",
+ )
+
+ messages = [
+     {"role": "user", "content": "what is the square root of banana?"}
+ ]
+
+ # add_generation_prompt=True appends the assistant header so the model answers the user turn.
+ model_input = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ generated_ids = model.generate(
+     model_input,
+     max_new_tokens=1024,
+     do_sample=True,
+     eos_token_id=[92542, 2],  # <|im_end|> and </s>
+ )
+
+ # Decode only the newly generated tokens; skip_special_tokens strips a trailing EOS token.
+ output = tokenizer.decode(
+     generated_ids[0][model_input.shape[-1] :], skip_special_tokens=True
+ )
+ print(output)
+ ```
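
To see roughly what `apply_chat_template` feeds the model before tokenization, the prompt layout can be sketched in plain Python. This is a minimal sketch, assuming the tokenizer's template follows the ChatML convention implied by the `<|im_end|>` stop token above; `render_chatml` is a hypothetical helper, not part of `transformers` or of this repo, and the real template may also prepend a BOS token.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages in a ChatML-style layout.

    Illustrative only: assumes each turn is wrapped as
    <|im_start|>{role}\\n{content}<|im_end|>\\n, which is an assumption
    about the tokenizer's template, not code taken from it.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "what is the square root of banana?"}]
prompt = render_chatml(messages)
print(prompt)
```

Under this assumption, generation ends when the model emits `<|im_end|>`, which is why that token id is listed in `eos_token_id`. To check the actual layout, compare against `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.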