# internlm2-chat-20b-llama

The [`internlm/internlm2-20b`](https://huggingface.co/internlm/internlm2-20b) weights are reformatted to match the standard Llama modeling code. The model can be loaded directly, but the tokenizer requires `trust_remote_code=True`.

# Usage

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "kiranr/internlm2-chat-20b-llama"

# the tokenizer is custom, so it needs trust_remote_code; the model itself is plain Llama
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {"role": "user", "content": "what is the square root of banana?"}
]

model_input = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

generated_ids = model.generate(
    model_input,
    max_new_tokens=1024,
    do_sample=True,
    eos_token_id=[92542, 2],  # <|im_end|> and </s>
)
# drop the prompt tokens and the trailing stop token before decoding
output = tokenizer.decode(
    generated_ids[0][model_input.shape[-1] : -1], skip_special_tokens=True
)
print(output)
```
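
The stop IDs above are hardcoded. They can also be resolved from the tokenizer by name rather than by magic number; a minimal sketch, assuming the tokenizer registers `<|im_end|>` as a known token (the lookup below is illustrative, not something this repo documents):

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "kiranr/internlm2-chat-20b-llama", trust_remote_code=True
)

# resolve the chat stop token and the regular EOS token by name,
# instead of hardcoding 92542 and 2 in model.generate(...)
stop_ids = [
    tokenizer.convert_tokens_to_ids("<|im_end|>"),
    tokenizer.eos_token_id,
]
print(stop_ids)  # should print [92542, 2], per the comment in the snippet above
```

Passing a list built this way as `eos_token_id` keeps the generate call in sync with the tokenizer's actual vocabulary.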