---
base_model: internlm/internlm2-chat-20b
---

# internlm2-chat-20b-llama
[`internlm/internlm2-chat-20b`](https://huggingface.co/internlm/internlm2-chat-20b) weights formatted to match standard Llama modeling code.

The model can be loaded directly with the standard Llama classes, but the tokenizer still requires `trust_remote_code=True`.
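Because the weights follow the Llama layout, loading through the plain Llama class should work without remote code. A minimal sketch of just the weight loading:

```py
import torch
from transformers import LlamaForCausalLM

# The converted weights load with stock Llama modeling code,
# so no trust_remote_code is needed for the model itself.
model = LlamaForCausalLM.from_pretrained(
    "kiranr/internlm2-chat-20b-llama",
    torch_dtype=torch.float16,
    device_map="auto",
)
```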
# Usage

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "kiranr/internlm2-chat-20b-llama"

# The InternLM2 tokenizer is not a stock Llama tokenizer, so it needs remote code.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# The weights match standard Llama modeling code; no remote code needed here.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {"role": "user", "content": "what is the square root of banana?"}
]

# Render the chat template and append the assistant turn header before generating.
model_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

generated_ids = model.generate(
    model_input,
    max_new_tokens=1024,
    do_sample=True,
    eos_token_id=[92542, 2],  # <|im_end|> and </s>
)

# Drop the prompt tokens and the trailing stop token before decoding.
output = tokenizer.decode(
    generated_ids[0][model_input.shape[-1] : -1], skip_special_tokens=True
)
print(output)
```
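If you prefer not to hard-code the stop-token ids, they can be resolved from the tokenizer at runtime. A small sketch (the expected values are taken from the comment above):

```py
# Look up the stop-token ids instead of hard-coding them.
stop_ids = [
    tokenizer.convert_tokens_to_ids("<|im_end|>"),  # InternLM2 chat turn terminator
    tokenizer.eos_token_id,                         # Llama-style </s>
]
print(stop_ids)  # expected: [92542, 2]
```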