
internlm2-chat-20b-llama

These are the internlm/internlm2-chat-20b weights converted to the standard Llama layout, so the model loads directly with stock transformers modeling code. The tokenizer, however, still ships custom code and needs trust_remote_code=True.

Usage:
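Because the checkpoint follows the standard Llama layout, it can also be loaded with the plain LlamaForCausalLM class instead of AutoModelForCausalLM. A minimal sketch (assuming accelerate is installed for device_map="auto"):

import torch
from transformers import LlamaForCausalLM

# No trust_remote_code needed on the model side; the weights use the stock Llama architecture.
model = LlamaForCausalLM.from_pretrained(
    "kiranr/internlm2-chat-20b-llama",
    torch_dtype=torch.float16,
    device_map="auto",
)
print(model.config.model_type)  # "llama"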

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "kiranr/internlm2-chat-20b-llama"

# The tokenizer still uses custom code, so trust_remote_code is required here.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# The weights follow the standard Llama architecture, so no remote code is needed for the model.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",
)

messages = [
    {"role": "user", "content": "what is the square root of banana?"}
]

# Build the prompt with the chat template and append the assistant turn marker.
model_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

generated_ids = model.generate(
    model_input,
    max_new_tokens=1024,
    do_sample=True,
    eos_token_id=[92542, 2],  # <|im_end|> and </s>; verify with tokenizer.convert_tokens_to_ids("<|im_end|>")
)

# Decode only the newly generated tokens, dropping the trailing stop token.
output = tokenizer.decode(
    generated_ids[0][model_input.shape[-1] : -1], skip_special_tokens=True
)
print(output)
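To continue the chat, append the assistant reply and re-apply the chat template. A rough sketch reusing tokenizer, model, messages, and output from the snippet above:

# Second turn: feed the previous answer back in and ask a follow-up question.
messages.append({"role": "assistant", "content": output})
messages.append({"role": "user", "content": "now explain why that question is nonsense"})

model_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

generated_ids = model.generate(
    model_input,
    max_new_tokens=1024,
    do_sample=True,
    eos_token_id=[92542, 2],
)
reply = tokenizer.decode(
    generated_ids[0][model_input.shape[-1] :], skip_special_tokens=True
)
print(reply)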
Model size: 19.9B params, FP16 (safetensors)
