---
base_model: internlm/internlm2-chat-20b
---

# internlm2-chat-20b-llama

[`internlm/internlm2-chat-20b`](https://huggingface.co/internlm/internlm2-chat-20b) weights converted to match the standard Llama modeling code in `transformers`. The model can be loaded directly, but the tokenizer still requires `trust_remote_code=True`.

# Usage

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "kiranr/internlm2-chat-20b-llama"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # requires the flash-attn package
)

messages = [{"role": "user", "content": "what is the square root of banana?"}]

# Append the assistant generation prompt so the model replies as the assistant.
model_input = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

generated_ids = model.generate(
    model_input,
    max_new_tokens=1024,
    do_sample=True,
    eos_token_id=[92542, 2],  # <|im_end|> and </s>
)

# Strip the prompt tokens and the trailing stop token before decoding.
output = tokenizer.decode(
    generated_ids[0][model_input.shape[-1] : -1], skip_special_tokens=True
)
print(output)
```
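If you want to verify the stop-token ids hard-coded in the `generate` call above, here is a minimal sanity check, assuming the tokenizer exposes `<|im_end|>` as an added token and uses `</s>` as its EOS token:

```py
# Sanity check for the hard-coded stop-token ids used in generate() above.
# Assumes "<|im_end|>" is an added token and "</s>" is the tokenizer's EOS.
print(tokenizer.convert_tokens_to_ids("<|im_end|>"))  # expected: 92542
print(tokenizer.eos_token, tokenizer.eos_token_id)    # expected: </s> 2
```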