How to use? #2
opened by Stefan-LTB
I tried the provided example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"

model = AutoModelForCausalLM.from_pretrained(
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1")

prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
The response is:
```
Du bist ein hilfreicher Assistent.
Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft.
<|im_start|>assistant<|im_end|> <|im_start|>assistant<|im_end|> <|im_start|>assistant<|im_end|> <|im_start|>assistant<|im_end|>
[... the same <|im_start|>assistant<|im_end|> token pair repeats for the rest of the 512 new tokens ...]
```
I also get these warnings:
```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
No chat template is defined for this tokenizer - using a default chat template that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
```
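For what it's worth, the last two warnings only say that `generate` received bare `input_ids`. A minimal sketch that forwards the attention mask and sets an explicit pad token id, using nothing beyond standard `transformers` usage:

```python
# Forward the attention mask along with the input ids and set an explicit
# pad token id; this addresses the last two warnings above.
generated_ids = model.generate(
    **model_inputs,                       # passes input_ids and attention_mask
    max_new_tokens=512,
    pad_token_id=tokenizer.eos_token_id,  # explicit pad id for open-ended generation
)
```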
What chat template do I have to set for the model to work as expected?
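The `<|im_start|>` spam matches the second warning: the tokenizer ships no chat template, so `apply_chat_template` falls back to ChatML, which Llama 3 models were never trained on. A rough fallback sketch, assuming this checkpoint uses the standard Llama 3 instruct special tokens (verify against the model card before relying on it):

```python
# Fallback sketch, assuming the standard Llama 3 instruct special tokens.
# Only needed when the tokenizer ships without a chat template.
if tokenizer.chat_template is None:
    text = (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "Du bist ein hilfreicher Assistent.<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```

The cleaner fix, as the warning itself suggests, is to assign a proper Jinja template to `tokenizer.chat_template` so that `apply_chat_template` keeps working unchanged.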
Reading helps: the instruct version works as expected.
But now I'm searching for a good GGUF version of the instruct model.
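If someone has already uploaded a GGUF conversion to the Hub, fetching it is straightforward; note that the repo and file names below are placeholders for illustration, not a real upload:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo/file names; substitute a real GGUF upload once you find one.
gguf_path = hf_hub_download(
    repo_id="someuser/Llama3-DiscoLeo-Instruct-8B-GGUF",
    filename="llama3-discoleo-instruct-8b.Q4_K_M.gguf",
)
```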
Also, can I build my own chat model with my own instructions on top of the instruct version?
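Further instruction tuning on top of an instruct checkpoint is a common approach. A rough sketch of preparing chat-formatted training text with the tokenizer's own template; the example data and field names are made up for illustration, and trainers such as TRL's `SFTTrainer` build on the same idea:

```python
# Hypothetical illustration: turn chat-style examples into training strings
# using the tokenizer's chat template, then fine-tune on those strings.
examples = [
    {"messages": [
        {"role": "user", "content": "Explain the Energiewende in one paragraph."},
        {"role": "assistant", "content": "The Energiewende is Germany's transition ..."},
    ]},
]
train_texts = [
    tokenizer.apply_chat_template(ex["messages"], tokenize=False)
    for ex in examples
]
```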