Update generation_config.json

#4
by abhi-db - opened

I noticed when using the instruct model with chat templating that the chat template ends turns with <|eot_id|> rather than the EOS token <|end_of_text|>. So when the assistant responds to messages, it emits <|eot_id|> as well. Unfortunately, the generation config doesn't list <|eot_id|> as a stop token, so the model keeps writing.
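
For anyone who wants to verify the mismatch, here's a quick check; the values in the comments are what I see with the Llama 3 tokenizer and the current config, so treat them as illustrative:

from transformers import AutoTokenizer, GenerationConfig

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path)
gen_config = GenerationConfig.from_pretrained(model_path)

# The chat template ends each turn with <|eot_id|>...
print(tokenizer.eos_token)                            # <|end_of_text|>
print(tokenizer.convert_tokens_to_ids("<|eot_id|>"))  # 128009

# ...but the generation config only knows about <|end_of_text|>.
print(gen_config.eos_token_id)                        # 128001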

In the Model Card, I see that there is a workaround by manually updating eos_token_id in any generate call or pipeline:

# Assumes a transformers text-generation pipeline and its tokenizer have
# already been created for Meta-Llama-3-8B-Instruct, as in the Model Card.
terminators = [
    tokenizer.eos_token_id,                        # <|end_of_text|>
    tokenizer.convert_tokens_to_ids("<|eot_id|>")  # <|eot_id|>
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,  # stop on either terminator
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

But I think there is a simpler way to fix this! If you update generation_config.json to stop on both <|end_of_text|> and <|eot_id|>, then it works automatically and you don't need to build the terminators list.
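
Concretely, that means listing both token IDs under eos_token_id; in the Llama 3 vocabulary <|end_of_text|> is 128001 and <|eot_id|> is 128009, so the relevant part of generation_config.json would look roughly like this (other fields omitted):

{
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009],
  "do_sample": true,
  "temperature": 0.6,
  "top_p": 0.9
}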

Running into the same issue. With the default config, the model doesn't stop at <|eot_id|> and will generate new text for the user.

After updating the config, the model no longer generates user text, but instead ends with an infinite series of <|eot_id|><|start_header_id|>assistant:<|eot_id|><|start_header_id|>assistant:<|eot_id|>...

Is there a way to prevent this?

Hmm @entropy could you provide more details about your setup? Here's what is working for me, referencing this PR:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"
revision = "refs/pr/4"  # load from this PR so the updated generation_config.json is used

tokenizer = AutoTokenizer.from_pretrained(model_path, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_path, revision=revision, device_map="auto", torch_dtype=torch.bfloat16)

prompt = "Write a haiku about terminators."
chat = [{'content': prompt, 'role': 'user'}]
chat_tokens = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, return_tensors='pt').to(model.device)

new_chat_tokens = model.generate(chat_tokens, do_sample=False, max_new_tokens=128)
new_chat_str = tokenizer.decode(new_chat_tokens[0])
print(new_chat_str)

produces:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Write a haiku about terminators.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Metal hearts ablaze
Rise from ashes, cold and dark
Judgment day arrives<|eot_id|>

Same here, I use oobabooga text-generation-webui and Llama 3 8B Instruct will not stop talking.
To reproduce, just ask it to respond with a single token, e.g. to say START.

It's the same with TabbyAPI.

In oobabooga text-generation-webui, you also need to uncheck "Skip special tokens" in the Parameters -> Generation tab.


For me this change was not enough in text-generation-webui.
I had to uncheck "Skip special tokens" and add "<|eot_id|>" to the custom stop strings; after that everything was good.

Meta Llama org

Thank you @abhi-db !

pcuenq changed pull request status to merged

Fixed the GGUF quant here: https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF

Yes, and it works fine. I use Meta-Llama-3-8B-Instruct.Q8_0.gguf and Meta-Llama-3-8B-Instruct.Q6_K.gguf, and both stop the conversation cleanly when finished.
Many thanks. :)

Hi guys, is my issue related to the same problem described here? https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/36
If yes, will this repo be fixed?

Regarding the example above: please change new_chat_str = tokenizer.decode(new_chat_tokens[0]) to new_chat_str = tokenizer.decode(new_chat_tokens[0], skip_special_tokens=True) so the special tokens are stripped from the printed output.
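
For example (the slice is an optional extra that also drops the prompt tokens so only the assistant reply is printed):

# Skip special tokens so <|begin_of_text|>, <|eot_id|>, etc. don't show up
# in the printed output; slicing off the prompt is optional.
reply_tokens = new_chat_tokens[0][chat_tokens.shape[-1]:]
new_chat_str = tokenizer.decode(reply_tokens, skip_special_tokens=True)
print(new_chat_str)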
