It doesn't stop generating text.

#3
by MohamedRashad - opened

I have faced this problem with llama-2-7B-32k where it continues producing text until the max number of tokens is reached.
Is there a solution to this problem?

Set the EOS token to the corresponding value in the vocabulary

@macadeliccc
Can you give me a code example?

https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizer.eos_token
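For reference, a quick way to check which EOS token a tokenizer actually uses (the checkpoint name here is just an example, substitute your own):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K")
print(tokenizer.eos_token)     # the EOS string, e.g. "</s>" for Llama-style tokenizers
print(tokenizer.eos_token_id)  # its index in the vocabulary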

This is the example from amazon/MistralLite:

from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers
import torch

model_id = "amazon/MistralLite"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             torch_dtype=torch.bfloat16,
                                             use_flash_attention_2=True,
                                             device_map="auto",)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
prompt = "<|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>"

sequences = pipeline(
    prompt,
    max_new_tokens=400,
    do_sample=False,
    return_full_text=False,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,  # stop generating once the tokenizer's EOS token is produced
)
for seq in sequences:
    print(f"{seq['generated_text']}")

That eos_token_id line can be written like this:

 eos_token_id=tokenizer.eos_token_id,

Or like this:

 eos_token_id=32101,

The number I selected is arbitrary; I just wanted to show that it's referencing an index in the vocabulary.

Given that this is a Mistral fine-tune, I think this should suffice. Regardless, this is the logic that stops generation and prevents run-on output, and it can be found in most (if not all) text generation models.
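If you want to confirm which token a given id actually maps to, a quick sketch (using the MistralLite tokenizer from the example above) is:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("amazon/MistralLite")
print(tokenizer.eos_token_id)                                   # the id the tokenizer treats as EOS
print(tokenizer.convert_ids_to_tokens(tokenizer.eos_token_id))  # the token string at that index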

@macadeliccc It worked, thanks!

MohamedRashad changed discussion status to closed

@macadeliccc Can you tell me what the ideal value for eos_token_id is? I also read somewhere that adding model.config.pad_token_id = tokenizer.eos_token_id to the code helps solve this issue. But I'm confused: do I have to specify a value for eos_token_id myself? If so, what's the ideal value to set? Please let me know. Thanks!

@Rmote6603

The EOS token should always be set like this:

 eos_token_id=tokenizer.eos_token_id,

This way it will just use whatever the EOS token is in the tokenizer you're using with the model.
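For example, if you call model.generate() directly instead of a pipeline, a minimal sketch looks like this (the checkpoint and prompt are just placeholders, not specific recommendations):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] What does the EOS token do? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    eos_token_id=tokenizer.eos_token_id,  # stop as soon as the model emits the tokenizer's EOS token
    pad_token_id=tokenizer.eos_token_id,  # also silences the "pad_token_id is not set" warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))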

@macadeliccc Here's my code snippet:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "mistralai/Mistral-7B-Instruct-v0.2"

eos_token = "[/INST]"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    # device_map="auto",
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

tokenizer.padding_side = 'right'
tokenizer.add_eos_token = True
tokenizer.max_new_tokens = 2000
tokenizer.max_length = 200
tokenizer.max_new_length = 200
tokenizer.pad_token_id = 2041
tokenizer.pad_token = tokenizer.unk_token
eos_token_id = tokenizer.eos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

I'm trying to solve two issues here:
1. Massive repetition, self-talk, etc.
2. The generation stopping abruptly

Let me know your comments. Thanks!
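Not an authoritative answer, but here is a sketch of generation settings commonly used against both issues, reusing the model and tokenizer loaded in the snippet above (the values are illustrative, not tuned):

prompt = "[INST] Your instruction here [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,                   # enough room so answers are not cut off mid-sentence
    eos_token_id=tokenizer.eos_token_id,  # lets the model stop naturally instead of running on
    pad_token_id=tokenizer.pad_token_id,
    repetition_penalty=1.15,              # mild penalty against repeating earlier tokens
    no_repeat_ngram_size=3,               # blocks verbatim 3-gram repeats
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))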

I had a similar problem and solved it rather crudely:

def dataset_to_dialog_format(text, annotations):
    dialogs = []
    for i in range(len(text)):
        chat = [
            {"role": "user", "content": text[i]},
            {"role": "assistant", "content": annotations[i] + " ##################"},
        ]
        dialogs.append(chat)
    return dialogs

I add " ##################" and use cut inference based on this symbol "#" in the inference and use add text before this symbol.
