[Update needed to stop infinite generation] On special_tokens_map.json & tokenizer_config.json

#1
by apolloparty - opened

special_tokens_map.json should be like this:

{
  "bos_token": "<|begin_of_text|>",
  "eos_token": "<|eot_id|>"
}

tokenizer_config.json at line 2055 should look like this:

"eos_token": "<|eot_id|>",

"(Illustration)

Please note that this is not a real conversation but rather a demonstration of how the AI assistant would respond based on the given prompts. The AI does not understand context, emotions, or morals, nor can it speak as the user. Its purpose is solely to provide factual information and assist with requests within its capabilities." how do we get rid of this garbage? lol

Repetition still exists.

It's all up to the framework you use to correctly set a stop condition for the generation. If the framework doesn't stop asking the model for more tokens after the model has signaled that it's done talking, it will just keep going.

As it happens, for whatever reason, Llama-3 uses an end-of-turn token that's different from its EOS token. Many frameworks have a hard time dealing with that, but there are a number of workarounds. I haven't found one universal method that satisfies all the frameworks, but you can redefine the EOS token to be number 128009 in config.json, you can set it in generation_config.json instead (either change "eos_token_id" to 128009 or make it a list of [128001, 128009]), or you can redefine it in tokenizer_config.json. All with potential side effects, of course. The best solution is to use a generator that natively understands the Llama-3 prompt format.
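
As an illustration of the generation_config.json route, here is a rough sketch (model path and prompt are placeholders, and it assumes the standard transformers generate() API) that passes both stop token IDs at generation time instead of editing the file:

# Sketch: treat both <|end_of_text|> (128001) and <|eot_id|> (128009) as stop tokens,
# the runtime equivalent of "eos_token_id": [128001, 128009] in generation_config.json.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./Meta-Llama-3-8B-Instruct"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=64, eos_token_id=[128001, 128009])
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))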

Text generation web UI doesn't.

In TGW you should be able to set a stop condition of "<|eot_id|>" while unchecking "skip special tokens."
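
For anyone scripting against plain transformers rather than the UI, the same idea can be expressed as a custom stopping criterion (this is not TGW's actual implementation, just a sketch of what the setting does conceptually):

# Sketch: stop once the decoded output ends with the "<|eot_id|>" string.
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnEotId(StoppingCriteria):
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs):
        # Keep special tokens when decoding, otherwise "<|eot_id|>" is stripped
        # and the stop string can never match.
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=False)
        return text.endswith("<|eot_id|>")

# Usage (model/tokenizer/input_ids as in the sketch above):
# output = model.generate(input_ids, max_new_tokens=64,
#                         stopping_criteria=StoppingCriteriaList([StopOnEotId(tokenizer)]))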

where?

In TGW you should be able to set a stop condition of "<|eot_id|>" while unchecking "skip special tokens."

I set it, but it randomly starts repeating again.

Is it outputting "<|eot_id|>" in the response, or does it go straight to "assistant"?

For me, setting EOS to 128009 in config.json fixed it 100% on TGW and ST. I also made the changes in the OP here.
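
If you want to apply that same config.json change with a script instead of a text editor, a minimal sketch (the path is a placeholder) could look like this:

# Sketch: patch "eos_token_id" in config.json in place.
import json

cfg_path = "./Meta-Llama-3-8B-Instruct/config.json"  # placeholder path
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["eos_token_id"] = 128009  # original upload used 128001
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)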
