[Update needed to stop infinite generation] On special_tokens_map.json & tokenizer_config.json

#1
by apolloparty - opened

special_tokens_map.json should be like this:

{
  "bos_token": "<|begin_of_text|>",
  "eos_token": "<|eot_id|>"
}

tokenizer_config.json at line 2055 should look like this:

"eos_token": "<|eot_id|>",

"(Illustration)

Please note that this is not a real conversation but rather a demonstration of how the AI assistant would respond based on the given prompts. The AI does not understand context, emotions, or morals, nor can it speak as the user. Its purpose is solely to provide factual information and assist with requests within its capabilities." how do we get rid of this garbage? lol

Repetition still exists.

It's all up to the framework you use to correctly set a stop condition for the generation. If the framework doesn't stop asking the model for more tokens after the model has signaled that it's done talking, it will just keep going.

As it happens, for whatever reason, Llama-3 uses an end-of-turn token that's different from its EOS token. Many frameworks have a hard time dealing with that, but there are a number of workarounds. I haven't found one universal method that satisfies all the frameworks, but you can redefine the EOS token to be number 128009 in config.json, you can set it in generation_config.json instead (either change "eos_token_id" to 128009 or make it a list of [128001, 128009]), or you can redefine it in tokenizer_config.json. All with potential side effects, of course. The best solution is to use a generator that natively understands the Llama-3 prompt format.
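
As an illustration of the generation_config.json route, here is a rough sketch (model path and prompt are placeholders, and it assumes the standard transformers generate() API) that passes both stop token IDs at generation time instead of editing the file:

# Sketch: treat both <|end_of_text|> (128001) and <|eot_id|> (128009) as stop tokens,
# the runtime equivalent of "eos_token_id": [128001, 128009] in generation_config.json.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./Meta-Llama-3-8B-Instruct"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=64, eos_token_id=[128001, 128009])
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))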

Text generation web UI doesn't.

In TGW you should be able to set a stop condition of "<|eot_id|>" while unchecking "skip special tokens."
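
For anyone scripting against plain transformers rather than the UI, the same idea can be expressed as a custom stopping criterion (this is not TGW's actual implementation, just a sketch of what the setting does conceptually):

# Sketch: stop once the decoded output ends with the "<|eot_id|>" string.
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnEotId(StoppingCriteria):
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, input_ids, scores, **kwargs):
        # Keep special tokens when decoding, otherwise "<|eot_id|>" is stripped
        # and the stop string can never match.
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=False)
        return text.endswith("<|eot_id|>")

# Usage (model/tokenizer/input_ids as in the sketch above):
# output = model.generate(input_ids, max_new_tokens=64,
#                         stopping_criteria=StoppingCriteriaList([StopOnEotId(tokenizer)]))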

where?

In TGW you should be able to set a stop condition of "<|eot_id|>" while unchecking "skip special tokens."

I set it, but it randomly starts repeating again.

Is it outputting "<|eot_id|>" in the response, or does it go straight to "assistant"?

For me, setting EOS to 128009 in config.json fixed it 100% on TGW and ST. I also made the changes in the OP here.
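
If you want to apply that same config.json change with a script instead of a text editor, a minimal sketch (the path is a placeholder) could look like this:

# Sketch: patch "eos_token_id" in config.json in place.
import json

cfg_path = "./Meta-Llama-3-8B-Instruct/config.json"  # placeholder path
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["eos_token_id"] = 128009  # original upload used 128001
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)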
