Garbage tokens at the end

#2
by masterfury - opened

I am trying to use 'OpenAssistant/pythia-12b-sft-v8-7k-steps' with Hugging Face 'text-generation-inference', but sometimes it repeats the last token or sentence; a sample picture is attached. Can anyone explain why this is happening and what the fix is? I have already tried playing around with the token length and other parameters.

image_error_pythia.png

Did you prompt the model with the OA dialogue template, e.g. <|prompter|>{query}<|endoftext|><|assistant|>? Which sampling parameters did you use?

Yes, I am using exactly this template. Here are the parameters I'm using:
{
"inputs": "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>",
"parameters": {
"best_of": 1,
"details": true,
"do_sample": true,
"max_new_tokens": 1024,
"repetition_penalty": 1.03,
"return_full_text": true,
"seed": null,
"stop": [""],
"temperature": 0.5,
"top_k": 10,
"top_p": 0.95,
"truncate": null,
"typical_p": 0.95,
"watermark": true
}
}
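For context, this is the JSON body POSTed to the server's /generate route. Below is a minimal sketch of an equivalent request with Python's requests; the base URL http://127.0.0.1:8080 is an assumption about where the server is listening.

```python
import requests

# Same "inputs"/"parameters" payload as above, POSTed to a local
# text-generation-inference server (assumed at http://127.0.0.1:8080).
payload = {
    "inputs": "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>",
    "parameters": {
        "details": True,
        "do_sample": True,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.03,
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "typical_p": 0.95,
        "stop": [""],
    },
}

response = requests.post("http://127.0.0.1:8080/generate", json=payload, timeout=120)
body = response.json()
print(body["generated_text"])
print(body["details"]["finish_reason"])  # "length" when max_new_tokens is hit
```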

I face similar issues with OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 when running in normal fp16 with the Hugging Face text-generation-inference server. But when I enable quantization, this kind of behaviour does not happen and the model generates reasonable text, though generation is very, very slow (due to the known problem with bitsandbytes). Not sure where the issue is.

Yes, even with 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5' the same issue is there. I thought the newer version would not have this issue.

OpenAssistant org

Please take a look at the sampling report, which was generated with the Hugging Face transformers library for this model; all continuations that end with <|endoftext|> are correctly finished within the max_new_tokens limit.
Is the tokenizer's end-of-sequence token automatically respected by HF's text-generation-inference server, i.e. is "stop": [""] correct? For 100 random prompts, how often does it end correctly and how often do you see the junk?
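The end-rate over a batch of prompts can be measured with a quick loop. A minimal sketch, assuming the text_generation Python client (pip install text-generation), a server at http://127.0.0.1:8080, and your own list of test queries; no explicit stop sequence is passed, so this checks whether the tokenizer's end-of-sequence token is respected on its own:

```python
from collections import Counter
from text_generation import Client

# Replace with your 100 random test queries.
queries = [
    "What is a meme, and what's the history behind this word?",
    "Explain the difference between fp16 and int8 quantization.",
]

client = Client("http://127.0.0.1:8080", timeout=120)
finish_reasons = Counter()

for query in queries:
    prompt = f"<|prompter|>{query}<|endoftext|><|assistant|>"
    response = client.generate(
        prompt,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.5,
        top_k=10,
        top_p=0.95,
        repetition_penalty=1.03,
    )
    finish_reasons[response.details.finish_reason] += 1

# Counts of eos_token vs. length vs. stop_sequence endings.
print(finish_reasons)
```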

Sure, I will have a look at the sampling report.
My observations so far around this issue are:

  1. For 100 random prompts, this issue pops up 40-50 times. (Tested this for 3 weeks.)
  2. This happens when the API finish reason is length.
# Generation finish reason
class FinishReason(Enum):
    # number of generated tokens == `max_new_tokens`
    Length = "length"
    # the model generated its end of sequence token
    EndOfSequenceToken = "eos_token"
    # the model generated a text included in `stop_sequences`
    StopSequence = "stop_sequence"

Snippet from - https://github.com/huggingface/text-generation-inference/blob/main/clients/python/text_generation/types.py

  3. Sometimes it feels like the model is generating these garbage tokens just to fill the max token length, and it only stops once max_new_tokens is reached.
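Since "details": true is set, the finish reason is available on the response, so the length case from observation 2 can be detected directly. A minimal sketch, assuming the text_generation Python client and a server at http://127.0.0.1:8080:

```python
from text_generation import Client
from text_generation.types import FinishReason

client = Client("http://127.0.0.1:8080", timeout=120)

response = client.generate(
    "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>",
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.5,
    top_k=10,
    top_p=0.95,
    repetition_penalty=1.03,
)

# FinishReason.Length means the model hit max_new_tokens without emitting
# <|endoftext|>, which is exactly the case where the trailing junk shows up.
if response.details.finish_reason == FinishReason.Length:
    print("Hit max_new_tokens; output may contain trailing junk.")
else:
    print("Finished cleanly:", response.details.finish_reason)
```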

I was facing issues when using the prompt below together with these parameters:

<|prompter|>
{assistant instruction like you are helpful for a purpose}
{context 1: Some text}
{context 2: Some text}
{context 3: Some text}

Based on above context, answer below question:
{some question}
<|endoftext|><|assistant|>
{
    do_sample: true,
    temperature: 0.1,
    max_new_tokens: 1024,
    return_full_text: false,
    repetition_penalty: 1.1,
    num_beams: 1,
    seed: 13413423,
    top_p: 0.75,
    typical_p: 0.95,
    top_k: 45,
    stop: ['<|endoftext|>'],
}
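For reproducing this, the prompt above can be assembled with a small helper. A minimal sketch; build_prompt and the placeholder strings are hypothetical, only the <|prompter|>/<|endoftext|>/<|assistant|> layout comes from the template above:

```python
def build_prompt(instruction: str, contexts: list[str], question: str) -> str:
    """Assemble the OA dialogue prompt in the layout shown above (hypothetical helper)."""
    context_block = "\n".join(contexts)
    return (
        "<|prompter|>\n"
        f"{instruction}\n"
        f"{context_block}\n"
        "\n"
        "Based on above context, answer below question:\n"
        f"{question}\n"
        "<|endoftext|><|assistant|>"
    )

prompt = build_prompt(
    instruction="You are a helpful assistant that answers questions using only the given context.",
    contexts=["Context 1: Some text", "Context 2: Some text", "Context 3: Some text"],
    question="Some question",
)
print(prompt)
```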
