Garbage tokens at the end

#2
by masterfury - opened

I am trying to use 'OpenAssistant/pythia-12b-sft-v8-7k-steps' with Hugging Face 'text-generation-inference', but sometimes it repeats the last token or sentence; a sample picture is attached. Can anyone explain why this is happening and what the fix is? I have already tried playing around with the token length and other parameters.

image_error_pythia.png

Did you prompt the model with the OA dialogue template, e.g. <|prompter|>{query}<|endoftext|><|assistant|>? Which sampling parameters did you use?

Yes, I am using exactly this template. Here are the parameters I'm using:
{
"inputs": "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>",
"parameters": {
"best_of": 1,
"details": true,
"do_sample": true,
"max_new_tokens": 1024,
"repetition_penalty": 1.03,
"return_full_text": true,
"seed": null,
"stop": [""],
"temperature": 0.5,
"top_k": 10,
"top_p": 0.95,
"truncate": null,
"typical_p": 0.95,
"watermark": true
}
}
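For context, this is the JSON body POSTed to the server's /generate route. Below is a minimal sketch of an equivalent request with Python's requests; the base URL http://127.0.0.1:8080 is an assumption about where the server is listening.

```python
import requests

# Same "inputs"/"parameters" payload as above, POSTed to a local
# text-generation-inference server (assumed at http://127.0.0.1:8080).
payload = {
    "inputs": "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>",
    "parameters": {
        "details": True,
        "do_sample": True,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.03,
        "temperature": 0.5,
        "top_k": 10,
        "top_p": 0.95,
        "typical_p": 0.95,
        "stop": [""],
    },
}

response = requests.post("http://127.0.0.1:8080/generate", json=payload, timeout=120)
body = response.json()
print(body["generated_text"])
print(body["details"]["finish_reason"])  # "length" when max_new_tokens is hit
```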

I face similar issues with OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 when running in normal fp16 with the Hugging Face text-generation-inference server. But when I enable quantization, this kind of behaviour does not happen and the model generates reasonable text, though generation is very, very slow (due to the known problem with bitsandbytes). Not sure where the issue is.

Yes, even with 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5' the same issue is there. I thought the newer version would not have this issue.

OpenAssistant org

Please take a look at the sampling report, which was generated with the Hugging Face transformers library for this model; all continuations that end with <|endoftext|> are correctly finished within the max_new_tokens limit.
Is the tokenizer's end-of-sequence token automatically respected by HF's text-generation-inference server, i.e. is "stop": [""] correct? For 100 random prompts, how often does it end correctly and how often do you see the junk?
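The end-rate over a batch of prompts can be measured with a quick loop. A minimal sketch, assuming the text_generation Python client (pip install text-generation), a server at http://127.0.0.1:8080, and your own list of test queries; no explicit stop sequence is passed, so this checks whether the tokenizer's end-of-sequence token is respected on its own:

```python
from collections import Counter
from text_generation import Client

# Replace with your 100 random test queries.
queries = [
    "What is a meme, and what's the history behind this word?",
    "Explain the difference between fp16 and int8 quantization.",
]

client = Client("http://127.0.0.1:8080", timeout=120)
finish_reasons = Counter()

for query in queries:
    prompt = f"<|prompter|>{query}<|endoftext|><|assistant|>"
    response = client.generate(
        prompt,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.5,
        top_k=10,
        top_p=0.95,
        repetition_penalty=1.03,
    )
    finish_reasons[response.details.finish_reason] += 1

# Counts of eos_token vs. length vs. stop_sequence endings.
print(finish_reasons)
```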

Sure, I will have a look at the sampling report.
My observations so far around this issue are:

  1. For 100 random prompts, this issue pops up 40-50 times. (Tested this for 3 weeks.)
  2. This happens when the API finish reason is length.
# Generation finish reason
class FinishReason(Enum):
    # number of generated tokens == `max_new_tokens`
    Length = "length"
    # the model generated its end of sequence token
    EndOfSequenceToken = "eos_token"
    # the model generated a text included in `stop_sequences`
    StopSequence = "stop_sequence"

Snippet from - https://github.com/huggingface/text-generation-inference/blob/main/clients/python/text_generation/types.py

  3. Sometimes it feels like the model is generating these garbage tokens just to fill the max token length, and it only stops once max_new_tokens is reached.
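Since "details": true is set, the finish reason is available on the response, so the length case from observation 2 can be detected directly. A minimal sketch, assuming the text_generation Python client and a server at http://127.0.0.1:8080:

```python
from text_generation import Client
from text_generation.types import FinishReason

client = Client("http://127.0.0.1:8080", timeout=120)

response = client.generate(
    "<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>",
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.5,
    top_k=10,
    top_p=0.95,
    repetition_penalty=1.03,
)

# FinishReason.Length means the model hit max_new_tokens without emitting
# <|endoftext|>, which is exactly the case where the trailing junk shows up.
if response.details.finish_reason == FinishReason.Length:
    print("Hit max_new_tokens; output may contain trailing junk.")
else:
    print("Finished cleanly:", response.details.finish_reason)
```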

I was facing issues when using the prompt below together with these parameters:

<|prompter|>
{assistant instruction like you are helpful for a purpose}
{context 1: Some text}
{context 2: Some text}
{context 3: Some text}

Based on above context, answer below question:
{some question}
<|endoftext|><|assistant|>
{
    do_sample: true,
    temperature: 0.1,
    max_new_tokens: 1024,
    return_full_text: false,
    repetition_penalty: 1.1,
    num_beams: 1,
    seed: 13413423,
    top_p: 0.75,
    typical_p: 0.95,
    top_k: 45,
    stop: ['<|endoftext|>'],
}
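For reproducing this, the prompt above can be assembled with a small helper. A minimal sketch; build_prompt and the placeholder strings are hypothetical, only the <|prompter|>/<|endoftext|>/<|assistant|> layout comes from the template above:

```python
def build_prompt(instruction: str, contexts: list[str], question: str) -> str:
    """Assemble the OA dialogue prompt in the layout shown above (hypothetical helper)."""
    context_block = "\n".join(contexts)
    return (
        "<|prompter|>\n"
        f"{instruction}\n"
        f"{context_block}\n"
        "\n"
        "Based on above context, answer below question:\n"
        f"{question}\n"
        "<|endoftext|><|assistant|>"
    )

prompt = build_prompt(
    instruction="You are a helpful assistant that answers questions using only the given context.",
    contexts=["Context 1: Some text", "Context 2: Some text", "Context 3: Some text"],
    question="Some question",
)
print(prompt)
```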
