Output seems overly truncated, and max_length doesn't seem to matter

#39 · opened by joekr552

For the Inference API and the homepage preview, the model output seems overly truncated.

E.g. trying any chain-of-thought prompt, such as GSM8K questions, gives very weird, short (truncated) responses.

Seems like a bug to me?

I agree. I tried it using the Inference API as well, and the output was sometimes only a single word.

@joekr552 @bfgdhrubo The Flan Collection is oriented more toward short-answer, academic-style tasks than toward the longer conversational dialog responses of ChatGPT-type models. Also, chain-of-thought questions are still quite hard for models of this size, though they demonstrate some capability, as we trained on a few of those tasks.

If you're looking for longer, conversational responses or CoT specifically, I'd recommend either (a) finetuning additionally on dialog-style data (e.g. Alpaca, distilled from ChatGPT), or (b) using Flan-UL2, which should have slightly stronger CoT capabilities. Flan-T5 really excels at short-answer NLP tasks and as a finetuning base, where it beats out most competitors, even ones of slightly larger size.
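For anyone who wants to try the Flan-UL2 suggestion locally, a minimal sketch follows. It assumes the google/flan-ul2 checkpoint, the transformers library (plus accelerate for device_map="auto"), and enough memory for the model; the prompt and the max_new_tokens value are only illustrative:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-ul2"  # swap in "google/flan-t5-xxl" to compare
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" needs the accelerate package; drop it to load on CPU
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "Answer the following question by reasoning step by step. "
    "The cafeteria had 23 apples. If they used 20 and bought 6 more, "
    "how many apples do they have?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# The generation default is short, so raise max_new_tokens explicitly.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))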

Hope this was helpful!

Understood that this model isn't trained for verbose, conversational responses. However, even with simple tasks the output is strangely truncated. See below for an example:

PROMPT
Insert spaces between the words in the following text: Thisishowtothrowbackafishyoudon’tlike,andthatwaysomethingbadwon’tfallonyou;thisishowtobullyaman;thisishowamanbulliesyou;thisishowtoloveaman;andifthisdoesn’tworkthereareotherways,andiftheydon’tworkdon’tfeeltoobadaboutgivingup;thisishowtospitupintheairifyoufeellikeit,andthisishowtomovequicksothatitdoesn’tfallonyou;thisishowtomakeendsmeet;alwayssqueezebreadtomakesureit’sfresh;butwhatifthebakerwon’tletmefeelthebread?;youmeantosaythatafterallyouarereallygoingtobethekindofwomanwhothebakerwon’tletnearthebread?

RESPONSE
This is how to throw back a fish you don’t like, and that way something bad

(screenshot attached)
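In the meantime, the homepage widget doesn't expose generation parameters, but you can pass them yourself when calling the Inference API directly. A rough sketch, assuming the api-inference endpoint for this model, the requests library, and a valid access token in the HF_TOKEN environment variable; the 200-token cap is just an example:

import os
import requests

API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-xxl"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "Insert spaces between the words in the following text: Thisishowtothrowbackafish...",
    "parameters": {"max_new_tokens": 200},  # the default length is much shorter
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())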

I have the exact same issue; see the attached screenshots.

I am also having the same issue: the response is very short and truncated. How can this be solved?

From what's stated, the model is meant to give short answers; beyond that, though, it utters absolute nonsense.

I've noticed that providing a context can reduce this a lot, though the model will then form statements very similar to the context.

This issue seems to be fixed now!

(screenshot attached)

I believe I am having a similar issue when asking a CoT question.
Question: Answer the following question by step-by-step reasoning. The cafeteria had 23 apples if 20 were used and 6 new were bought. How many apples are left at the end?
Model response: ['The cafeteria used 20 apples and bought 6 apples so there are 23 - 20 =']

Hi, I understand this model is not meant for generating verbose answers. Is there a way to see the default parameters used for top_p, max_tokens, etc., and to tweak them a bit? That would keep the answers short but at least complete the sentence without trimming it.

check config.json

Which parameter in the config file for this model (flan-t5-xxl) controls the max_length of the response?

config = {
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 10240,
  "d_kv": 64,
  "d_model": 4096,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 24,
  "num_heads": 64,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.24.0.dev0",
  "use_cache": true,
  "vocab_size": 32128
}
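For what it's worth, none of the fields above set the response length. In recent transformers versions the length defaults live in the model's generation config (max_length is typically 20 unless the repo ships a generation_config.json), and they can be inspected and overridden per call. A rough sketch, assuming a local transformers install recent enough to expose model.generation_config:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

# config.json has no length field; these are the defaults generate() actually uses
print(model.generation_config)

inputs = tokenizer("Translate to German: How old are you?", return_tensors="pt")
# Override the short default on a per-call basis
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))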

[Worked!] An update, in the hope it helps someone as lost as I was. The parameter that controls length was removed from the config after the Flan-T5 series; it has to be set directly in the payload passed to transformer.predict or the SageMaker endpoint. Additionally, the parameter is not named like in other LLMs (max_length etc.); it is "max_new_tokens". Pricing for this needs to be monitored after increasing it.
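To make that concrete, here is a rough sketch of what the comment above describes, assuming a flan-t5-xxl endpoint deployed on SageMaker with the Hugging Face inference toolkit; the endpoint name is a placeholder and the token limit is only an example:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": "Answer the following question by reasoning step by step. "
              "The cafeteria had 23 apples. If they used 20 and bought 6 more, "
              "how many apples do they have?",
    "parameters": {"max_new_tokens": 200},  # this, not max_length, controls response length here
}

response = runtime.invoke_endpoint(
    EndpointName="my-flan-t5-xxl-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))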
