Output seems overly truncated, and max_length doesn't seem to matter

#39 · opened by joekr552

For the Inference API and the homepage preview, the model output seems overly truncated.

E.g. trying any chain-of-thought prompt, such as GSM8K questions, gives very weird, short (truncated) responses.

Seems like a bug to me?

I agree. I tried it using the Inference API as well, and the output was sometimes only a single word.

@joekr552 @bfgdhrubo The Flan Collection is oriented more toward short-answer, academic-style tasks than toward the longer conversational dialog responses of ChatGPT-type models. Also, chain-of-thought questions are still quite hard for models of this size, though they demonstrate some capability, as we trained on a few of those tasks.

If you're looking for longer, conversational responses or CoT specifically, I'd recommend either (a) finetuning additionally on dialog-style data (e.g. Alpaca, distilled from ChatGPT), or (b) using Flan-UL2, which should have slightly stronger CoT capabilities. Flan-T5 really excels at short-answer NLP tasks and as a finetuning base, where it beats out most competitors, even ones of slightly larger size.
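For anyone who wants to try the Flan-UL2 suggestion locally, a minimal sketch follows. It assumes the google/flan-ul2 checkpoint, the transformers library (plus accelerate for device_map="auto"), and enough memory for the model; the prompt and the max_new_tokens value are only illustrative:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-ul2"  # swap in "google/flan-t5-xxl" to compare
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" needs the accelerate package; drop it to load on CPU
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "Answer the following question by reasoning step by step. "
    "The cafeteria had 23 apples. If they used 20 and bought 6 more, "
    "how many apples do they have?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# The generation default is short, so raise max_new_tokens explicitly.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))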

Hope this was helpful!

Understood that this model isn't trained for verbose, conversational responses. However, even with simple tasks the output is strangely truncated. See below for an example:

PROMPT
Insert spaces between the words in the following text: Thisishowtothrowbackafishyoudon’tlike,andthatwaysomethingbadwon’tfallonyou;thisishowtobullyaman;thisishowamanbulliesyou;thisishowtoloveaman;andifthisdoesn’tworkthereareotherways,andiftheydon’tworkdon’tfeeltoobadaboutgivingup;thisishowtospitupintheairifyoufeellikeit,andthisishowtomovequicksothatitdoesn’tfallonyou;thisishowtomakeendsmeet;alwayssqueezebreadtomakesureit’sfresh;butwhatifthebakerwon’tletmefeelthebread?;youmeantosaythatafterallyouarereallygoingtobethekindofwomanwhothebakerwon’tletnearthebread?

RESPONSE
This is how to throw back a fish you don’t like, and that way something bad

(screenshot attached)
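In the meantime, the homepage widget doesn't expose generation parameters, but you can pass them yourself when calling the Inference API directly. A rough sketch, assuming the api-inference endpoint for this model, the requests library, and a valid access token in the HF_TOKEN environment variable; the 200-token cap is just an example:

import os
import requests

API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-xxl"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "Insert spaces between the words in the following text: Thisishowtothrowbackafish...",
    "parameters": {"max_new_tokens": 200},  # the default length is much shorter
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())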

I have the exact same issue; see the attached screenshots.

I am also having the same issue: the response is very short and truncated. How can this be solved?

From what's stated, the model is meant to give short answers; beyond that, though, it utters absolute nonsense.

I've noticed that providing a context can reduce this a lot, though the model will then form statements very similar to the context.

This issue seems to be fixed now!

(screenshot attached)

I believe I am having a similar issue when asking a CoT question.
Question: Answer the following question by step-by-step reasoning. The cafeteria had 23 apples if 20 were used and 6 new were bought. How many apples are left at the end?
Model response: ['The cafeteria used 20 apples and bought 6 apples so there are 23 - 20 =']

Hi, I understand this model is not meant for generating verbose answers. Is there a way to see the default parameters used for top_p, max_tokens, etc., and to tweak them a bit? That would keep the answers short but at least complete the sentence without trimming it.

check config.json

Which parameter in the config file for this model (flan-t5-xxl) controls the max_length of the response?

config = {
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 10240,
  "d_kv": 64,
  "d_model": 4096,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 24,
  "num_heads": 64,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.24.0.dev0",
  "use_cache": true,
  "vocab_size": 32128
}
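For what it's worth, none of the fields above set the response length. In recent transformers versions the length defaults live in the model's generation config (max_length is typically 20 unless the repo ships a generation_config.json), and they can be inspected and overridden per call. A rough sketch, assuming a local transformers install recent enough to expose model.generation_config:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xxl")

# config.json has no length field; these are the defaults generate() actually uses
print(model.generation_config)

inputs = tokenizer("Translate to German: How old are you?", return_tensors="pt")
# Override the short default on a per-call basis
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))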

[Worked!] An update, in the hope it helps someone as lost as I was. The parameter that controls length was removed from the config after the Flan-T5 series; it has to be set directly in the payload passed to transformer.predict or the SageMaker endpoint. Additionally, the parameter is not named like in other LLMs (max_length etc.); it is "max_new_tokens". Pricing for this needs to be monitored after increasing it.
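To make that concrete, here is a rough sketch of what the comment above describes, assuming a flan-t5-xxl endpoint deployed on SageMaker with the Hugging Face inference toolkit; the endpoint name is a placeholder and the token limit is only an example:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": "Answer the following question by reasoning step by step. "
              "The cafeteria had 23 apples. If they used 20 and bought 6 more, "
              "how many apples do they have?",
    "parameters": {"max_new_tokens": 200},  # this, not max_length, controls response length here
}

response = runtime.invoke_endpoint(
    EndpointName="my-flan-t5-xxl-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))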
