Can flan-t5-xl provide a more complete answer?

#20
by xanthexu - opened

I am trying to reproduce some results from the paper. In supplementary Figure 1, the authors provide some useful prompts for tuning Med-PaLM. For example, here is my code:

input_text="You are a helpful medical knowledge assistant. Provide useful , complete and scientifically-grounded answers to common questions about health. Question: How do you treat skin redness ?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

The output only gives the very short answer below:

You can treat skin redness with a topical ointment,

It seems like the answer is incomplete and too simple. Does anyone know how to get the model to provide a more comprehensive answer?
PS: I am running this in a CPU environment. Do CPU and GPU make a difference in terms of the answer?

Thanks,

Google org

Hi @xanthexu
Thanks for the issue
The prompt you are providing is quite specific to chat-based or instruction fine-tuned models. I don't think Flan-T5 is adapted for those types of tasks, unfortunately. In my experience, flan-t5 models work well for short answers; if you want to generate longer answers, you might want to try sampling methods: https://huggingface.co/blog/how-to-generate
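
For example, something along these lines, reusing the flan-t5-xl model and tokenizer from the snippet above (a minimal sketch; the sampling values and token budget here are illustrative, not settings from the paper or the blog post):

# Sampling plus a larger generation budget, instead of the default greedy decoding
outputs = model.generate(
    input_ids,
    do_sample=True,        # sample instead of taking the argmax at each step
    top_p=0.9,             # nucleus sampling
    temperature=0.7,
    max_new_tokens=256,    # allow a longer answer than the default limit
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))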

For prompting general questions to a language model, I advise you to look into popular causal language models such as Llama-2, or different variants of Mistral-7B such as Mistral-7B-v0.1 / Mistral-7B Orca.
I highly recommend the latter models, as you should be able to run them on a free-tier Google Colab instance: a 7B model contains ~16GB of weights in float16 precision, but you can easily reduce that requirement to ~5GB thanks to 4-bit quantization. Read more about it here: https://huggingface.co/blog/4bit-transformers-bitsandbytes
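
As a rough sketch of what 4-bit loading looks like (the checkpoint name below is just an example; this needs a GPU plus the bitsandbytes and accelerate packages, and the blog post above covers the details):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load a 7B causal LM in 4-bit precision (example checkpoint, swap in the model you want)
model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")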
