🚩 Report

#1
by rgallardo - opened

The model appears to be broken and does not respond correctly. Here's an example:

Input:

"LongT5 model is an encoder-decoder transformer pre-trained in a text-to-text denoising generative setting (Pegasus-like generation pre-training). LongT5 model is an extension of T5 model, and it enables using one of the two different efficient attention mechanisms - (1) Local attention, or (2) Transient-Global attention. The usage of attention sparsity patterns allows the model to efficiently handle input sequence.

LongT5 is particularly effective when fine-tuned for text generation (summarization, question answering) which requires handling long input sequences (up to 16,384 tokens)."

Output:

"matematic matematic orchid orchid orchid orchid orchid orchid orchid orchid orchid orchid orchid orchid orchid orchid orchid orchid orchid"

The same happens with other examples.

I am also having a similar issue. On both the fine-tuned and base versions of this model, a simple summarization task produces output like the following (see the snippet below for how I'm running it):

" informal informal Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt Kontakt"
" the a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a"


Which precision are you running the model at? I have noticed some issues when trying to load it in 16-bit, so that might be the cause. Try running it with torch_dtype=torch.float32, or simply omit the torch_dtype argument when loading the model, e.g.:
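
Something like this, with your own checkpoint name substituted in:

```python
import torch
from transformers import LongT5ForConditionalGeneration

# Checkpoint name is illustrative; use whichever checkpoint produced the bad output.
model = LongT5ForConditionalGeneration.from_pretrained(
    "google/long-t5-tglobal-base",
    torch_dtype=torch.float32,  # float32 is also the default if torch_dtype is omitted
)
```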
