A clever way to stop nonsense generation after a certain number of words at runtime?

#13
by Marcophono - opened

Hello!

I am very happy with BLIP-2 and use it for captioning and QA. I love long outputs (even if the truth is sometimes near zero, but that's okay, I need the fun factor), and often it works, but sometimes it starts to generate things like "[...] seen in the newspaper of June 2008, hashtag x hashtag y ........". I would really like to stop the generator as soon as this kind of nonsense starts. I know how to detect the beginning of those breakpoints once the generation process has finished and returned the result, but I want to cut it off the moment it comes up. The reason is that if I set, e.g., a high max_length of 128 and the generator starts its nonsense after 12 tokens, the output takes rather long, because in the worst case generation doesn't stop before 128 tokens, and that takes time. I want to offer this in an app, and it makes a difference whether a user has to wait 2 seconds or 10 seconds for the output. :-)
Does anyone have an idea how to handle this? I would also pay if someone could do the coding for me.
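One direction worth trying (just a sketch, not tested with LAVIS): Hugging Face transformers lets you pass a custom StoppingCriteria to generate(), which is checked after every generated token. The class name NonsenseStopper and the bad_strings list below are my own inventions, and whether LAVIS's model.generate() forwards a stopping_criteria argument to the underlying transformers generate() call is an assumption — if it doesn't, the internal generate call in the BLIP-2 model code would need a small patch.

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class NonsenseStopper(StoppingCriteria):
    """Stop generation as soon as a 'nonsense' marker appears in the decoded text.

    Hypothetical sketch: decodes the tokens generated so far on every step
    (cheap for short captions, O(n^2) overall) and aborts when any of the
    bad substrings shows up.
    """
    def __init__(self, tokenizer, bad_strings=("#", "....")):
        self.tokenizer = tokenizer
        self.bad_strings = bad_strings

    def __call__(self, input_ids, scores, **kwargs):
        # Decode the first sequence in the batch and check for bad markers.
        text = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return any(s in text for s in self.bad_strings)
```

If the LAVIS wrapper exposes its tokenizer (the OPT variant keeps one as an attribute, if I remember correctly), the call would then look roughly like `stopping_criteria=StoppingCriteriaList([NonsenseStopper(tokenizer)])` passed through to the transformers generate(). This cuts generation off right when the hashtag spam begins instead of waiting for max_length.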

Here is the code I am using:

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

model, vis_processors, txt_processors = load_model_and_preprocess(name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device)

vis_processors.keys()
txt_processors.keys()

raw_image = Image.open('UIDimgsages/1.jpg').convert('RGB')
question1 = "Question: What history has the house?\nAnswer:"

image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Plain captioning
sentences = model.generate({"image": image}, repetition_penalty=8.0, max_length=64, min_length=20)
print(sentences[0])

# VQA with a prompt
sentences = model.generate({"image": image, "prompt": question1}, repetition_penalty=8.0, max_length=64, min_length=20, num_beams=7, length_penalty=-1)
print(sentences[0])


The actual values of max_length and so on differ across my project; sometimes higher, sometimes lower.
I think there are no other parameters aside from nucleus sampling (not used here) and the ones I have included in my code, am I right? I tried temperature, top_k and so on, everything I found in the BLIP-2 Python scripts, but they seem to be disabled.

By the way, I get equal (maybe slightly better) results since switching to pretrain_opt2.7b. Before, I used the flan-T5-XXL model. Its 8-bit transformers version fit into my 4090, but the inference time was much longer!

Best regards
Marc
