Just use 1 beam and get 1 output; this is much, much faster than generating 8 beams and 8 outputs when only the first one is used.
Also, slice with [input_ids.shape[-1]:] (i.e. by token count) because the decoded, re-tokenised version of plain_text may have a different number of characters than the original string.
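For reference, the two changes can be sketched like this (the tensor values are made up, and the `model.generate` call is shown only in a comment since the real Promptist model isn't loaded here):

```python
import torch

# In the demo, generation would become something like:
#   output_ids = model.generate(input_ids, num_beams=1,
#                               num_return_sequences=1, ...)
# instead of num_beams=8 with 8 return sequences.

# Dummy tensors standing in for the tokenizer / model output:
input_ids = torch.tensor([[101, 7592, 2088]])               # 3 prompt tokens
output_ids = torch.tensor([[101, 7592, 2088, 1037, 4937]])  # prompt + 2 new tokens

# Slice off the prompt by token count, not by character count:
# decoding and then re-measuring plain_text in characters can drift.
new_tokens = output_ids[0][input_ids.shape[-1]:]
print(new_tokens.tolist())  # [1037, 4937]
```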

Microsoft org

Thank you for the PR. The beam size does affect the model's output quality; the demo is intended to run on GPUs.

I duplicated the demo at https://huggingface.co/spaces/unilm/Promptist-faster and merged your update.

I also added a note as follows:

Note: This is a version with beam_size=1, while the original demo uses beam_size=8. There may be a difference in output quality, but this demo is much faster. Many thanks to @HughPH for pointing out this improvement.

