bigscience/mt0-xxl · mt0-xxl web app performance issue?

Thanks for the great work on training and releazing this model! Upon reading the discussion on the Bloom page related to Bloomz and mt0-xxl, I am excited to try out mt0-xxl web version (given that bloomz web is gone). I am puzzled to see simply running the default example would generate result that seems to be not as impressive. The screenshot shows the three issues I think we can easily spot.

I also ran some of the cases from xP3 dataset for zh and noticed those cases with short answers tend to be good, while those with longer answers would show various kind of issues, including 1) words repeated immediately one after another sometimes, 2) grammatically incorrect sentences, 3) irrelevant contents, and 4) uncoherent context etc.
Is there something missing somewhere?

BTW, do you mind sharing the "best" parameters used for calling the "model.generate()" in a locally deployed mt0-xxl model please?