Special tokens in output generation

#2
by Matthieu - opened

Hello,

Thanks for sharing this model!

When generating output, even with skip_special_tokens=True, two special tokens remain at the beginning ( ) and end (\n) of the output, in addition to special whitespace tokens.
Is there any way to remove them and to get regular space characters instead of the special whitespace tokens?

Large Model Systems Organization org

Thanks a lot for trying the model! Can you try using T5Tokenizer instead of AutoTokenizer, and pass spaces_between_special_tokens=False when decoding?
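The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the maintainers' exact code: `decode_output` is a hypothetical helper name, and `MODEL_ID` in the usage comment is a placeholder because the thread does not name the checkpoint. It assumes a slow tokenizer such as `T5Tokenizer`, whose `decode` accepts `spaces_between_special_tokens`.

```python
def decode_output(tokenizer, output_ids):
    """Decode generated token ids with the two settings discussed above.

    `tokenizer` is assumed to be a slow Hugging Face tokenizer
    (for example T5Tokenizer); `decode_output` itself is a hypothetical helper.
    """
    return tokenizer.decode(
        output_ids,
        skip_special_tokens=True,             # drop special tokens such as padding or EOS
        spaces_between_special_tokens=False,  # do not insert spaces around added/special tokens
    )


# Usage sketch (MODEL_ID is a placeholder, not taken from this thread):
# from transformers import T5Tokenizer
# tokenizer = T5Tokenizer.from_pretrained(MODEL_ID)
# text = decode_output(tokenizer, generated_ids[0])
```

Keeping the decode options in one helper makes it easy to try other settings later without touching the generation code.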

Thanks for your feedback! I have applied all your recommendations, but I still get a newline character (\n) at the end of the generated output.

Any idea?

Large Model Systems Organization org

Hi,
Can you take a screenshot of the problem (input, tokenized input, decoded output, etc.) so that we can walk through it a bit? BTW, here is a question we got on GitHub that seems pretty similar: https://github.com/lm-sys/FastChat/issues/1022. Maybe you can also take a look?
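As an alternative to a screenshot, the same details could be collected as text. The sketch below is only one possible way to do it: `describe_generation` is a hypothetical helper, and it assumes `output_ids` is a plain list of ints (for example the result of `tensor.tolist()`). It relies on the standard tokenizer methods `encode`, `convert_ids_to_tokens`, and `decode`, and prints the decoded string with `repr` so that a trailing \n stays visible.

```python
def describe_generation(tokenizer, prompt, output_ids):
    """Build a text report of the input, its tokens, and the decoded output.

    Hypothetical helper; `output_ids` is assumed to be a plain list of ints.
    """
    input_ids = tokenizer.encode(prompt)
    output_ids = list(output_ids)
    decoded = tokenizer.decode(
        output_ids,
        skip_special_tokens=True,
        spaces_between_special_tokens=False,
    )
    return "\n".join([
        f"prompt:        {prompt!r}",
        f"input ids:     {input_ids}",
        f"input tokens:  {tokenizer.convert_ids_to_tokens(input_ids)}",
        f"output ids:    {output_ids}",
        f"output tokens: {tokenizer.convert_ids_to_tokens(output_ids)}",
        # repr() keeps whitespace such as a trailing newline visible
        f"decoded:       {decoded!r}",
    ])


# Usage sketch:
# print(describe_generation(tokenizer, prompt, generated_ids[0].tolist()))
```

Looking at the raw output tokens next to the decoded string shows whether the trailing newline comes from a generated token or from the decoding step.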
