M2M tokenizer doesn't know the word "wouldn't"

#2
by anzorq - opened

I accidentally discovered that the tokenizer tokenizes the word "wouldn't" as ['<unk>', "'", 't'].
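For anyone who wants to check this themselves, here is a minimal reproduction sketch. It assumes the `facebook/m2m100_418M` checkpoint (the thread doesn't name one, and other M2M100 checkpoints share the same vocabulary, so the result should match):

```python
# Minimal reproduction sketch (assumes the facebook/m2m100_418M checkpoint).
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Tokenize the word in question and print the resulting pieces.
print(tokenizer.tokenize("wouldn't"))
# Reported output: ['<unk>', "'", 't']
```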

It doesn't seem to affect the model's performance, but it makes me wonder what else is missing from the tokenizer's vocabulary.
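One quick way to probe for other gaps is to run a word list through the tokenizer and flag anything that produces the unknown token. This is just a rough sketch; the word list below is a hand-picked sample, not an exhaustive check:

```python
# Rough probe: tokenize a word list and flag anything containing the unknown token.
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Hand-picked sample; swap in any word list you care about.
words = ["wouldn't", "couldn't", "shouldn't", "can't", "won't", "hello"]

for word in words:
    pieces = tokenizer.tokenize(word)
    if tokenizer.unk_token in pieces:
        print(f"{word!r} -> {pieces}")
```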
