M2M tokenizer doesn't know the word "wouldn't"

#2
by anzorq - opened

I accidentally discovered that the tokenizer tokenizes the word "wouldn't" as ['<unk>', "'", 't'].
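For anyone who wants to check this themselves, here is a minimal reproduction sketch. It assumes the `facebook/m2m100_418M` checkpoint (the thread doesn't name one, and other M2M100 checkpoints share the same vocabulary, so the result should match):

```python
# Minimal reproduction sketch (assumes the facebook/m2m100_418M checkpoint).
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Tokenize the word in question and print the resulting pieces.
print(tokenizer.tokenize("wouldn't"))
# Reported output: ['<unk>', "'", 't']
```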

It doesn't seem to affect the model's performance, but it makes me wonder what else is missing from the tokenizer's vocabulary.
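One quick way to probe for other gaps is to run a word list through the tokenizer and flag anything that produces the unknown token. This is just a rough sketch; the word list below is a hand-picked sample, not an exhaustive check:

```python
# Rough probe: tokenize a word list and flag anything containing the unknown token.
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Hand-picked sample; swap in any word list you care about.
words = ["wouldn't", "couldn't", "shouldn't", "can't", "won't", "hello"]

for word in words:
    pieces = tokenizer.tokenize(word)
    if tokenizer.unk_token in pieces:
        print(f"{word!r} -> {pieces}")
```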
