license: apache-2.0

This repository contains a tokenizer only, with the following modifications:

  • Replaced [unused0], [unused1], [unused2] with [ES], [DE], [FR] respectively in the vocabulary
  • Added [ES], [DE], [FR] as special tokens, so they won't be lowercased or split
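
The two modifications above can be illustrated with a minimal pure-Python sketch. It does not load this tokenizer and is not the code that produced it. It assumes a BERT-style vocabulary in which a token's id is its position in the list. The miniature `vocab` and the helpers `replace_unused` and `pre_tokenize` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical miniature vocabulary (assumption: a token's id is its list index,
# as in a BERT-style vocab.txt with one token per line).
vocab = ["[PAD]", "[unused0]", "[unused1]", "[unused2]", "[UNK]", "hola", "mundo"]

REPLACEMENTS = {"[unused0]": "[ES]", "[unused1]": "[DE]", "[unused2]": "[FR]"}


def replace_unused(vocab, replacements):
    """Return a copy of vocab with placeholders swapped; positions (ids) are unchanged."""
    return [replacements.get(token, token) for token in vocab]


def pre_tokenize(text, special_tokens):
    """Lowercase and whitespace-split text, passing special tokens through untouched."""
    # A capturing group makes re.split keep the matched special tokens in the output.
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    pieces = []
    for chunk in re.split(pattern, text):
        if chunk in special_tokens:
            pieces.append(chunk)  # kept verbatim: no lowercasing, no splitting
        else:
            pieces.extend(chunk.lower().split())
    return pieces


new_vocab = replace_unused(vocab, REPLACEMENTS)
specials = ["[ES]", "[DE]", "[FR]"]

print(new_vocab.index("[ES]") == vocab.index("[unused0]"))  # id is preserved
print(pre_tokenize("[ES] Hola Mundo", specials))
```

Reusing the placeholder slots keeps the vocabulary size and every existing token id unchanged, so an embedding matrix sized for the original vocabulary would presumably still fit. Registering the tags as special tokens matters because a lowercasing tokenizer would otherwise be free to alter or break up a string such as `[ES]`.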