---
license: apache-2.0
---

This repository contains a tokenizer only, with the following modifications:

- Replaced `[unused0]`, `[unused1]`, and `[unused2]` with `[ES]`, `[DE]`, and `[FR]`, respectively, in the vocabulary.
- Added `[ES]`, `[DE]`, and `[FR]` as special tokens, so they will not be lowercased or split.
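
The two modifications above can be illustrated with a minimal, library-free sketch. It is not the code used to build this tokenizer: the helper names `patch_vocab` and `pre_tokenize` and the toy vocabulary are hypothetical, and the pre-tokenizer is a simplified stand-in for a real uncased WordPiece pipeline. It only shows that renaming the placeholder entries keeps their ids, and that registered special tokens bypass lowercasing and splitting.

```python
import re

# Mapping described in the text: placeholder slot -> language tag.
REPLACEMENTS = {"[unused0]": "[ES]", "[unused1]": "[DE]", "[unused2]": "[FR]"}
SPECIAL_TOKENS = list(REPLACEMENTS.values())


def patch_vocab(vocab):
    """Return a copy of a BERT-style vocab list with the placeholders renamed.

    Positions (and therefore token ids) are unchanged.
    """
    return [REPLACEMENTS.get(token, token) for token in vocab]


def pre_tokenize(text, special_tokens=SPECIAL_TOKENS):
    """Lowercase and whitespace-split text, leaving special tokens untouched."""
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    pieces = []
    for chunk in re.split(pattern, text):
        if chunk in special_tokens:
            pieces.append(chunk)  # kept verbatim: no lowercasing, no splitting
        else:
            pieces.extend(chunk.lower().split())
    return pieces


# Toy vocabulary (hypothetical; a real vocab.txt has tens of thousands of entries).
toy_vocab = ["[PAD]", "[unused0]", "[unused1]", "[unused2]", "[UNK]", "hola", "mundo"]
patched = patch_vocab(toy_vocab)
tokens = pre_tokenize("[ES] Hola Mundo")

print(patched)
print(tokens)
```

Because the language tags reuse existing placeholder slots, the vocabulary size stays the same, so a model trained with the original vocabulary would not need its embedding matrix resized.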