update
README.md
CHANGED
@@ -2,5 +2,5 @@
 license: apache-2.0
 ---
 This is a tokenizer only, with the following modification:
-- Replaced [unused0]
-- Added [ES]
+- Replaced `[unused0]`, `[unused1]`, `[unused2]` with `[ES]`, `[DE]`, `[FR]` respectively in the vocabulary
+- Added `[ES]`, `[DE]`, `[FR]` as special tokens, so they won't be lowercased or split
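
The two modifications in the updated README can be illustrated with a small, self-contained sketch. This is not the repository's code: the toy `vocab` list, the `replacements` mapping, and the `pre_tokenize` helper are hypothetical names chosen for illustration, and the regex-based splitting only approximates what a real uncased tokenizer does. It shows why reusing the reserved `[unusedN]` slots keeps all other token ids unchanged, and why registering the language tags as special tokens protects them from lowercasing and punctuation splitting.

```python
import re

# Hypothetical slice of a BERT-style vocabulary (a real one has tens of thousands of entries).
vocab = ["[PAD]", "[unused0]", "[unused1]", "[unused2]", "[UNK]", "hello", "world"]

# Overwrite the reserved slots in place, so the vocabulary size and
# every other token id stay exactly the same.
replacements = {"[unused0]": "[ES]", "[unused1]": "[DE]", "[unused2]": "[FR]"}
vocab = [replacements.get(token, token) for token in vocab]

# Tokens treated as special: matched verbatim before any normalization.
special_tokens = list(replacements.values())
# The capture group makes re.split keep the matched special tokens in its output.
special_pattern = re.compile("(" + "|".join(re.escape(t) for t in special_tokens) + ")")


def pre_tokenize(text):
    """Lowercase and split on punctuation, but pass special tokens through untouched."""
    pieces = []
    for chunk in special_pattern.split(text):
        if chunk in special_tokens:
            pieces.append(chunk)  # kept as-is: no lowercasing, no splitting
        else:
            pieces.extend(re.findall(r"\w+|[^\w\s]", chunk.lower()))
    return pieces


print(vocab.index("[ES]"))               # same id that [unused0] had
print(pre_tokenize("[ES] Hello World"))
```

Without the special-token step, an uncased pipeline of this kind would lowercase `[ES]` and break it into `[`, `es`, `]`, none of which map to the intended vocabulary entry.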