tokenizer: add config (no accent stripping) and vocab
- tokenizer_config.json +1 -0
- vocab.txt +0 -0
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
+{"do_lower_case": true, "max_len": 512, "init_inputs": [], "strip_accents":false}
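For reviewers, a minimal sketch of what the `strip_accents` flag changes in practice. It uses only the Python standard library and a hypothetical `normalize` helper, not the tokenizer class this config is loaded by. The fallback for a missing `strip_accents` key (follow `do_lower_case`) reflects how BERT-style tokenizers in `transformers` behave to the best of my understanding, and should be read as an assumption.

```python
import json
import unicodedata

# The config line added in this commit, copied verbatim.
CONFIG = json.loads(
    '{"do_lower_case": true, "max_len": 512, "init_inputs": [], "strip_accents":false}'
)


def strip_accents(text: str) -> str:
    """Decompose to NFD and drop combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(text: str, config: dict) -> str:
    """Hypothetical helper: apply only the two normalization flags."""
    lower = bool(config.get("do_lower_case"))
    strip = config.get("strip_accents")
    if strip is None:
        # Assumed default: accent stripping follows lowercasing when unset.
        strip = lower
    if lower:
        text = text.lower()
    if strip:
        text = strip_accents(text)
    return text


print(normalize("Élève", CONFIG))                            # élève
print(normalize("Élève", {**CONFIG, "strip_accents": True}))  # eleve
print(normalize("Élève", {"do_lower_case": True}))            # eleve
```

With the committed config, text is lowercased but accented characters are kept, so `vocab.txt` is expected to contain accented entries; under the assumed default, leaving the flag unset would fold them to their unaccented forms.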
vocab.txt
ADDED
The diff for this file is too large to render.