tokenizer v2: include normalization discussed with the Bengali community
Files changed:
- special_tokens_map.json +1 -1
- tokenizer.json +0 -0
- tokenizer_config.json +2 -2
special_tokens_map.json CHANGED
@@ -8,7 +8,7 @@
   "mask_token": {
     "content": "[MASK]",
     "single_word": false,
-    "lstrip": false,
+    "lstrip": true,
     "rstrip": false,
     "normalized": false
   }
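For context: with "lstrip": true, the [MASK] token absorbs the whitespace immediately to its left, so a space before the mask no longer survives as a stray piece, which matters for fill-mask usage. A minimal check, using a hypothetical repo id in place of the actual checkpoint this commit was pushed to:

from transformers import AutoTokenizer

# "your-org/bengali-model" is a placeholder; substitute the repo this commit belongs to.
tok = AutoTokenizer.from_pretrained("your-org/bengali-model")

# With "lstrip": true on [MASK], the space before the mask is consumed by
# the mask token itself rather than being tokenized as a separate piece.
print(tok.tokenize("আমি [MASK] ভালোবাসি"))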
tokenizer.json CHANGED
The diff for this file is too large to render. See the raw diff.
tokenizer_config.json CHANGED
@@ -8,9 +8,9 @@
   "mask_token": {
     "content": "[MASK]",
     "single_word": false,
-    "lstrip": false,
+    "lstrip": true,
     "rstrip": false,
-    "normalized": false,
+    "normalized": true,
     "__type": "AddedToken"
   },
   "model_max_length": 512,
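These settings map directly onto tokenizers.AddedToken, which is what the "__type": "AddedToken" entry refers to. A sketch of the updated mask-token definition in code; note that normalized=True means the token is matched against text after the normalizer has run, so the new Bengali normalization presumably applies around the mask token as well:

from tokenizers import AddedToken

# Mirrors the post-commit tokenizer_config.json values for the mask token.
mask_token = AddedToken(
    "[MASK]",
    single_word=False,  # "single_word": false
    lstrip=True,        # "lstrip": true  (absorb whitespace on the left)
    rstrip=False,       # "rstrip": false
    normalized=True,    # "normalized": true (match against normalized text)
)

# When (re)building a tokenizer, it can be registered as the mask token, e.g.:
# tokenizer.add_special_tokens({"mask_token": mask_token})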