Upload tokenizer

Files changed (4)

README.md +199 -0
special_tokens_map.json +7 -0
tokenizer.json +991 -0
tokenizer_config.json +52 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "bos_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}
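
Note on how these special tokens are used: of the five tokens declared here, only `</s>` is referenced by the `post_processor` templates in this commit's `tokenizer.json` (`single`: A `</s>`; `pair`: A `</s>` B `</s>`). `<s>` is registered there but does not appear in either template. The sketch below mimics that template in plain Python so the effect is easy to see. The function name `post_process` is made up for illustration, and the real work is done inside the `tokenizers` library, not by code like this.

```python
# Hypothetical re-implementation of the TemplateProcessing templates in
# tokenizer.json, for illustration only.
EOS_ID = 1  # id of "</s>" according to added_tokens in tokenizer.json


def post_process(ids_a, ids_b=None):
    """Return (input_ids, token_type_ids) for a single sequence or a pair."""
    # single: A </s>, everything with type_id 0
    input_ids = list(ids_a) + [EOS_ID]
    type_ids = [0] * len(input_ids)
    if ids_b is not None:
        # pair: A </s> B </s>, where B and its closing </s> get type_id 1
        tail = list(ids_b) + [EOS_ID]
        input_ids += tail
        type_ids += [1] * len(tail)
    return input_ids, type_ids


# 274 is "Ġpod" and 196 is "Ġto" in the vocab of this commit.
print(post_process([274]))
print(post_process([274], [196]))
```

No `<s>` is prepended in either case, so models consuming this tokenizer should not expect a beginning-of-sequence id unless it is added elsewhere.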

tokenizer.json ADDED

	@@ -0,0 +1,991 @@

+{
+  "version": "1.0",
+  "truncation": null,
+  "padding": null,
+  "added_tokens": [
+    {
+      "id": 0,
+      "content": "<s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 1,
+      "content": "</s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 2,
+      "content": "<unk>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 3,
+      "content": "<pad>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 4,
+      "content": "<mask>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    }
+  ],
+  "normalizer": null,
+  "pre_tokenizer": {
+    "type": "ByteLevel",
+    "add_prefix_space": true,
+    "trim_offsets": true,
+    "use_regex": true
+  },
+  "post_processor": {
+    "type": "TemplateProcessing",
+    "single": [
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 0
+        }
+      }
+    ],
+    "pair": [
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 0
+        }
+      },
+      {
+        "Sequence": {
+          "id": "B",
+          "type_id": 1
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 1
+        }
+      }
+    ],
+    "special_tokens": {
+      "</s>": {
+        "id": "</s>",
+        "ids": [
+          1
+        ],
+        "tokens": [
+          "</s>"
+        ]
+      },
+      "<s>": {
+        "id": "<s>",
+        "ids": [
+          0
+        ],
+        "tokens": [
+          "<s>"
+        ]
+      }
+    }
+  },
+  "decoder": {
+    "type": "ByteLevel",
+    "add_prefix_space": true,
+    "trim_offsets": true,
+    "use_regex": true
+  },
+  "model": {
+    "type": "BPE",
+    "dropout": null,
+    "unk_token": "<unk>",
+    "continuing_subword_prefix": null,
+    "end_of_word_suffix": null,
+    "fuse_unk": false,
+    "byte_fallback": false,
+    "vocab": {
+      "<s>": 0,
+      "</s>": 1,
+      "<unk>": 2,
+      "<pad>": 3,
+      "<mask>": 4,
+      "!": 5,
+      "\"": 6,
+      "%": 7,
+      "&": 8,
+      "'": 9,
+      "(": 10,
+      ")": 11,
+      "*": 12,
+      "+": 13,
+      ",": 14,
+      "-": 15,
+      ".": 16,
+      "/": 17,
+      "0": 18,
+      "1": 19,
+      "2": 20,
+      "3": 21,
+      "4": 22,
+      "5": 23,
+      "6": 24,
+      "7": 25,
+      "8": 26,
+      "9": 27,
+      ":": 28,
+      ";": 29,
+      "<": 30,
+      "=": 31,
+      ">": 32,
+      "?": 33,
+      "A": 34,
+      "B": 35,
+      "C": 36,
+      "D": 37,
+      "E": 38,
+      "F": 39,
+      "G": 40,
+      "H": 41,
+      "I": 42,
+      "J": 43,
+      "K": 44,
+      "L": 45,
+      "M": 46,
+      "N": 47,
+      "O": 48,
+      "P": 49,
+      "Q": 50,
+      "R": 51,
+      "S": 52,
+      "T": 53,
+      "U": 54,
+      "V": 55,
+      "W": 56,
+      "X": 57,
+      "Y": 58,
+      "Z": 59,
+      "[": 60,
+      "]": 61,
+      "_": 62,
+      "`": 63,
+      "a": 64,
+      "b": 65,
+      "c": 66,
+      "d": 67,
+      "e": 68,
+      "f": 69,
+      "g": 70,
+      "h": 71,
+      "i": 72,
+      "j": 73,
+      "k": 74,
+      "l": 75,
+      "m": 76,
+      "n": 77,
+      "o": 78,
+      "p": 79,
+      "q": 80,
+      "r": 81,
+      "s": 82,
+      "t": 83,
+      "u": 84,
+      "v": 85,
+      "w": 86,
+      "x": 87,
+      "y": 88,
+      "z": 89,
+      "{": 90,
+      "}": 91,
+      "~": 92,
+      "¡": 93,
+      "¢": 94,
+      "£": 95,
+      "¤": 96,
+      "¥": 97,
+      "¦": 98,
+      "§": 99,
+      "¨": 100,
+      "©": 101,
+      "ª": 102,
+      "«": 103,
+      "®": 104,
+      "¯": 105,
+      "°": 106,
+      "±": 107,
+      "³": 108,
+      "´": 109,
+      "¶": 110,
+      "º": 111,
+      "¼": 112,
+      "½": 113,
+      "¾": 114,
+      "Â": 115,
+      "Ã": 116,
+      "Ä": 117,
+      "Å": 118,
+      "È": 119,
+      "Ë": 120,
+      "â": 121,
+      "Ġ": 122,
+      "Ģ": 123,
+      "ģ": 124,
+      "Ĥ": 125,
+      "ĥ": 126,
+      "Ħ": 127,
+      "Ĩ": 128,
+      "ĩ": 129,
+      "Ī": 130,
+      "ī": 131,
+      "Į": 132,
+      "į": 133,
+      "İ": 134,
+      "ı": 135,
+      "ĳ": 136,
+      "ĵ": 137,
+      "Ķ": 138,
+      "ĸ": 139,
+      "ĺ": 140,
+      "Ļ": 141,
+      "ļ": 142,
+      "Ľ": 143,
+      "ľ": 144,
+      "Ŀ": 145,
+      "ŀ": 146,
+      "Ł": 147,
+      "ł": 148,
+      "Ń": 149,
+      "ÃŃ": 150,
+      "Ã¡": 151,
+      "Ġp": 152,
+      "ÄĽ": 153,
+      "Ġs": 154,
+      "Ġn": 155,
+      "Ġt": 156,
+      "Ġv": 157,
+      "Ġ,": 158,
+      "ÅĻ": 159,
+      "Å¾": 160,
+      "Ġj": 161,
+      "Ã©": 162,
+      "nÃŃ": 163,
+      "Ġ.": 164,
+      "st": 165,
+      "Ġz": 166,
+      "ro": 167,
+      "Ġa": 168,
+      "ch": 169,
+      "Ġd": 170,
+      "ov": 171,
+      "Ã½": 172,
+      "Äį": 173,
+      "Ġk": 174,
+      "Ġm": 175,
+      "Å¡": 176,
+      "le": 177,
+      "Ġpo": 178,
+      "la": 179,
+      "Ġo": 180,
+      "en": 181,
+      "ak": 182,
+      "te": 183,
+      "li": 184,
+      "ou": 185,
+      "ra": 186,
+      "Ġb": 187,
+      "ÅĻe": 188,
+      "Ġne": 189,
+      "Ġna": 190,
+      "Ġje": 191,
+      "Å¯": 192,
+      "ho": 193,
+      "Å¾e": 194,
+      "Ġse": 195,
+      "Ġto": 196,
+      "nÄĽ": 197,
+      "in": 198,
+      "em": 199,
+      "Ġpro": 200,
+      "ce": 201,
+      "to": 202,
+      "Ġdo": 203,
+      "an": 204,
+      "sk": 205,
+      "at": 206,
+      "rÃ¡": 207,
+      "it": 208,
+      "Ġu": 209,
+      "no": 210,
+      "lo": 211,
+      "ÅĻÃŃ": 212,
+      "Ġza": 213,
+      "Ġby": 214,
+      "ku": 215,
+      "dy": 216,
+      "ĠÅ¾e": 217,
+      "Ġtak": 218,
+      "de": 219,
+      "ÃŃm": 220,
+      "po": 221,
+      "ci": 222,
+      "ni": 223,
+      "ĠP": 224,
+      "ko": 225,
+      "Ġst": 226,
+      "je": 227,
+      "al": 228,
+      "Ġjs": 229,
+      "Ġkte": 230,
+      "ÅĻi": 231,
+      "ky": 232,
+      "re": 233,
+      "am": 234,
+      "ĠpÅĻe": 235,
+      "jÃŃ": 236,
+      "ru": 237,
+      "Ġvy": 238,
+      "ĠÄį": 239,
+      "me": 240,
+      "Ġko": 241,
+      "ti": 242,
+      "nÃ©": 243,
+      "ka": 244,
+      "ĠnÃ¡": 245,
+      "vo": 246,
+      "Ã½ch": 247,
+      "lÃ¡": 248,
+      "Ġve": 249,
+      "cÃŃ": 250,
+      "va": 251,
+      "Ġro": 252,
+      "by": 253,
+      "na": 254,
+      "ne": 255,
+      "ĠzÃ¡": 256,
+      "da": 257,
+      "nÃ¡": 258,
+      "Ġjak": 259,
+      "ĠJ": 260,
+      "Ġh": 261,
+      "mÄĽ": 262,
+      "Å¡e": 263,
+      "bo": 264,
+      "ĠT": 265,
+      "Å¡ÃŃ": 266,
+      "ĠV": 267,
+      "du": 268,
+      "ji": 269,
+      "ĠN": 270,
+      "vÄĽ": 271,
+      "ĠS": 272,
+      "ĠA": 273,
+      "Ġpod": 274,
+      "ĠM": 275,
+      "lu": 276,
+      "ĠÃ": 277,
+      "Ġob": 278,
+      "tÄĽ": 279,
+      "Ġkter": 280,
+      "ck": 281,
+      "dÄĽ": 282,
+      "Ġsi": 283,
+      "mi": 284,
+      "se": 285,
+      "vÃ¡": 286,
+      "mu": 287,
+      "ovÃ¡": 288,
+      "Ġi": 289,
+      "ĠÃº": 290,
+      "nost": 291,
+      "ĠnÄĽ": 292,
+      "Ġod": 293,
+      "ze": 294,
+      "Ġmo": 295,
+      "Ġf": 296,
+      "ar": 297,
+      "ve": 298,
+      "ĠD": 299,
+      "ĠpÅĻi": 300,
+      "or": 301,
+      "ĠpÅĻed": 302,
+      "ĠvÃ½": 303,
+      "skÃ©": 304,
+      "Ġte": 305,
+      "ovÃ©": 306,
+      "tu": 307,
+      "Ġbu": 308,
+      "di": 309,
+      "do": 310,
+      "ĠK": 311,
+      "ĠmÄĽ": 312,
+      "Ġc": 313,
+      "Ġch": 314,
+      "er": 315,
+      "rav": 316,
+      "ĠpÅĻÃŃ": 317,
+      "ist": 318,
+      "Ġale": 319,
+      "Ġre": 320,
+      "ly": 321,
+      "ova": 322,
+      "dnÃŃ": 323,
+      "sta": 324,
+      "vr": 325,
+      "Ã¡l": 326,
+      "Ġjed": 327,
+      "Ġta": 328,
+      "Ġroz": 329,
+      "kÃ¡": 330,
+      "nu": 331,
+      "Ġ?": 332,
+      "ny": 333,
+      "ovat": 334,
+      "ÅĪ": 335,
+      "ta": 336,
+      "pa": 337,
+      "vÃŃ": 338,
+      "nÃ½": 339,
+      "dÃ¡": 340,
+      "lÃ©": 341,
+      "ĠvÃ¡": 342,
+      "un": 343,
+      "vnÃŃ": 344,
+      "Å¾i": 345,
+      "vi": 346,
+      "ob": 347,
+      "Äįe": 348,
+      "ĠB": 349,
+      "Ġaby": 350,
+      "zi": 351,
+      "Ġjsem": 352,
+      "nou": 353,
+      "uje": 354,
+      "ĠmÃ¡": 355,
+      "ĠkterÃ©": 356,
+      "ri": 357,
+      "as": 358,
+      "ĠvÄĽ": 359,
+      "ĠPo": 360,
+      "Ġho": 361,
+      "cho": 362,
+      "Ġe": 363,
+      "Ã¡t": 364,
+      "mo": 365,
+      "Ġco": 366,
+      "Ġjako": 367,
+      "Ġle": 368,
+      "ĠZ": 369,
+      "tÃŃ": 370,
+      "ÄĮ": 371,
+      "lÃŃ": 372,
+      "Ġin": 373,
+      "vrh": 374,
+      "pe": 375,
+      "sti": 376,
+      "Ġkdy": 377,
+      "ÄįnÃŃ": 378,
+      "sla": 379,
+      "ĠÄĮ": 380,
+      "ĠÅ¡": 381,
+      "unk": 382,
+      "Ġspo": 383,
+      "dÃŃ": 384,
+      "Ġpan": 385,
+      "Ġjsme": 386,
+      "Ġnej": 387,
+      "Ġnebo": 388,
+      "Å¡tÄĽ": 389,
+      "Ã½m": 390,
+      "ry": 391,
+      "ĠprÃ¡": 392,
+      "ma": 393,
+      "Ġmu": 394,
+      "sÃŃ": 395,
+      "kla": 396,
+      "Ġsou": 397,
+      "ĠdÄĽ": 398,
+      "Ġjsou": 399,
+      "Ġten": 400,
+      "ty": 401,
+      "bu": 402,
+      "Ġsv": 403,
+      "Ġtam": 404,
+      "Ġpr": 405,
+      "ba": 406,
+      "ckÃ©": 407,
+      "prav": 408,
+      "ĠuÅ¾": 409,
+      "si": 410,
+      "Å¾en": 411,
+      "ĠO": 412,
+      "Ġproto": 413,
+      "tel": 414,
+      "ĠmÃŃ": 415,
+      "Ġkon": 416,
+      "ez": 417,
+      "mÃŃ": 418,
+      "lov": 419,
+      "ĠR": 420,
+      "Ġvo": 421,
+      "Ġ<": 422,
+      "Ġli": 423,
+      "ĠÅĻe": 424,
+      "nÃŃm": 425,
+      "ĠE": 426,
+      "ent": 427,
+      "stu": 428,
+      "ĠtÄĽ": 429,
+      "ĠC": 430,
+      "ĠnÃ¡vrh": 431,
+      "lou": 432,
+      "vy": 433,
+      "ste": 434,
+      "Ġbude": 435,
+      "Ġni": 436,
+      "pu": 437,
+      "Ġmi": 438,
+      "ĠjÃ¡": 439,
+      "ĠtÃ©": 440,
+      "Ġmin": 441,
+      "Ġty": 442,
+      "ovÃ¡nÃŃ": 443,
+      "Ġzp": 444,
+      "my": 445,
+      "ÄįÃŃ": 446,
+      "ĠÅĻÃŃ": 447,
+      "dou": 448,
+      "ĠH": 449,
+      "sto": 450,
+      "ĠzÃ¡ko": 451,
+      "Ġtoho": 452,
+      "ĠkterÃ½": 453,
+      "jÃŃcÃŃ": 454,
+      "Ġpoz": 455,
+      "Ġtady": 456,
+      "sle": 457,
+      "Ã¡d": 458,
+      "Ġce": 459,
+      "eme": 460,
+      "hle": 461,
+      "nosti": 462,
+      "Å¯Å¾e": 463,
+      "Å¾ÃŃ": 464,
+      "pi": 465,
+      "ÅĻej": 466,
+      "Å¡i": 467,
+      "Ġmy": 468,
+      "za": 469,
+      "eno": 470,
+      "sa": 471,
+      "Ã©ho": 472,
+      "Ġdob": 473,
+      "Ġde": 474,
+      "Ġvel": 475,
+      "Ġtu": 476,
+      "Ġprost": 477,
+      "Äįi": 478,
+      "ĠL": 479,
+      "ĠJe": 480,
+      "Ã¡lnÃŃ": 481,
+      "Ġvz": 482,
+      "ĠtakÃ©": 483,
+      "Ġma": 484,
+      "hod": 485,
+      "rov": 486,
+      "Ġze": 487,
+      "ĠÄĮe": 488,
+      "sob": 489,
+      "Ġtom": 490,
+      "ĠdÅ¯": 491,
+      "kÅ¯": 492,
+      "Äı": 493,
+      "ter": 494,
+      "Ġsp": 495,
+      "tÃ¡": 496,
+      "ĠprotoÅ¾e": 497,
+      "Ġpa": 498,
+      "Ġkole": 499
+    },
+    "merges": [
+      "Ã Ń",
+      "Ã ¡",
+      "Ġ p",
+      "Ä Ľ",
+      "Ġ s",
+      "Ġ n",
+      "Ġ t",
+      "Ġ v",
+      "Ġ ,",
+      "Å Ļ",
+      "Å ¾",
+      "Ġ j",
+      "Ã ©",
+      "n ÃŃ",
+      "Ġ .",
+      "s t",
+      "Ġ z",
+      "r o",
+      "Ġ a",
+      "c h",
+      "Ġ d",
+      "o v",
+      "Ã ½",
+      "Ä į",
+      "Ġ k",
+      "Ġ m",
+      "Å ¡",
+      "l e",
+      "Ġp o",
+      "l a",
+      "Ġ o",
+      "e n",
+      "a k",
+      "t e",
+      "l i",
+      "o u",
+      "r a",
+      "Ġ b",
+      "ÅĻ e",
+      "Ġn e",
+      "Ġn a",
+      "Ġj e",
+      "Å ¯",
+      "h o",
+      "Å¾ e",
+      "Ġs e",
+      "Ġt o",
+      "n ÄĽ",
+      "i n",
+      "e m",
+      "Ġp ro",
+      "c e",
+      "t o",
+      "Ġd o",
+      "a n",
+      "s k",
+      "a t",
+      "r Ã¡",
+      "i t",
+      "Ġ u",
+      "n o",
+      "l o",
+      "ÅĻ ÃŃ",
+      "Ġz a",
+      "Ġb y",
+      "k u",
+      "d y",
+      "Ġ Å¾e",
+      "Ġt ak",
+      "d e",
+      "ÃŃ m",
+      "p o",
+      "c i",
+      "n i",
+      "Ġ P",
+      "k o",
+      "Ġs t",
+      "j e",
+      "a l",
+      "Ġj s",
+      "Ġk te",
+      "ÅĻ i",
+      "k y",
+      "r e",
+      "a m",
+      "Ġp ÅĻe",
+      "j ÃŃ",
+      "r u",
+      "Ġv y",
+      "Ġ Äį",
+      "m e",
+      "Ġk o",
+      "t i",
+      "n Ã©",
+      "k a",
+      "Ġn Ã¡",
+      "v o",
+      "Ã½ ch",
+      "l Ã¡",
+      "Ġv e",
+      "c ÃŃ",
+      "v a",
+      "Ġ ro",
+      "b y",
+      "n a",
+      "n e",
+      "Ġz Ã¡",
+      "d a",
+      "n Ã¡",
+      "Ġj ak",
+      "Ġ J",
+      "Ġ h",
+      "m ÄĽ",
+      "Å¡ e",
+      "b o",
+      "Ġ T",
+      "Å¡ ÃŃ",
+      "Ġ V",
+      "d u",
+      "j i",
+      "Ġ N",
+      "v ÄĽ",
+      "Ġ S",
+      "Ġ A",
+      "Ġpo d",
+      "Ġ M",
+      "l u",
+      "Ġ Ã",
+      "Ġo b",
+      "t ÄĽ",
+      "Ġkte r",
+      "c k",
+      "d ÄĽ",
+      "Ġs i",
+      "m i",
+      "s e",
+      "v Ã¡",
+      "m u",
+      "ov Ã¡",
+      "Ġ i",
+      "ĠÃ º",
+      "no st",
+      "Ġn ÄĽ",
+      "Ġo d",
+      "z e",
+      "Ġm o",
+      "Ġ f",
+      "a r",
+      "v e",
+      "Ġ D",
+      "Ġp ÅĻi",
+      "o r",
+      "ĠpÅĻe d",
+      "Ġv Ã½",
+      "sk Ã©",
+      "Ġt e",
+      "ov Ã©",
+      "t u",
+      "Ġb u",
+      "d i",
+      "d o",
+      "Ġ K",
+      "Ġm ÄĽ",
+      "Ġ c",
+      "Ġ ch",
+      "e r",
+      "ra v",
+      "Ġp ÅĻÃŃ",
+      "i st",
+      "Ġa le",
+      "Ġ re",
+      "l y",
+      "ov a",
+      "d nÃŃ",
+      "st a",
+      "v r",
+      "Ã¡ l",
+      "Ġje d",
+      "Ġt a",
+      "Ġro z",
+      "k Ã¡",
+      "n u",
+      "Ġ ?",
+      "n y",
+      "ov at",
+      "Å Ī",
+      "t a",
+      "p a",
+      "v ÃŃ",
+      "n Ã½",
+      "d Ã¡",
+      "l Ã©",
+      "Ġv Ã¡",
+      "u n",
+      "v nÃŃ",
+      "Å¾ i",
+      "v i",
+      "o b",
+      "Äį e",
+      "Ġ B",
+      "Ġa by",
+      "z i",
+      "Ġjs em",
+      "n ou",
+      "u je",
+      "Ġm Ã¡",
+      "Ġkter Ã©",
+      "r i",
+      "a s",
+      "Ġv ÄĽ",
+      "ĠP o",
+      "Ġ ho",
+      "ch o",
+      "Ġ e",
+      "Ã¡ t",
+      "m o",
+      "Ġc o",
+      "Ġjak o",
+      "Ġ le",
+      "Ġ Z",
+      "t ÃŃ",
+      "Ä Į",
+      "l ÃŃ",
+      "Ġ in",
+      "vr h",
+      "p e",
+      "st i",
+      "Ġk dy",
+      "Äį nÃŃ",
+      "s la",
+      "Ġ ÄĮ",
+      "Ġ Å¡",
+      "un k",
+      "Ġs po",
+      "d ÃŃ",
+      "Ġp an",
+      "Ġjs me",
+      "Ġne j",
+      "Ġne bo",
+      "Å¡ tÄĽ",
+      "Ã½ m",
+      "r y",
+      "Ġp rÃ¡",
+      "m a",
+      "Ġm u",
+      "s ÃŃ",
+      "k la",
+      "Ġs ou",
+      "Ġd ÄĽ",
+      "Ġjs ou",
+      "Ġt en",
+      "t y",
+      "b u",
+      "Ġs v",
+      "Ġt am",
+      "Ġp r",
+      "b a",
+      "ck Ã©",
+      "p rav",
+      "Ġu Å¾",
+      "s i",
+      "Å¾ en",
+      "Ġ O",
+      "Ġpro to",
+      "te l",
+      "Ġm ÃŃ",
+      "Ġko n",
+      "e z",
+      "m ÃŃ",
+      "l ov",
+      "Ġ R",
+      "Ġv o",
+      "Ġ <",
+      "Ġ li",
+      "Ġ ÅĻe",
+      "nÃŃ m",
+      "Ġ E",
+      "en t",
+      "st u",
+      "Ġt ÄĽ",
+      "Ġ C",
+      "ĠnÃ¡ vrh",
+      "l ou",
+      "v y",
+      "st e",
+      "Ġbu de",
+      "Ġn i",
+      "p u",
+      "Ġm i",
+      "Ġj Ã¡",
+      "Ġt Ã©",
+      "Ġm in",
+      "Ġt y",
+      "ovÃ¡ nÃŃ",
+      "Ġz p",
+      "m y",
+      "Äį ÃŃ",
+      "Ġ ÅĻÃŃ",
+      "d ou",
+      "Ġ H",
+      "st o",
+      "ĠzÃ¡ ko",
+      "Ġto ho",
+      "Ġkter Ã½",
+      "jÃŃ cÃŃ",
+      "Ġpo z",
+      "Ġta dy",
+      "s le",
+      "Ã¡ d",
+      "Ġ ce",
+      "em e",
+      "h le",
+      "nost i",
+      "Å¯ Å¾e",
+      "Å¾ ÃŃ",
+      "p i",
+      "ÅĻe j",
+      "Å¡ i",
+      "Ġm y",
+      "z a",
+      "en o",
+      "s a",
+      "Ã© ho",
+      "Ġdo b",
+      "Ġd e",
+      "Ġve l",
+      "Ġt u",
+      "Ġpro st",
+      "Äį i",
+      "Ġ L",
+      "ĠJ e",
+      "Ã¡l nÃŃ",
+      "Ġv z",
+      "Ġtak Ã©",
+      "Ġm a",
+      "ho d",
+      "ro v",
+      "Ġz e",
+      "ĠÄĮ e",
+      "s ob",
+      "Ġto m",
+      "Ġd Å¯",
+      "k Å¯",
+      "Ä ı",
+      "te r",
+      "Ġs p",
+      "t Ã¡",
+      "Ġproto Å¾e",
+      "Ġp a",
+      "Ġko le"
+    ]
+  }
+}
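
Note on reading the vocab: entries such as `ÃŃ`, `ÄĽ` or `ĠprotoÅ¾e` are not encoding damage. With a `ByteLevel` pre-tokenizer, every UTF-8 byte is shown as one printable character, and `Ġ` stands for the space byte (here added in front of words because `add_prefix_space` is `true`). The sketch below rebuilds the byte-to-character table used by GPT-2 style byte-level BPE and applies a few merges copied from the list above. It is a simplified illustration and not the `tokenizers` implementation; the helper names (`to_symbols`, `apply_merges`, `decode_token`) are made up for this note.

```python
def bytes_to_unicode():
    """Byte -> printable character table of GPT-2 style byte-level BPE."""
    keep = (list(range(ord("!"), ord("~") + 1))
            + list(range(ord("¡"), ord("¬") + 1))
            + list(range(ord("®"), ord("ÿ") + 1)))
    table = {b: chr(b) for b in keep}
    shift = 0
    for b in range(256):
        if b not in table:
            # Whitespace/control bytes are moved to code points 256 and up.
            table[b] = chr(256 + shift)
            shift += 1
    return table


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_symbols(text):
    """Turn text into one byte-level symbol per UTF-8 byte."""
    return [BYTE_TO_CHAR[b] for b in text.encode("utf-8")]


def decode_token(token):
    """Map a byte-level token back to text.

    Raises UnicodeDecodeError for tokens that hold only part of a
    multi-byte character (for example "Ã" on its own).
    """
    return bytes(CHAR_TO_BYTE[c] for c in token).decode("utf-8")


def apply_merges(symbols, merges):
    """Greedy BPE: repeatedly merge the adjacent pair with the lowest rank."""
    ranks = {tuple(m.split(" ")): i for i, m in enumerate(merges)}
    symbols = list(symbols)
    while len(symbols) > 1:
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)  # earliest-learned merge, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# A small subset of the merges above, kept in their original order.
MERGES = ["Ġ p", "Ġp o", "p o", "Ġpo d"]

print(to_symbols(" pod"))
print(apply_merges(to_symbols(" pod"), MERGES))
print(repr(decode_token("ĠprotoÅ¾e")))
print(repr(decode_token("nÃŃ")))
```

Decoded this way, the learned tokens are ordinary Czech fragments and words, which suggests the tokenizer was trained on Czech text.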

tokenizer_config.json ADDED

	@@ -0,0 +1,52 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<mask>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "tokenizer_class": "PreTrainedTokenizerFast",
+  "unk_token": "<unk>"
+}
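
Note on `model_max_length`: the value `1000000000000000019884624838656` is not a real length limit. It equals `int(1e30)`, which, as far as I know, is the placeholder `transformers` writes when no maximum length was set for the tokenizer. The odd trailing digits come from converting the float `1e30` to an integer. The check below uses plain Python only; `has_real_limit` is a hypothetical helper for illustration, not a library function.

```python
# Value copied from tokenizer_config.json in this commit.
MODEL_MAX_LENGTH = 1000000000000000019884624838656

# Assumption: the library's "no limit" placeholder is int(1e30).
NO_LIMIT_SENTINEL = int(1e30)


def has_real_limit(model_max_length):
    """Return True if the configured value looks like an actual limit."""
    return model_max_length < NO_LIMIT_SENTINEL


print(MODEL_MAX_LENGTH == NO_LIMIT_SENTINEL)
print(has_real_limit(MODEL_MAX_LENGTH))
print(has_real_limit(512))
```

In practice this means truncation has to be given an explicit `max_length` by the caller, since the stored value never triggers it.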