Upload tokenizer

Files changed (4)

README.md +201 -0
special_tokens_map.json +1 -0
tokenizer.json +1107 -0
tokenizer_config.json +39 -0

README.md ADDED

	@@ -0,0 +1,201 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

special_tokens_map.json ADDED

	@@ -0,0 +1 @@


+{}

tokenizer.json ADDED

	@@ -0,0 +1,1107 @@

+{
+  "version": "1.0",
+  "truncation": null,
+  "padding": null,
+  "added_tokens": [
+    {
+      "id": 0,
+      "content": "[UNK]",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 1,
+      "content": "[CLS]",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 2,
+      "content": "[SEP]",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 3,
+      "content": "[PAD]",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    }
+  ],
+  "normalizer": {
+    "type": "Sequence",
+    "normalizers": [
+      {
+        "type": "NFD"
+      },
+      {
+        "type": "StripAccents"
+      },
+      {
+        "type": "Lowercase"
+      }
+    ]
+  },
+  "pre_tokenizer": {
+    "type": "Sequence",
+    "pretokenizers": [
+      {
+        "type": "Punctuation",
+        "behavior": "Isolated"
+      },
+      {
+        "type": "Whitespace"
+      },
+      {
+        "type": "Digits",
+        "individual_digits": true
+      }
+    ]
+  },
+  "post_processor": null,
+  "decoder": null,
+  "model": {
+    "type": "WordPiece",
+    "unk_token": "[UNK]",
+    "continuing_subword_prefix": "##",
+    "max_input_chars_per_word": 100,
+    "vocab": {
+      "[UNK]": 0,
+      "[CLS]": 1,
+      "[SEP]": 2,
+      "[PAD]": 3,
+      "!": 4,
+      "\"": 5,
+      "#": 6,
+      "$": 7,
+      "%": 8,
+      "&": 9,
+      "'": 10,
+      "(": 11,
+      ")": 12,
+      "*": 13,
+      "+": 14,
+      ",": 15,
+      "-": 16,
+      ".": 17,
+      "/": 18,
+      "0": 19,
+      "1": 20,
+      "2": 21,
+      "3": 22,
+      "4": 23,
+      "5": 24,
+      "6": 25,
+      "7": 26,
+      "8": 27,
+      "9": 28,
+      ":": 29,
+      ";": 30,
+      "<": 31,
+      "=": 32,
+      ">": 33,
+      "?": 34,
+      "@": 35,
+      "[": 36,
+      "\\": 37,
+      "]": 38,
+      "_": 39,
+      "`": 40,
+      "a": 41,
+      "b": 42,
+      "c": 43,
+      "d": 44,
+      "e": 45,
+      "f": 46,
+      "g": 47,
+      "h": 48,
+      "i": 49,
+      "j": 50,
+      "k": 51,
+      "l": 52,
+      "m": 53,
+      "n": 54,
+      "o": 55,
+      "p": 56,
+      "q": 57,
+      "r": 58,
+      "s": 59,
+      "t": 60,
+      "u": 61,
+      "v": 62,
+      "w": 63,
+      "x": 64,
+      "y": 65,
+      "z": 66,
+      "{": 67,
+      "|": 68,
+      "}": 69,
+      "~": 70,
+      "##a": 71,
+      "##n": 72,
+      "##d": 73,
+      "##e": 74,
+      "##r": 75,
+      "##x": 76,
+      "##p": 77,
+      "##c": 78,
+      "##t": 79,
+      "##i": 80,
+      "##o": 81,
+      "##l": 82,
+      "##s": 83,
+      "##h": 84,
+      "##u": 85,
+      "##y": 86,
+      "##w": 87,
+      "##v": 88,
+      "##m": 89,
+      "##b": 90,
+      "##g": 91,
+      "##f": 92,
+      "##k": 93,
+      "##j": 94,
+      "##z": 95,
+      "##q": 96,
+      "##he": 97,
+      "the": 98,
+      "##nd": 99,
+      "##ed": 100,
+      "and": 101,
+      "to": 102,
+      "##er": 103,
+      "wa": 104,
+      "##ou": 105,
+      "##in": 106,
+      "he": 107,
+      "##re": 108,
+      "was": 109,
+      "sa": 110,
+      "##ing": 111,
+      "##om": 112,
+      "she": 113,
+      "##ar": 114,
+      "##il": 115,
+      "##it": 116,
+      "##ay": 117,
+      "it": 118,
+      "##id": 119,
+      "##at": 120,
+      "they": 121,
+      "ha": 122,
+      "##en": 123,
+      "##is": 124,
+      "##an": 125,
+      "th": 126,
+      "on": 127,
+      "##or": 128,
+      "##im": 129,
+      "##on": 130,
+      "##ut": 131,
+      "her": 132,
+      "##ll": 133,
+      "##le": 134,
+      "##et": 135,
+      "##ot": 136,
+      "##ir": 137,
+      "##es": 138,
+      "in": 139,
+      "##ow": 140,
+      "you": 141,
+      "##ck": 142,
+      "##ld": 143,
+      "##oo": 144,
+      "said": 145,
+      "be": 146,
+      "##ily": 147,
+      "st": 148,
+      "##ig": 149,
+      "tim": 150,
+      "so": 151,
+      "##ce": 152,
+      "##pp": 153,
+      "wit": 154,
+      "his": 155,
+      "with": 156,
+      "mom": 157,
+      "##ve": 158,
+      "lily": 159,
+      "of": 160,
+      "##ked": 161,
+      "fr": 162,
+      "that": 163,
+      "pl": 164,
+      "##ery": 165,
+      "##am": 166,
+      "##ad": 167,
+      "##ke": 168,
+      "##nt": 169,
+      "but": 170,
+      "day": 171,
+      "up": 172,
+      "##ie": 173,
+      "play": 174,
+      "had": 175,
+      "##el": 176,
+      "wh": 177,
+      "for": 178,
+      "##st": 179,
+      "##un": 180,
+      "##ould": 181,
+      "an": 182,
+      "##my": 183,
+      "##ent": 184,
+      "li": 185,
+      "##ra": 186,
+      "##ch": 187,
+      "do": 188,
+      "happ": 189,
+      "one": 190,
+      "sh": 191,
+      "want": 192,
+      "##itt": 193,
+      "##her": 194,
+      "not": 195,
+      "there": 196,
+      "##ly": 197,
+      "##ome": 198,
+      "very": 199,
+      "##se": 200,
+      "##ound": 201,
+      "as": 202,
+      "ba": 203,
+      "litt": 204,
+      "little": 205,
+      "ne": 206,
+      "time": 207,
+      "##ht": 208,
+      "##al": 209,
+      "ma": 210,
+      "sm": 211,
+      "happy": 212,
+      "is": 213,
+      "big": 214,
+      "loo": 215,
+      "saw": 216,
+      "##iend": 217,
+      "friend": 218,
+      "##ry": 219,
+      "re": 220,
+      "bo": 221,
+      "##ur": 222,
+      "##ter": 223,
+      "##ug": 224,
+      "##ere": 225,
+      "##ved": 226,
+      "lo": 227,
+      "were": 228,
+      "once": 229,
+      "##ore": 230,
+      "se": 231,
+      "ev": 232,
+      "go": 233,
+      "sp": 234,
+      "him": 235,
+      "too": 236,
+      "ca": 237,
+      "##ide": 238,
+      "we": 239,
+      "are": 240,
+      "tom": 241,
+      "at": 242,
+      "can": 243,
+      "##irl": 244,
+      "##ard": 245,
+      "wanted": 246,
+      "whe": 247,
+      "ben": 248,
+      "##ill": 249,
+      "girl": 250,
+      "upon": 251,
+      "##ec": 252,
+      "their": 253,
+      "out": 254,
+      "them": 255,
+      "##ys": 256,
+      "##fu": 257,
+      "##way": 258,
+      "did": 259,
+      "smil": 260,
+      "##ind": 261,
+      "could": 262,
+      "have": 263,
+      "no": 264,
+      "##ri": 265,
+      "##ted": 266,
+      "##ver": 267,
+      "##ain": 268,
+      "ex": 269,
+      "##hed": 270,
+      "all": 271,
+      "ar": 272,
+      "went": 273,
+      "hel": 274,
+      "su": 275,
+      "##ic": 276,
+      "when": 277,
+      "ta": 278,
+      "help": 279,
+      "##ful": 280,
+      "##ood": 281,
+      "friends": 282,
+      "##hing": 283,
+      "nam": 284,
+      "##ight": 285,
+      "what": 286,
+      "kn": 287,
+      "le": 288,
+      "##um": 289,
+      "back": 290,
+      "##ark": 291,
+      "##one": 292,
+      "cl": 293,
+      "from": 294,
+      "fun": 295,
+      "al": 296,
+      "then": 297,
+      "##all": 298,
+      "timmy": 299,
+      "ro": 300,
+      "star": 301,
+      "every": 302,
+      "sc": 303,
+      "smiled": 304,
+      "named": 305,
+      "##oug": 306,
+      "loved": 307,
+      "##side": 308,
+      "man": 309,
+      "some": 310,
+      "##elt": 311,
+      "asked": 312,
+      "see": 313,
+      "me": 314,
+      "##ick": 315,
+      "like": 316,
+      "fe": 317,
+      "##op": 318,
+      "felt": 319,
+      "br": 320,
+      "looked": 321,
+      "around": 322,
+      "look": 323,
+      "##ep": 324,
+      "##ame": 325,
+      "##omet": 326,
+      "would": 327,
+      "bir": 328,
+      "somet": 329,
+      "##get": 330,
+      "fa": 331,
+      "##ong": 332,
+      "boy": 333,
+      "##ss": 334,
+      "bird": 335,
+      "pr": 336,
+      "##dd": 337,
+      "jo": 338,
+      "##est": 339,
+      "##ings": 340,
+      "ag": 341,
+      "wor": 342,
+      "mommy": 343,
+      "make": 344,
+      "##ade": 345,
+      "than": 346,
+      "car": 347,
+      "##own": 348,
+      "tre": 349,
+      "ran": 350,
+      "##gether": 351,
+      "together": 352,
+      "la": 353,
+      "away": 354,
+      "dad": 355,
+      "says": 356,
+      "started": 357,
+      "##ice": 358,
+      "##oud": 359,
+      "something": 360,
+      "##ared": 361,
+      "made": 362,
+      "co": 363,
+      "fl": 364,
+      "##ther": 365,
+      "##ited": 366,
+      "park": 367,
+      "sad": 368,
+      "good": 369,
+      "##ack": 370,
+      "other": 371,
+      "ch": 372,
+      "exc": 373,
+      "new": 374,
+      "put": 375,
+      "who": 376,
+      "##out": 377,
+      "let": 378,
+      "hug": 379,
+      "mu": 380,
+      "##ble": 381,
+      "again": 382,
+      "home": 383,
+      "sam": 384,
+      "found": 385,
+      "dec": 386,
+      "##pped": 387,
+      "##ried": 388,
+      "wal": 389,
+      "##ure": 390,
+      "get": 391,
+      "##ach": 392,
+      "playing": 393,
+      "##ought": 394,
+      "gra": 395,
+      "##na": 396,
+      "##pl": 397,
+      "sw": 398,
+      "##ous": 399,
+      "things": 400,
+      "bl": 401,
+      "excited": 402,
+      "your": 403,
+      "liked": 404,
+      "got": 405,
+      "##ny": 406,
+      "##uck": 407,
+      "##king": 408,
+      "##ge": 409,
+      "##ided": 410,
+      "decided": 411,
+      "came": 412,
+      "my": 413,
+      "dog": 414,
+      "this": 415,
+      "scared": 416,
+      "##ust": 417,
+      "down": 418,
+      "bec": 419,
+      "##ouse": 420,
+      "ab": 421,
+      "find": 422,
+      "pa": 423,
+      "care": 424,
+      "will": 425,
+      "po": 426,
+      "sara": 427,
+      "gr": 428,
+      "feel": 429,
+      "anna": 430,
+      "##ell": 431,
+      "max": 432,
+      "##as": 433,
+      "##ist": 434,
+      "bu": 435,
+      "##arn": 436,
+      "##ave": 437,
+      "##nder": 438,
+      "##ways": 439,
+      "##ess": 440,
+      "always": 441,
+      "mo": 442,
+      "took": 443,
+      "about": 444,
+      "##ers": 445,
+      "lot": 446,
+      "##ook": 447,
+      "toys": 448,
+      "kne": 449,
+      "didn": 450,
+      "##ged": 451,
+      "##ise": 452,
+      "##bb": 453,
+      "old": 454,
+      "outside": 455,
+      "tree": 456,
+      "ho": 457,
+      "##ite": 458,
+      "how": 459,
+      "##ally": 460,
+      "ball": 461,
+      "thought": 462,
+      "##ant": 463,
+      "sor": 464,
+      "af": 465,
+      "more": 466,
+      "##ma": 467,
+      "##eci": 468,
+      "##ened": 469,
+      "##ched": 470,
+      "learn": 471,
+      "##ret": 472,
+      "pu": 473,
+      "tw": 474,
+      "know": 475,
+      "cat": 476,
+      "##to": 477,
+      "take": 478,
+      "don": 479,
+      "mi": 480,
+      "laug": 481,
+      "pe": 482,
+      "speci": 483,
+      "special": 484,
+      "sudd": 485,
+      "sudden": 486,
+      "inside": 487,
+      "knew": 488,
+      "##ty": 489,
+      "sorry": 490,
+      "##ive": 491,
+      "##ro": 492,
+      "any": 493,
+      "jack": 494,
+      "##ff": 495,
+      "suddenly": 496,
+      "just": 497,
+      "toy": 498,
+      "##ue": 499,
+      "after": 500,
+      "tr": 501,
+      "if": 502,
+      "show": 503,
+      "##ink": 504,
+      "##lly": 505,
+      "or": 506,
+      "run": 507,
+      "ra": 508,
+      "sl": 509,
+      "much": 510,
+      "##ish": 511,
+      "hand": 512,
+      "yes": 513,
+      "house": 514,
+      "sun": 515,
+      "op": 516,
+      "sk": 517,
+      "clo": 518,
+      "into": 519,
+      "fin": 520,
+      "tried": 521,
+      "en": 522,
+      "water": 523,
+      "##ate": 524,
+      "told": 525,
+      "each": 526,
+      "##ea": 527,
+      "over": 528,
+      "proud": 529,
+      "##ump": 530,
+      "gave": 531,
+      "heard": 532,
+      "never": 533,
+      "ok": 534,
+      "##dy": 535,
+      "##by": 536,
+      "thank": 537,
+      "##use": 538,
+      "eat": 539,
+      "room": 540,
+      "expl": 541,
+      "##other": 542,
+      "pick": 543,
+      "pret": 544,
+      "qu": 545,
+      "##lled": 546,
+      "gre": 547,
+      "come": 548,
+      "couldn": 549,
+      "played": 550,
+      "##ion": 551,
+      "mia": 552,
+      "sha": 553,
+      "##ause": 554,
+      "because": 555,
+      "wat": 556,
+      "hugged": 557,
+      "##ious": 558,
+      "off": 559,
+      "bear": 560,
+      "now": 561,
+      "com": 562,
+      "nice": 563,
+      "fo": 564,
+      "##oth": 565,
+      "box": 566,
+      "##our": 567,
+      "str": 568,
+      "need": 569,
+      "bet": 570,
+      "many": 571,
+      "##ile": 572,
+      "##fe": 573,
+      "##ft": 574,
+      "small": 575,
+      "##eep": 576,
+      "long": 577,
+      "##ving": 578,
+      "##kes": 579,
+      "##sed": 580,
+      "##gry": 581,
+      "anim": 582,
+      "animal": 583,
+      "try": 584,
+      "end": 585,
+      "##ough": 586,
+      "unt": 587,
+      "##cy": 588,
+      "##ild": 589,
+      "until": 590,
+      "even": 591,
+      "##urt": 592,
+      "##ort": 593,
+      "##elf": 594,
+      "soon": 595,
+      "##iz": 596,
+      "kind": 597,
+      "love": 598,
+      "learned": 599,
+      "bea": 600,
+      "everyone": 601,
+      "by": 602,
+      "ad": 603,
+      "better": 604,
+      "flow": 605,
+      "spot": 606,
+      "##ine": 607,
+      "best": 608,
+      "##urp": 609,
+      "say": 610,
+      "##mp": 611,
+      "##ady": 612,
+      "##ream": 613,
+      "cle": 614,
+      "##urn": 615,
+      "##ace": 616,
+      "##ves": 617,
+      "fi": 618,
+      "gard": 619,
+      "garden": 620,
+      "fast": 621,
+      "its": 622,
+      "che": 623,
+      "careful": 624,
+      "##ber": 625,
+      "beaut": 626,
+      "##ies": 627,
+      "bra": 628,
+      "thanked": 629,
+      "sky": 630,
+      "laughed": 631,
+      "jump": 632,
+      "gl": 633,
+      "loud": 634,
+      "sn": 635,
+      "list": 636,
+      "ow": 637,
+      "##ear": 638,
+      "##ct": 639,
+      "##iny": 640,
+      "wo": 641,
+      "beauti": 642,
+      "##sh": 643,
+      "##lew": 644,
+      "lots": 645,
+      "beautiful": 646,
+      "hard": 647,
+      "still": 648,
+      "animals": 649,
+      "fam": 650,
+      "joh": 651,
+      "under": 652,
+      "john": 653,
+      "lu": 654,
+      "mum": 655,
+      "stay": 656,
+      "hurt": 657,
+      "##ning": 658,
+      "both": 659,
+      "dan": 660,
+      "##ree": 661,
+      "##self": 662,
+      "way": 663,
+      "rem": 664,
+      "two": 665,
+      "##hes": 666,
+      "bad": 667,
+      "safe": 668,
+      "col": 669,
+      "##ool": 670,
+      "##be": 671,
+      "di": 672,
+      "red": 673,
+      "book": 674,
+      "imp": 675,
+      "tow": 676,
+      "##ople": 677,
+      "##ane": 678,
+      "lived": 679,
+      "people": 680,
+      "okay": 681,
+      "##em": 682,
+      "lucy": 683,
+      "walked": 684,
+      "should": 685,
+      "brave": 686,
+      "surp": 687,
+      "surpr": 688,
+      "##ase": 689,
+      "##ock": 690,
+      "##ished": 691,
+      "angry": 692,
+      "family": 693,
+      "adv": 694,
+      "##ress": 695,
+      "##igh": 696,
+      "flew": 697,
+      "called": 698,
+      "stor": 699,
+      "##ept": 700,
+      "##ip": 701,
+      "##eet": 702,
+      "sure": 703,
+      "##led": 704,
+      "fore": 705,
+      "kept": 706,
+      "fly": 707,
+      "share": 708,
+      "##ect": 709,
+      "##fore": 710,
+      "before": 711,
+      "##ger": 712,
+      "##art": 713,
+      "while": 714,
+      "##xt": 715,
+      "##ised": 716,
+      "##dded": 717,
+      "pic": 718,
+      "keep": 719,
+      "door": 720,
+      "going": 721,
+      "pretty": 722,
+      "rock": 723,
+      "clean": 724,
+      "dra": 725,
+      "next": 726,
+      "why": 727,
+      "may": 728,
+      "advent": 729,
+      "##ied": 730,
+      "##ary": 731,
+      "give": 732,
+      "noise": 733,
+      "con": 734,
+      "opened": 735,
+      "shiny": 736,
+      "far": 737,
+      "un": 738,
+      "doll": 739,
+      "wind": 740,
+      "id": 741,
+      "cry": 742,
+      "grand": 743,
+      "real": 744,
+      "##illy": 745,
+      "##end": 746,
+      "sto": 747,
+      "##so": 748,
+      "turn": 749,
+      "explore": 750,
+      "ground": 751,
+      "also": 752,
+      "##les": 753,
+      "ey": 754,
+      "##ner": 755,
+      "idea": 756,
+      "color": 757,
+      "##ap": 758,
+      "where": 759,
+      "nodded": 760,
+      "war": 761,
+      "feeling": 762,
+      "blue": 763,
+      "bob": 764,
+      "picked": 765,
+      "##imb": 766,
+      "walking": 767,
+      "clos": 768,
+      "##thing": 769,
+      "climb": 770,
+      "thr": 771,
+      "##ting": 772,
+      "has": 773,
+      "ple": 774,
+      "wait": 775,
+      "bed": 776,
+      "smile": 777,
+      "adventure": 778,
+      "##oy": 779,
+      "being": 780,
+      "##th": 781,
+      "maybe": 782,
+      "finally": 783,
+      "looking": 784,
+      "da": 785,
+      "##iced": 786,
+      "##ture": 787,
+      "diff": 788,
+      "wr": 789,
+      "##and": 790,
+      "food": 791,
+      "listen": 792,
+      "##joy": 793,
+      "picture": 794,
+      "think": 795,
+      "remem": 796,
+      "del": 797,
+      "truck": 798,
+      "tra": 799,
+      "eyes": 800,
+      "##ught": 801,
+      "bro": 802,
+      "gi": 803,
+      "repl": 804,
+      "stopped": 805,
+      "here": 806,
+      "great": 807,
+      "bre": 808,
+      "ru": 809,
+      "vo": 810,
+      "walk": 811,
+      "##qu": 812,
+      "year": 813,
+      "enjoy": 814,
+      "wonder": 815,
+      "remember": 816,
+      "##able": 817,
+      "hands": 818,
+      "sue": 819,
+      "quick": 820,
+      "##per": 821,
+      "ever": 822,
+      "cur": 823,
+      "head": 824,
+      "flowers": 825,
+      "import": 826,
+      "forest": 827,
+      "##og": 828,
+      "ac": 829,
+      "noticed": 830,
+      "near": 831,
+      "##bbit": 832,
+      "app": 833,
+      "dis": 834,
+      "##ized": 835,
+      "watch": 836,
+      "##irst": 837,
+      "rabbit": 838,
+      "fish": 839,
+      "##llow": 840,
+      "important": 841,
+      "##age": 842,
+      "ama": 843,
+      "us": 844,
+      "rain": 845,
+      "sound": 846,
+      "slide": 847,
+      "bun": 848,
+      "amaz": 849,
+      "replied": 850,
+      "stop": 851,
+      "work": 852,
+      "follow": 853,
+      "mor": 854,
+      "showed": 855,
+      "tal": 856,
+      "mean": 857,
+      "tou": 858,
+      "##ces": 859,
+      "##gan": 860,
+      "right": 861,
+      "sarah": 862,
+      "ask": 863,
+      "##bbed": 864,
+      "differ": 865,
+      "mag": 866,
+      "goodby": 867,
+      "our": 868,
+      "bright": 869,
+      "##day": 870,
+      "watched": 871,
+      "goodbye": 872,
+      "hop": 873,
+      "please": 874,
+      "use": 875,
+      "strong": 876,
+      "am": 877,
+      "stick": 878,
+      "quickly": 879,
+      "daddy": 880,
+      "been": 881,
+      "voice": 882,
+      "yell": 883,
+      "##ath": 884,
+      "jane": 885,
+      "different": 886,
+      "child": 887,
+      "boat": 888,
+      "##llo": 889,
+      "##co": 890,
+      "##che": 891,
+      "hello": 892,
+      "first": 893,
+      "became": 894,
+      "face": 895,
+      "place": 896,
+      "##ange": 897,
+      "does": 898,
+      "high": 899,
+      "##ng": 900,
+      "store": 901,
+      "closer": 902,
+      "curious": 903,
+      "##ak": 904,
+      "warm": 905,
+      "sand": 906,
+      "dress": 907,
+      "bel": 908,
+      "##ummy": 909,
+      "joe": 910,
+      "cook": 911,
+      "tell": 912,
+      "##ila": 913,
+      "forg": 914,
+      "em": 915,
+      "three": 916,
+      "fav": 917,
+      "grandma": 918,
+      "##oon": 919,
+      "##ount": 920,
+      "open": 921,
+      "lila": 922,
+      "##leep": 923,
+      "night": 924,
+      "cake": 925,
+      "block": 926,
+      "##more": 927,
+      "pie": 928,
+      "anymore": 929,
+      "bunny": 930,
+      "butter": 931,
+      "mon": 932,
+      "##iss": 933,
+      "##ached": 934,
+      "lea": 935,
+      "sweet": 936,
+      "##ired": 937,
+      "kid": 938,
+      "##ull": 939,
+      "only": 940,
+      "flo": 941,
+      "kit": 942,
+      "pain": 943,
+      "grabbed": 944,
+      "duck": 945,
+      "##isy": 946,
+      "fell": 947,
+      "birds": 948,
+      "fire": 949,
+      "cont": 950,
+      "##ered": 951,
+      "jumped": 952,
+      "glad": 953,
+      "pet": 954,
+      "##here": 955,
+      "bit": 956,
+      "per": 957,
+      "##chen": 958,
+      "kitchen": 959,
+      "cra": 960,
+      "helped": 961,
+      "sees": 962,
+      "dr": 963,
+      "##outed": 964,
+      "hear": 965,
+      "sing": 966,
+      "##ul": 967,
+      "prin": 968,
+      "happened": 969,
+      "squ": 970,
+      "yummy": 971,
+      "grass": 972,
+      "tri": 973,
+      "ready": 974,
+      "tommy": 975,
+      "story": 976,
+      "shouted": 977,
+      "beh": 978,
+      "##nts": 979,
+      "really": 980,
+      "brother": 981,
+      "cr": 982,
+      "reached": 983,
+      "realized": 984,
+      "lady": 985,
+      "##ey": 986,
+      "draw": 987,
+      "having": 988,
+      "##ins": 989,
+      "through": 990,
+      "favor": 991,
+      "mess": 992,
+      "cre": 993,
+      "hat": 994,
+      "favorite": 995,
+      "game": 996,
+      "less": 997,
+      "ate": 998,
+      "underst": 999,
+      "soft": 1000,
+      "pare": 1001,
+      "##zy": 1002,
+      "thing": 1003,
+      "##ket": 1004,
+      "##ather": 1005,
+      "##imes": 1006,
+      "cu": 1007,
+      "magic": 1008,
+      "pretend": 1009,
+      "began": 1010,
+      "##where": 1011,
+      "world": 1012,
+      "looks": 1013,
+      "##ken": 1014,
+      "butterf": 1015,
+      "himself": 1016,
+      "kids": 1017,
+      "##fully": 1018,
+      "making": 1019,
+      "done": 1020,
+      "wow": 1021,
+      "cut": 1022,
+      "sometimes": 1023
+    }
+  }
+}
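
Note on the pipeline configured in tokenizer.json above: the file chains an NFD / StripAccents / Lowercase normalizer, a Punctuation / Whitespace / Digits pre-tokenizer, and a WordPiece model with the `##` continuation prefix. The sketch below is a rough standard-library approximation of that pipeline, not the `tokenizers` implementation. The `VOCAB` dictionary is a small hand-copied subset of the vocab in the diff, the regular expression only approximates the three pre-tokenizers, and the function names are hypothetical.

```python
import re
import unicodedata

# Hand-copied SUBSET of the vocab in tokenizer.json (IDs as in the diff).
VOCAB = {
    "[UNK]": 0, ".": 17, "##s": 83, "the": 98, "##ed": 100,
    "##ing": 111, "lily": 159, "play": 174, "jump": 632,
}
MAX_INPUT_CHARS_PER_WORD = 100  # "max_input_chars_per_word" in the diff


def normalize(text):
    """Approximate the NFD -> StripAccents -> Lowercase normalizer sequence."""
    decomposed = unicodedata.normalize("NFD", text)
    # Assumption: accents are the combining marks (Unicode categories M*).
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )
    return stripped.lower()


# Approximation of Punctuation(Isolated) + Whitespace + Digits(individual):
# single digits, runs of letters, and single punctuation characters.
_PRE_TOKEN = re.compile(r"\d|[^\W\d_]+|[^\w\s]|_")


def pre_tokenize(text):
    return _PRE_TOKEN.findall(text)


def wordpiece(word):
    """Greedy longest-match-first split; pieces after the first carry '##'."""
    if len(word) > MAX_INPUT_CHARS_PER_WORD:
        return ["[UNK]"]
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in VOCAB:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word becomes [UNK].
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


def encode(text):
    words = pre_tokenize(normalize(text))
    return [VOCAB[piece] for word in words for piece in wordpiece(word)]


print(encode("Lily jumps."))  # [159, 632, 83, 17]
```

With this subset, `jumps` splits into `jump` + `##s`. Words the subset does not cover fall back to `[UNK]` here, whereas the full vocab includes every lowercase letter both as a standalone piece and as a `##` continuation piece, so the uploaded tokenizer would split them into smaller pieces instead.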

tokenizer_config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "tokenizer_class": "PreTrainedTokenizerFast"
+}
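
Note on `model_max_length` in tokenizer_config.json above: the value equals `int(1e30)`, which looks like a "no limit configured" placeholder rather than a real sequence length. The sketch below checks that equality and shows one way a caller might substitute an explicit limit. The helper name and the fallback of 512 are hypothetical and would need to match the model this tokenizer is paired with.

```python
# Value copied from tokenizer_config.json in the diff.
MODEL_MAX_LENGTH = 1000000000000000019884624838656

# The configured value is exactly the integer nearest to the float 1e30.
assert MODEL_MAX_LENGTH == int(1e30)


def effective_max_length(configured, fallback=512):
    """Return `fallback` when `configured` is the placeholder (hypothetical helper)."""
    return fallback if configured >= int(1e30) else configured


print(effective_max_length(MODEL_MAX_LENGTH))  # 512
print(effective_max_length(128))               # 128
```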