OrionZheng commited on
Commit
1f7a079
1 Parent(s): 95f9586

Upload tokenizer and modeling_openmoe.py

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,127 @@
1
  ---
2
  license: apache-2.0
3
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: apache-2.0
3
  ---
4
+ <p align="center">
5
+ <img width="150px" alt="OpenMoE" src="https://github.com/XueFuzhao/OpenMoE/blob/main/logo.jpg?raw=true">
6
+ </p>
7
+ <p align="center"><a href="https://github.com/XueFuzhao/OpenMoE/tree/main">[Github]</a> | <a href="https://colab.research.google.com/drive/1xIfIVafnlCP2XVICmRwkUFK3cwTJYjCY#scrollTo=62T-2mH_tsjG">[Colab Demo]</a> | <a href="https://huggingface.co/OrionZheng">[Huggingface]</a> | <a href="https://discord.gg/bjGnGfjegU">[Discord]</a> | <a href="https://twitter.com/xuefz/status/1693696988611739947?s=61&t=Xc2k2W7vU_hlpNizGDCmOw">[Twitter]</a> | <a href="https://xuefuzhao.notion.site/Aug-2023-OpenMoE-v0-2-Release-43808efc0f5845caa788f2db52021879">[Blog]</a></p>
8
+ </p>
9
+ <hr>
10
+
11
+ # OpenMoE-8B(890B tokens)
12
+ OpenMoE is a project aimed at igniting the open-source MoE community! We are releasing a family of open-sourced Mixture-of-Experts (MoE) Large Language Models.
13
+
14
+ Our project began in the summer of 2023. On August 22, 2023, we released the first batch of intermediate checkpoints (OpenMoE-base&8B), along with the data and code [[Twitter]](https://twitter.com/xuefz/status/1693696988611739947?s=61&t=Xc2k2W7vU_hlpNizGDCmOw). Subsequently, the OpenMoE-8B training was completed in November, 2023. After that, we embarked on explorations on 34B scale model, which is still ongoing.
15
+
16
+ As a small student team, instead of pursuing the best model with better data, computation, and human power, we devote to fully sharing our training data, strategies, model architecture, weights, and everything we have with the community. We hope this project will promote research on this promising field and invite more contributors to work on open-sourced MoE projects together!
17
+
18
+ [2024.01.12] The paper for the project and more evaluations are underway. For more information about the model, training, and evaluations, please visit our GitHub [repository](https://github.com/XueFuzhao/OpenMoE/tree/main).
19
+
20
+
21
+ ## Model Weights
22
+ Currently, three models are released in total: OpenMoE-base, OpenMoE-8B/8B-Chat, and OpenMoE-34B(at 200B tokens).
23
+
24
+ The table below lists the 8B/8B-Chat model that has completed training on 1.1T tokens.
25
+
26
+ | Model Name | Description | #Param |Huggingface |
27
+ |----------------|-------------------------------------------------|----------|-------------|
28
+ | **OpenMoE-8B(1.1T)** | 8B MoE with comparable FLOPs of a 1.6B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b) |
29
+ | **OpenMoE-8B-Chat (1.1T+SFT)** | OpenMoE-8B-1.1T supervised finetuned on the [WildChat GPT-4 Subset](https://huggingface.co/datasets/allenai/WildChat-nontoxic) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-chat) |
30
+
31
+
32
+ Besides, we also provide all our intermediate checkpoints(base, 8B, 34B) for research purposes.
33
+
34
+ | Model Name | Description | #Param |Huggingface |
35
+ |----------------|-------------------------------------------------|----------|-------------|
36
+ | **OpenMoE-34B-200B** | 34B MoE with comparable FLOPs of a 7B LLaMA(No SFT) |34B |[Link](https://huggingface.co/OrionZheng/openmoe-34b-200B) |
37
+ | OpenMoE-8B-200B | 8B MoE with comparable FLOPs of a 1.6B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-200B) |
38
+ | OpenMoE-8B-400B | 8B MoE with comparable FLOPs of a 1.6B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-400B) |
39
+ | OpenMoE-8B-600B | 8B MoE with comparable FLOPs of a 1.6B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-600B) |
40
+ | OpenMoE-8B-800B | 8B MoE with comparable FLOPs of a 1.6B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-800B) |
41
+ | OpenMoE-8B-1T | 8B MoE with comparable FLOPs of a 1.6B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-1T) |
42
+ | OpenMoE-base(128B) | A small MoE model for debugging only |637M |[Link](https://huggingface.co/OrionZheng/openmoe-base) |
43
+ | OpenLLaMA-base(128B) | A dense counter-part of OpenMoE-base |310M |[Link](https://huggingface.co/fuzhao/OpenLLaMA_Base) |
44
+
45
+
46
+ The base model, which were trained using 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance might not be very well, and the checkpoint are not suitable for practical applications. Better performence can be oberved from our 8B or 34B versions.
47
+
48
+ The OpenMoE-8B with 4 MoE layers and 32 experts has been trained by 1.1T tokens. The SFT version has also been released after we finetuned the OpenMoE-8B-1.1T on the [wildchat]((https://huggingface.co/datasets/allenai/WildChat-nontoxic)) dataset's GPT-4 subset. The intermediate checkpoints at 200B, 400B, 600B, 800B, 1T tokens can be used to study the training dynamics of MoE architexture.
49
+
50
+ We are still training our OpenMoE-34B, which is a MoE model with 8 MoE layer and 32 experts. We released the intermediate checkpoint trained on 200B tokens on huggingface. If you are interested in the latest checkpoint, please feel free to drop Fuzhao an email (f.xue@u.nus.edu).
51
+
52
+ ## Get Started
53
+
54
+ ### Inference with Pytorch
55
+ Our PyToch implementation is supported by [Colossal AI](https://github.com/hpcaitech/ColossalAI). You can install our forked version directly for easier setup:
56
+ ```
57
+ # Python version: 3.10.12
58
+ # Install ColossalAI
59
+ git clone --branch my_openmoe https://github.com/Orion-Zheng/ColossalAI.git
60
+ pip install ./ColossalAI
61
+ python -m pip install -r ./ColossalAI/examples/language/openmoe/requirements.txt
62
+ ```
63
+
64
+ Then, you can inference by the following code on a A100 80GB machine.
65
+ ```
66
+ from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
67
+
68
+ model_path = "ckpts/openmoe-8b-chat"
69
+ config = AutoConfig.from_pretrained(model_path)
70
+ tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
71
+ model = AutoModelForCausalLM.from_pretrained(
72
+ model_path,
73
+ torch_dtype=torch.bfloat16,
74
+ trust_remote_code=True,
75
+ device_map='auto'
76
+ )
77
+ query = 'Question: How do I kill a process? Answer:'
78
+ prompt = f'''<<SYS>>
79
+ You are a helpful, respectful and honest assistant.
80
+ <</SYS>>
81
+
82
+ <s>[INST] {query} [/INST]'''
83
+
84
+ inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
85
+ sample = model.generate(**inputs, max_new_tokens=32)
86
+ print(tokenizer.decode(sample[0]))
87
+ ```
88
+
89
+
90
+ If you don't have GPUs on your hand, don't worry! you can still experience our model on Colab(Note: this require a $10 Colab Pro Plan). You can experiment with OpenMoE-8B-Chat on Colab directly by [this](https://colab.research.google.com/drive/1xIfIVafnlCP2XVICmRwkUFK3cwTJYjCY).
91
+ - Running OpenMoE-8B requires ~49GB of memory in float32 or ~23GB in bfloat16. It can be executed on a Colab `CPU High-RAM`(in float32) runtime or an `A100-40GB`(in bfloat16) runtime, both of which require Colab Pro. The float16 precision is not recommended because sometimes it will lead to performance degradation.
92
+ - Runing the OpenMoE-34B requries ~89GB of memory in bfloat16 or ~180GB in float32. To perform inference on multiple devices/offloading model weights to RAM, please refer to the script [here](https://github.com/XueFuzhao/OpenMoE/blob/main/script/inference_on_multi_devices.py).
93
+ - A more detailed env setup script can be found [here](https://github.com/XueFuzhao/OpenMoE/blob/main/env/prepare_env.sh), or if you use docker, you can refer to the dockerfile [here](https://github.com/XueFuzhao/OpenMoE/blob/main/env/openmoe_infer_dockerfile). Note: you don't need t5x and Jax dependency if you are using our [huggingface ckpts](https://huggingface.co/OrionZheng/openmoe-8b-chat) without converting the jax checkpoints.
94
+
95
+ Besides, we also provide a Colab [tutorial](https://colab.research.google.com/drive/1eIT1rtG7pORRQAYtQoMOAekUg7aZLDdn) demonstrating the jax checkpoint conversion.
96
+
97
+
98
+ ## License
99
+
100
+ Our code is under Apache 2.0 License.
101
+
102
+ Since the models are trained on The Redpajama and The Stack dataset, please check the license of these two datasets for your model usage.
103
+
104
+
105
+ ## Authors
106
+
107
+ This project is currently contributed by the following authors:
108
+
109
+ [Fuzhao Xue](https://xuefuzhao.github.io/), [Zian Zheng](https://zheng-zian-andy.com), [Yao Fu](https://franxyao.github.io/), [Jinjie Ni](http://jinjie.one/), [Zangwei Zheng](https://zhengzangw.github.io/), [Wangchunshu Zhou](https://michaelzhouwang.github.io/), [Yang You](https://www.comp.nus.edu.sg/~youy/)
110
+
111
+ ## Acknowledgement
112
+ The computational resources for this project were generously provided by the [Google TPU Research Cloud(TRC)](https://sites.research.google/trc/about/). We extend our heartfelt thanks to TRC for their invaluable support, which has been fundamental to the success of our work. Besides, we are extremely grateful to the [ColossalAI Team](https://github.com/hpcaitech/ColossalAI) for their tremendous support with the PyTorch implementation, especially [Xuanlei Zhao](https://oahzxl.github.io/) and [Wenhao Chen](https://github.com/CWHer), making training and inference of OpenMoE on GPUs a reality.
113
+
114
+ ## Citation
115
+
116
+ Please cite the repo if you use the model and code in this repo.
117
+
118
+ ```bibtex
119
+ @misc{openmoe2023,
120
+ author = {Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou and Yang You},
121
+ title = {OpenMoE: Open Mixture-of-Experts Language Models},
122
+ year = {2023},
123
+ publisher = {GitHub},
124
+ journal = {GitHub repository},
125
+ howpublished = {\url{https://github.com/XueFuzhao/OpenMoE}},
126
+ }
127
+ ```
modeling_openmoe.py ADDED
@@ -0,0 +1,1140 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # coding=utf-8
2
+ # Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
5
+ # and OPT implementations in this library. It has been modified from its
6
+ # original forms to accommodate minor architectural differences compared
7
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ """ PyTorch OpenMoE model."""
21
+ import math
22
+ from typing import List, Optional, Tuple, Union
23
+
24
+ import torch
25
+ import torch.nn.functional as F
26
+ import torch.utils.checkpoint
27
+ from torch import nn
28
+ from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast
29
+ from transformers.modeling_utils import PreTrainedModel
30
+ from transformers.models.llama.configuration_llama import LlamaConfig
31
+ # from .llama_attn import LlamaAttention
32
+
33
+ from transformers.utils import (
34
+ add_start_docstrings,
35
+ add_start_docstrings_to_model_forward,
36
+ logging,
37
+ replace_return_docstrings,
38
+ )
39
+
40
+ from colossalai.kernel.cuda_native.mha.flash_attn_2 import HAS_FLASH_ATTN
41
+ from colossalai.kernel.triton.llama_act_combine_kernel import HAS_TRITON
42
+ from colossalai.moe.layers import SparseMLP
43
+ from colossalai.moe.manager import MOE_MANAGER
44
+ from colossalai.moe.utils import get_activation, set_moe_args
45
+
46
+
47
+
48
+ if HAS_TRITON:
49
+ from colossalai.kernel.triton.llama_act_combine_kernel import LlamaActCombine
50
+
51
+ logger = logging.get_logger(__name__)
52
+
53
+ _CONFIG_FOR_DOC = "LlamaConfig"
54
+
55
+
56
+ def set_openmoe_args(
57
+ config: LlamaConfig,
58
+ num_experts: int,
59
+ moe_layer_interval: int,
60
+ router_topk: int = 2,
61
+ router_capacity_factor_train: float = 1.25,
62
+ router_capacity_factor_eval: float = 2.0,
63
+ router_min_capacity: int = 4,
64
+ router_noisy_policy: str = None,
65
+ router_drop_tks: bool = True,
66
+ router_aux_loss_factor: float = 0.01,
67
+ router_z_loss_factor: float = 0.0001,
68
+ mlp_gated: bool = True,
69
+ label_smoothing: float = 0.001,
70
+ z_loss_factor: float = 0.01,
71
+ enable_load_balance: bool = False,
72
+ load_balance_tolerance: float = 0.1,
73
+ load_balance_beam_width: int = 8,
74
+ load_balance_group_swap_factor: float = 0.4,
75
+ enable_kernel: bool = False,
76
+ enable_comm_overlap: bool = False,
77
+ enable_hierarchical_alltoall: bool = False,
78
+ ) -> None:
79
+ """
80
+ MoE related arguments.
81
+ It inserts the MoE arguments into the Llama config.
82
+
83
+ Args:
84
+ config (LlamaConfig): Transformers Llama config.
85
+ num_experts (int, optional): Number of experts.
86
+ moe_layer_interval (int, optional): The interval moe layer.
87
+ router_topk (int, optional): Moe router top k. Defaults to 2.
88
+ router_capacity_factor_train (float, optional): Moe router max capacity for train. Defaults to 1.25.
89
+ router_capacity_factor_eval (float, optional): Moe router max capacity for eval. Defaults to 2.0.
90
+ router_min_capacity (int, optional): Moe router min capacity. Defaults to 4.
91
+ router_noisy_policy (str, optional): Moe router noisy policy. You can choose [Jitter, Gaussian, None]. Defaults to None.
92
+ router_drop_tks (bool, optional): Whether moe router drop tokens which exceed max capacity. Defaults to True.
93
+ router_aux_loss_factor (float, optional): Moe router aux loss. You can refer to STMoE for details. Defaults to 0.01.
94
+ router_z_loss_factor (float, optional): Moe router z loss. You can refer to STMoE for details. Defaults to 0.01.
95
+ mlp_gated (bool, optional): Use gate in mlp. Defaults to True.
96
+ label_smoothing (float, optional): Label smoothing. Defaults to 0.001.
97
+ z_loss_factor (float, optional): The final outputs' classification z loss factor. Defaults to 0.01.
98
+ enable_load_balance (bool, optional): Expert load balance. Defaults to False.
99
+ load_balance_tolerance (float, optional): Expert load balance search's difference tolerance. Defaults to 0.1.
100
+ load_balance_beam_width (int, optional): Expert load balance search's beam width. Defaults to 8.
101
+ load_balance_group_swap_factor (float, optional): Expert load balance group swap factor. Longer value encourages less swap. Defaults to 0.4.
102
+ enable_kernel (bool, optional): Use kernel optimization. Defaults to False.
103
+ enable_comm_overlap (bool, optional): Use communication overlap for MoE. Recommended to enable for muiti-node training. Defaults to False.
104
+ enable_hierarchical_alltoall (bool, optional): Use hierarchical alltoall for MoE. Defaults to False.
105
+ """
106
+ moe_args = dict(
107
+ num_experts=num_experts,
108
+ moe_layer_interval=moe_layer_interval,
109
+ router_topk=router_topk,
110
+ router_capacity_factor_train=router_capacity_factor_train,
111
+ router_capacity_factor_eval=router_capacity_factor_eval,
112
+ router_min_capacity=router_min_capacity,
113
+ router_noisy_policy=router_noisy_policy,
114
+ router_drop_tks=router_drop_tks,
115
+ router_aux_loss_factor=router_aux_loss_factor,
116
+ router_z_loss_factor=router_z_loss_factor,
117
+ mlp_gated=mlp_gated,
118
+ label_smoothing=label_smoothing,
119
+ z_loss_factor=z_loss_factor,
120
+ enable_load_balance=enable_load_balance,
121
+ load_balance_tolerance=load_balance_tolerance,
122
+ load_balance_beam_width=load_balance_beam_width,
123
+ load_balance_group_swap_factor=load_balance_group_swap_factor,
124
+ enable_kernel=enable_kernel,
125
+ enable_comm_overlap=enable_comm_overlap,
126
+ enable_hierarchical_alltoall=enable_hierarchical_alltoall,
127
+ )
128
+ set_moe_args(config, moe_args)
129
+
130
+
131
+ # Copied from transformers.models.bart.modeling_bart._make_causal_mask
132
+ def _make_causal_mask(
133
+ input_ids_shape: torch.Size, dtype: torch.dtype, device: torch.device, past_key_values_length: int = 0
134
+ ):
135
+ """
136
+ Make causal mask used for bi-directional self-attention.
137
+ """
138
+ bsz, tgt_len = input_ids_shape
139
+ mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
140
+ mask_cond = torch.arange(mask.size(-1), device=device)
141
+ mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
142
+ mask = mask.to(dtype)
143
+
144
+ if past_key_values_length > 0:
145
+ mask = torch.cat([torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask], dim=-1)
146
+ return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)
147
+
148
+
149
+ # Copied from transformers.models.bart.modeling_bart._expand_mask
150
+ def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
151
+ """
152
+ Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
153
+ """
154
+ bsz, src_len = mask.size()
155
+ tgt_len = tgt_len if tgt_len is not None else src_len
156
+
157
+ expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
158
+
159
+ inverted_mask = 1.0 - expanded_mask
160
+
161
+ return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min)
162
+
163
+
164
+ def apply_rotary_embedding(q, k, cos, sin, decode=False, rotary_index=None):
165
+ # q: (bs, q_len, num_heads, head_dim)
166
+ # k: (bs, q_len [+past_kv_len], num_heads, head_dim)
167
+ # cos: (max_seq_len, head_dim)
168
+ # sin: (max_seq_len, head_dim)
169
+ # rotary_index: (bs, 1) # only used during decoding, when one query token is input at a time
170
+ """Helper function to apply Rotary Embeddings."""
171
+ cos = cos.to(q.dtype)
172
+ sin = sin.to(q.dtype)
173
+
174
+ if len(k.shape) == 3: # for multi query attention
175
+ k = k.unsqueeze(2)
176
+ multiquery = True
177
+ else:
178
+ multiquery = False
179
+
180
+ batch, qlen, qheads, d = q.shape
181
+ kbatch, klen, kheads, kd = k.shape
182
+ assert batch == kbatch, f"{batch} != {kbatch}"
183
+ assert d == kd, f"{d} != {kd}"
184
+ if decode and qlen == 1 and rotary_index is not None:
185
+ qcos = cos[rotary_index, :] # (bs, 1, head_dim)
186
+ qsin = sin[rotary_index, :] # (bs, 1, head_dim)
187
+ qcos = qcos.unsqueeze(2) # (bs, q_len=1, 1, head_dim) # broadcast to all heads
188
+ qsin = qsin.unsqueeze(2) # (bs, q_len=1, 1, head_dim)
189
+ else:
190
+ qcos, qsin = cos[:qlen, :], sin[:qlen, :] # (q_len, head_dim)
191
+ qcos = qcos.unsqueeze(0).unsqueeze(2) # (1, q_len, 1, head_dim)
192
+ qsin = qsin.unsqueeze(0).unsqueeze(2)
193
+
194
+ kcos, ksin = cos[:klen, :], sin[:klen, :] # (k_len, head_dim)
195
+ kcos = kcos.unsqueeze(0).unsqueeze(2) # (1, k_len, 1, head_dim) # broadcast to the whole batch, broadcast to all heads
196
+ ksin = ksin.unsqueeze(0).unsqueeze(2) # (1, k_len, 1, head_dim)
197
+ out_q = (q * qcos) + (rotate_half(q) * qsin)
198
+ out_k = (k * kcos) + (rotate_half(k) * ksin)
199
+
200
+ if multiquery:
201
+ out_k = out_k.squeeze(2)
202
+
203
+ return out_q, out_k
204
+
205
+
206
+ def rotate_half(x):
207
+ """Rotates half the hidden dims of the input."""
208
+ x1 = x[..., : x.shape[-1] // 2]
209
+ x2 = x[..., x.shape[-1] // 2 :]
210
+ return torch.cat((-x2, x1), dim=-1)
211
+
212
+ class LlamaRMSNorm(nn.Module):
213
+ def __init__(self, hidden_size, eps=1e-6):
214
+ """
215
+ LlamaRMSNorm is equivalent to T5LayerNorm
216
+ """
217
+ super().__init__()
218
+ self.weight = nn.Parameter(torch.ones(hidden_size))
219
+ self.variance_epsilon = eps
220
+
221
+ def forward(self, hidden_states):
222
+ input_dtype = hidden_states.dtype
223
+ hidden_states = hidden_states.to(torch.float32)
224
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
225
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
226
+ return self.weight * hidden_states.to(input_dtype)
227
+
228
+ def SwiGLU(x):
229
+ """Gated linear unit activation function.
230
+ Args:
231
+ x : input array
232
+ axis: the axis along which the split should be computed (default: -1)
233
+ """
234
+ size = x.shape[-1]
235
+ assert size % 2 == 0, "axis size must be divisible by 2"
236
+ x1, x2 = torch.split(x, size // 2, -1)
237
+ return x1 * (x2 * torch.sigmoid(x2))
238
+
239
+
240
+ class OpenMoeMLP(nn.Module):
241
+ def __init__(self, config: LlamaConfig):
242
+ super().__init__()
243
+ self.pretraining_tp = config.pretraining_tp
244
+ self.hidden_size = config.hidden_size
245
+ self.intermediate_size = config.intermediate_size
246
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size * 2, bias=False)
247
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
248
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
249
+ self.hidden_act = config.hidden_act
250
+ self.act_fn = get_activation(self.hidden_act)
251
+ self.use_kernel = config.enable_kernel
252
+
253
+ def forward(self, x):
254
+ if self.pretraining_tp > 1:
255
+ slice = self.intermediate_size // self.pretraining_tp
256
+ gate_proj_slices = self.gate_proj.weight.split(slice, dim=0)
257
+ up_proj_slices = self.up_proj.weight.split(slice, dim=0)
258
+ down_proj_slices = self.down_proj.weight.split(slice, dim=1)
259
+
260
+ gate_proj = torch.cat([F.linear(x, gate_proj_slices[i]) for i in range(self.pretraining_tp)], dim=-1)
261
+ up_proj = torch.cat([F.linear(x, up_proj_slices[i]) for i in range(self.pretraining_tp)], dim=-1)
262
+
263
+ intermediate_states = (self.act_fn(gate_proj) * up_proj).split(slice, dim=2)
264
+ down_proj = [F.linear(intermediate_states[i], down_proj_slices[i]) for i in range(self.pretraining_tp)]
265
+ down_proj = sum(down_proj)
266
+ else:
267
+ if HAS_TRITON and self.use_kernel and self.hidden_act == "swiglu":
268
+ down_proj = self.down_proj(LlamaActCombine.apply(self.gate_proj(x), self.up_proj(x)))
269
+ else:
270
+ down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
271
+
272
+ return down_proj
273
+
274
+
275
+ def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
276
+ """
277
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
278
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
279
+ """
280
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
281
+ if n_rep == 1:
282
+ return hidden_states
283
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
284
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
285
+
286
+
287
+ class OpenMoeAttention(nn.Module):
288
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
289
+
290
+ def __init__(self, config: LlamaConfig):
291
+ super().__init__()
292
+ self.config = config
293
+ self.hidden_size = config.hidden_size
294
+ self.num_heads = config.num_attention_heads
295
+ self.head_dim = config.head_dim
296
+ self.num_key_value_heads = config.num_key_value_heads
297
+ self.num_key_value_groups = self.num_heads // self.num_key_value_heads
298
+ self.pretraining_tp = config.pretraining_tp
299
+ self.max_position_embeddings = config.max_position_embeddings
300
+
301
+ self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
302
+ self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
303
+ self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
304
+ self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
305
+ self.generate_fixed_pos_embedding(self.head_dim, self.max_position_embeddings, 1.0, 1e4)
306
+ self.use_kernel = config.enable_kernel
307
+
308
+
309
+ def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
310
+ return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()
311
+
312
+ def generate_fixed_pos_embedding(self, features, length, min_timescale=1.0, max_timescale=10000.0):
313
+ """Generate Sin/Cos for Rotary Embeddings.
314
+
315
+ Args:
316
+ features: an integer
317
+ length: an integer
318
+ min_timescale: an optional float
319
+ max_timescale: an optional float
320
+
321
+ Returns:
322
+ output_sin: a float32 Tensor with shape [length, features]
323
+ output_cos: a float32 Tensor with shape [length, features]
324
+ """
325
+ fraction = torch.arange(0, features, 2, dtype=torch.float32) / features
326
+ timescale = min_timescale * (max_timescale / min_timescale) ** fraction
327
+ rotational_frequency = 1.0 / timescale
328
+
329
+ sinusoid_inp = torch.einsum("i,j->ij", torch.arange(length, dtype=torch.float32), rotational_frequency)
330
+
331
+ sinusoid_inp = torch.cat([sinusoid_inp, sinusoid_inp], dim=-1)
332
+
333
+ self.register_buffer('sin', torch.sin(sinusoid_inp), persistent=False) # persistent=False --> buffer won't appear in the state_dict
334
+ self.register_buffer('cos', torch.cos(sinusoid_inp), persistent=False)
335
+
336
+ def forward(
337
+ self,
338
+ hidden_states: torch.Tensor,
339
+ attention_mask: Optional[torch.Tensor] = None,
340
+ position_ids: Optional[torch.LongTensor] = None,
341
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
342
+ output_attentions: bool = False,
343
+ use_cache: bool = False,
344
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
345
+ bsz, q_len, _ = hidden_states.size()
346
+
347
+ if self.pretraining_tp > 1:
348
+ key_value_slicing = (self.num_key_value_heads * self.head_dim) // self.pretraining_tp
349
+ query_slices = self.q_proj.weight.split((self.num_heads * self.head_dim) // self.pretraining_tp, dim=0)
350
+ key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
351
+ value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)
352
+
353
+ query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.pretraining_tp)]
354
+ query_states = torch.cat(query_states, dim=-1)
355
+
356
+ key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.pretraining_tp)]
357
+ key_states = torch.cat(key_states, dim=-1)
358
+
359
+ value_states = [F.linear(hidden_states, value_slices[i]) for i in range(self.pretraining_tp)]
360
+ value_states = torch.cat(value_states, dim=-1)
361
+
362
+ else:
363
+ query_states = self.q_proj(hidden_states)
364
+ key_states = self.k_proj(hidden_states)
365
+ value_states = self.v_proj(hidden_states)
366
+
367
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
368
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
369
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
370
+
371
+ kv_seq_len = key_states.shape[-2]
372
+ if past_key_value is not None:
373
+ kv_seq_len += past_key_value[0].shape[-2]
374
+ # cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
375
+ # query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
376
+ if past_key_value is not None:
377
+ # reuse k, v, self_attention
378
+ key_states = torch.cat([past_key_value[0], key_states], dim=2)
379
+ value_states = torch.cat([past_key_value[1], value_states], dim=2)
380
+
381
+ past_key_value = (key_states, value_states) if use_cache else None
382
+
383
+ query_states = query_states.transpose(1, 2)
384
+ key_states = key_states.transpose(1, 2)
385
+ max_length = max(query_states.shape[1], key_states.shape[1])
386
+ assert max_length <= self.sin.shape[0]
387
+ sin, cos = self.sin[:max_length], self.cos[:max_length]
388
+ # TODO: for inference, we can add emb kv into cache to avoid computation
389
+ query_states, key_states = apply_rotary_embedding(
390
+ query_states, key_states, cos, sin, decode=True if q_len == 1 else False, rotary_index=position_ids
391
+ )
392
+ query_states = query_states.transpose(1, 2)
393
+ key_states = key_states.transpose(1, 2)
394
+
395
+ # repeat k/v heads if n_kv_heads < n_heads
396
+ key_states = repeat_kv(key_states, self.num_key_value_groups)
397
+ value_states = repeat_kv(value_states, self.num_key_value_groups)
398
+
399
+ if HAS_FLASH_ATTN and self.use_kernel:
400
+ from flash_attn import flash_attn_func
401
+
402
+ query_states = query_states.transpose(1, 2)
403
+ key_states = key_states.transpose(1, 2)
404
+ value_states = value_states.transpose(1, 2)
405
+ attn_output = flash_attn_func(query_states, key_states, value_states, softmax_scale=1.0, causal=True)
406
+ attn_output = attn_output.transpose(1, 2).contiguous()
407
+ else:
408
+ attn_weights = torch.matmul(query_states, key_states.transpose(2, 3))
409
+
410
+ if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
411
+ raise ValueError(
412
+ f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
413
+ f" {attn_weights.size()}"
414
+ )
415
+
416
+ if attention_mask is not None:
417
+ if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
418
+ raise ValueError(
419
+ f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}"
420
+ )
421
+ if self.training:
422
+ attention_mask = attention_mask.clone().detach()
423
+ attention_mask[:, :, :, 0] = 0
424
+ attn_weights = attn_weights + attention_mask
425
+
426
+ # upcast attention to fp32
427
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
428
+ attn_output = torch.matmul(attn_weights, value_states)
429
+
430
+ if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
431
+ raise ValueError(
432
+ f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
433
+ f" {attn_output.size()}"
434
+ )
435
+
436
+ attn_output = attn_output.transpose(1, 2).contiguous()
437
+ attn_output = attn_output.reshape(bsz, q_len, self.num_heads * self.head_dim)
438
+
439
+ if self.pretraining_tp > 1:
440
+ attn_output = attn_output.split(self.hidden_size // self.pretraining_tp, dim=2)
441
+ o_proj_slices = self.o_proj.weight.split(self.hidden_size // self.pretraining_tp, dim=1)
442
+ attn_output = sum([F.linear(attn_output[i], o_proj_slices[i]) for i in range(self.pretraining_tp)])
443
+ else:
444
+ attn_output = self.o_proj(attn_output)
445
+
446
+ if not output_attentions:
447
+ attn_weights = None
448
+
449
+ return attn_output, attn_weights, past_key_value
450
+
451
+
452
+ class OpenMoeDecoderLayer(nn.Module):
453
+ def __init__(self, config: LlamaConfig, moe: bool):
454
+ super().__init__()
455
+ self.hidden_size = config.hidden_size
456
+ self.moe = moe
457
+ self.self_attn = OpenMoeAttention(config=config)
458
+ # self.self_attn = LlamaAttention(config=config) # TODO: introduce LLaMA Positional Encoding
459
+ self.input_layernorm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
460
+ self.post_attention_layernorm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
461
+ if self.moe:
462
+ self.mlp = SparseMLP(
463
+ num_experts=config.num_experts,
464
+ hidden_size=config.hidden_size,
465
+ intermediate_size=config.intermediate_size,
466
+ router_top_k=config.router_topk,
467
+ router_capacity_factor_train=config.router_capacity_factor_train,
468
+ router_capacity_factor_eval=config.router_capacity_factor_eval,
469
+ router_min_capacity=config.router_min_capacity,
470
+ router_noisy_policy=config.router_noisy_policy,
471
+ router_drop_tks=config.router_drop_tks,
472
+ mlp_activation=config.hidden_act,
473
+ mlp_gated=config.mlp_gated,
474
+ enable_load_balance=config.enable_load_balance,
475
+ load_balance_tolerance=config.load_balance_tolerance,
476
+ load_balance_beam_width=config.load_balance_beam_width,
477
+ load_balance_group_swap_factor=config.load_balance_group_swap_factor,
478
+ enable_kernel=config.enable_kernel,
479
+ enable_comm_overlap=config.enable_comm_overlap,
480
+ )
481
+ self.pre_extra_mlp_layernorm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
482
+ self.extra_mlp = OpenMoeMLP(config)
483
+ else:
484
+ self.mlp = OpenMoeMLP(config)
485
+
486
+ def forward(
487
+ self,
488
+ hidden_states: torch.Tensor,
489
+ attention_mask: Optional[torch.Tensor] = None,
490
+ position_ids: Optional[torch.LongTensor] = None,
491
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
492
+ output_attentions: Optional[bool] = False,
493
+ use_cache: Optional[bool] = False,
494
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
495
+ """
496
+ Args:
497
+ hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
498
+ attention_mask (`torch.FloatTensor`, *optional*): attention mask of size
499
+ `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
500
+ output_attentions (`bool`, *optional*):
501
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under
502
+ returned tensors for more detail.
503
+ use_cache (`bool`, *optional*):
504
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
505
+ (see `past_key_values`).
506
+ past_key_value (`Tuple(torch.FloatTensor)`, *optional*): cached past key and value projection states
507
+ """
508
+
509
+ residual = hidden_states
510
+
511
+ hidden_states = self.input_layernorm(hidden_states)
512
+
513
+ # Self Attention
514
+ hidden_states, self_attn_weights, present_key_value = self.self_attn(
515
+ hidden_states=hidden_states,
516
+ attention_mask=attention_mask,
517
+ position_ids=position_ids,
518
+ past_key_value=past_key_value,
519
+ output_attentions=output_attentions,
520
+ use_cache=use_cache,
521
+ )
522
+ hidden_states = residual + hidden_states
523
+
524
+ # Fully Connected
525
+ residual = hidden_states
526
+ hidden_states = self.post_attention_layernorm(hidden_states)
527
+ hidden_states = self.mlp(hidden_states)
528
+ hidden_states = residual + hidden_states
529
+
530
+ if self.moe:
531
+ residual = hidden_states
532
+ hidden_states = self.pre_extra_mlp_layernorm(hidden_states)
533
+ hidden_states = self.extra_mlp(hidden_states)
534
+ hidden_states = residual + hidden_states
535
+
536
+ outputs = (hidden_states,)
537
+
538
+ if output_attentions:
539
+ outputs += (self_attn_weights,)
540
+
541
+ if use_cache:
542
+ outputs += (present_key_value,)
543
+
544
+ return outputs
545
+
546
+
547
+ LLAMA_START_DOCSTRING = r"""
548
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
549
+ library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
550
+ etc.)
551
+
552
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
553
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
554
+ and behavior.
555
+
556
+ Parameters:
557
+ config ([`LlamaConfig`]):
558
+ Model configuration class with all the parameters of the model. Initializing with a config file does not
559
+ load the weights associated with the model, only the configuration. Check out the
560
+ [`~PreTrainedModel.from_pretrained`] method to load the model weights.
561
+ """
562
+
563
+
564
+ @add_start_docstrings(
565
+ "The bare LLaMA Model outputting raw hidden-states without any specific head on top.",
566
+ LLAMA_START_DOCSTRING,
567
+ )
568
+ class OpenMoePreTrainedModel(PreTrainedModel):
569
+ config_class = LlamaConfig
570
+ base_model_prefix = "model"
571
+ supports_gradient_checkpointing = True
572
+ _no_split_modules = ["OpenMoeDecoderLayer"]
573
+ _skip_keys_device_placement = "past_key_values"
574
+
575
+ def _init_weights(self, module):
576
+ std = self.config.initializer_range
577
+ if isinstance(module, nn.Linear):
578
+ module.weight.data.normal_(mean=0.0, std=std)
579
+ if module.bias is not None:
580
+ module.bias.data.zero_()
581
+ elif isinstance(module, nn.Embedding):
582
+ module.weight.data.normal_(mean=0.0, std=std)
583
+ if module.padding_idx is not None:
584
+ module.weight.data[module.padding_idx].zero_()
585
+
586
+ def _set_gradient_checkpointing(self, module, value=False):
587
+ if isinstance(module, OpenMoeModel):
588
+ module.gradient_checkpointing = value
589
+
590
+
591
+ LLAMA_INPUTS_DOCSTRING = r"""
592
+ Args:
593
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
594
+ Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
595
+ it.
596
+
597
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
598
+ [`PreTrainedTokenizer.__call__`] for details.
599
+
600
+ [What are input IDs?](../glossary#input-ids)
601
+ attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
602
+ Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
603
+
604
+ - 1 for tokens that are **not masked**,
605
+ - 0 for tokens that are **masked**.
606
+
607
+ [What are attention masks?](../glossary#attention-mask)
608
+
609
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
610
+ [`PreTrainedTokenizer.__call__`] for details.
611
+
612
+ If `past_key_values` is used, optionally only the last `decoder_input_ids` have to be input (see
613
+ `past_key_values`).
614
+
615
+ If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
616
+ and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
617
+ information on the default strategy.
618
+
619
+ - 1 indicates the head is **not masked**,
620
+ - 0 indicates the head is **masked**.
621
+ position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
622
+ Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
623
+ config.n_positions - 1]`.
624
+
625
+ [What are position IDs?](../glossary#position-ids)
626
+ past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
627
+ Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
628
+ `(batch_size, num_heads, sequence_length, embed_size_per_head)`) and 2 additional tensors of shape
629
+ `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.
630
+
631
+ Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
632
+ blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
633
+
634
+ If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
635
+ don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
636
+ `decoder_input_ids` of shape `(batch_size, sequence_length)`.
637
+ inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
638
+ Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
639
+ is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
640
+ model's internal embedding lookup matrix.
641
+ use_cache (`bool`, *optional*):
642
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
643
+ `past_key_values`).
644
+ output_attentions (`bool`, *optional*):
645
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
646
+ tensors for more detail.
647
+ output_hidden_states (`bool`, *optional*):
648
+ Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
649
+ more detail.
650
+ return_dict (`bool`, *optional*):
651
+ Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
652
+ """
653
+
654
+
655
+ @add_start_docstrings(
656
+ "The bare LLaMA Model outputting raw hidden-states without any specific head on top.",
657
+ LLAMA_START_DOCSTRING,
658
+ )
659
+ class OpenMoeModel(OpenMoePreTrainedModel):
660
+ """
661
+ Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`LlamaDecoderLayer`]
662
+
663
+ Args:
664
+ config: LlamaConfig
665
+ """
666
+
667
+ def __init__(self, config: LlamaConfig):
668
+ super().__init__(config)
669
+ self.padding_idx = config.pad_token_id
670
+ self.vocab_size = config.vocab_size
671
+
672
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
673
+ self.layers = nn.ModuleList(
674
+ [
675
+ OpenMoeDecoderLayer(config, moe=True if (i + 1) % config.moe_layer_interval == 0 else False)
676
+ for i in range(config.num_hidden_layers)
677
+ ]
678
+ )
679
+ self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
680
+
681
+ self.gradient_checkpointing = False
682
+ # Initialize weights and apply final processing
683
+ self.post_init()
684
+
685
+ def get_input_embeddings(self):
686
+ return self.embed_tokens
687
+
688
+ def set_input_embeddings(self, value):
689
+ self.embed_tokens = value
690
+
691
+ # Copied from transformers.models.bart.modeling_bart.BartDecoder._prepare_decoder_attention_mask
692
+ def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length):
693
+ # create causal mask
694
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
695
+ combined_attention_mask = None
696
+ if input_shape[-1] > 1:
697
+ combined_attention_mask = _make_causal_mask(
698
+ input_shape,
699
+ inputs_embeds.dtype,
700
+ device=inputs_embeds.device,
701
+ past_key_values_length=past_key_values_length,
702
+ )
703
+
704
+ if attention_mask is not None:
705
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
706
+ expanded_attn_mask = _expand_mask(attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]).to(
707
+ inputs_embeds.device
708
+ )
709
+ combined_attention_mask = (
710
+ expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
711
+ )
712
+
713
+ return combined_attention_mask
714
+
715
+ @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)
716
+ def forward(
717
+ self,
718
+ input_ids: torch.LongTensor = None,
719
+ attention_mask: Optional[torch.Tensor] = None,
720
+ position_ids: Optional[torch.LongTensor] = None,
721
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
722
+ inputs_embeds: Optional[torch.FloatTensor] = None,
723
+ use_cache: Optional[bool] = None,
724
+ output_attentions: Optional[bool] = None,
725
+ output_hidden_states: Optional[bool] = None,
726
+ return_dict: Optional[bool] = None,
727
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
728
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
729
+ output_hidden_states = (
730
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
731
+ )
732
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
733
+
734
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
735
+
736
+ # retrieve input_ids and inputs_embeds
737
+ if input_ids is not None and inputs_embeds is not None:
738
+ raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
739
+ elif input_ids is not None:
740
+ batch_size, seq_length = input_ids.shape
741
+ elif inputs_embeds is not None:
742
+ batch_size, seq_length, _ = inputs_embeds.shape
743
+ else:
744
+ raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
745
+
746
+ seq_length_with_past = seq_length
747
+ past_key_values_length = 0
748
+
749
+ if past_key_values is not None:
750
+ past_key_values_length = past_key_values[0][0].shape[2]
751
+ seq_length_with_past = seq_length_with_past + past_key_values_length
752
+
753
+ if position_ids is None:
754
+ device = input_ids.device if input_ids is not None else inputs_embeds.device
755
+ position_ids = torch.arange(
756
+ past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=device
757
+ )
758
+ position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
759
+ else:
760
+ position_ids = position_ids.view(-1, seq_length).long()
761
+
762
+ if inputs_embeds is None:
763
+ inputs_embeds = self.embed_tokens(input_ids)
764
+ # embed positions
765
+ if attention_mask is None:
766
+ attention_mask = torch.ones(
767
+ (batch_size, seq_length_with_past), dtype=torch.bool, device=inputs_embeds.device
768
+ )
769
+ attention_mask = self._prepare_decoder_attention_mask(
770
+ attention_mask, (batch_size, seq_length), inputs_embeds, past_key_values_length
771
+ )
772
+
773
+ hidden_states = inputs_embeds
774
+
775
+ if self.gradient_checkpointing and self.training:
776
+ if use_cache:
777
+ logger.warning_once(
778
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
779
+ )
780
+ use_cache = False
781
+
782
+ # decoder layers
783
+ all_hidden_states = () if output_hidden_states else None
784
+ all_self_attns = () if output_attentions else None
785
+ next_decoder_cache = () if use_cache else None
786
+
787
+ for idx, decoder_layer in enumerate(self.layers):
788
+ if output_hidden_states:
789
+ all_hidden_states += (hidden_states,)
790
+
791
+ past_key_value = past_key_values[idx] if past_key_values is not None else None
792
+
793
+ if self.gradient_checkpointing and self.training:
794
+
795
+ def create_custom_forward(module):
796
+ def custom_forward(*inputs):
797
+ # None for past_key_value
798
+ return module(*inputs, output_attentions, None)
799
+
800
+ return custom_forward
801
+
802
+ layer_outputs = torch.utils.checkpoint.checkpoint(
803
+ create_custom_forward(decoder_layer),
804
+ hidden_states,
805
+ attention_mask,
806
+ position_ids,
807
+ None,
808
+ )
809
+ else:
810
+ layer_outputs = decoder_layer(
811
+ hidden_states,
812
+ attention_mask=attention_mask,
813
+ position_ids=position_ids,
814
+ past_key_value=past_key_value,
815
+ output_attentions=output_attentions,
816
+ use_cache=use_cache,
817
+ )
818
+
819
+ hidden_states = layer_outputs[0]
820
+
821
+ if use_cache:
822
+ next_decoder_cache += (layer_outputs[2 if output_attentions else 1],)
823
+
824
+ if output_attentions:
825
+ all_self_attns += (layer_outputs[1],)
826
+
827
+ hidden_states = self.norm(hidden_states)
828
+
829
+ # add hidden states from the last decoder layer
830
+ if output_hidden_states:
831
+ all_hidden_states += (hidden_states,)
832
+
833
+ next_cache = next_decoder_cache if use_cache else None
834
+ if not return_dict:
835
+ return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
836
+ return BaseModelOutputWithPast(
837
+ last_hidden_state=hidden_states,
838
+ past_key_values=next_cache,
839
+ hidden_states=all_hidden_states,
840
+ attentions=all_self_attns,
841
+ )
842
+
843
+
844
+ class OpenMoeForCausalLM(OpenMoePreTrainedModel):
845
+ # _tied_weights_keys = ["lm_head.weight"]
846
+
847
+ def __init__(self, config):
848
+ super().__init__(config)
849
+ self.model = OpenMoeModel(config)
850
+ self.pretraining_tp = config.pretraining_tp
851
+ self.vocab_size = config.vocab_size
852
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
853
+
854
+ # Initialize weights and apply final processing
855
+ self.post_init()
856
+
857
+ def get_input_embeddings(self):
858
+ return self.model.embed_tokens
859
+
860
+ def set_input_embeddings(self, value):
861
+ self.model.embed_tokens = value
862
+
863
+ def get_output_embeddings(self):
864
+ return self.lm_head
865
+
866
+ def set_output_embeddings(self, new_embeddings):
867
+ self.lm_head = new_embeddings
868
+
869
+ def set_decoder(self, decoder):
870
+ self.model = decoder
871
+
872
+ def get_decoder(self):
873
+ return self.model
874
+
875
+ @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)
876
+ @replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)
877
+ def forward(
878
+ self,
879
+ input_ids: torch.LongTensor = None,
880
+ attention_mask: Optional[torch.Tensor] = None,
881
+ position_ids: Optional[torch.LongTensor] = None,
882
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
883
+ inputs_embeds: Optional[torch.FloatTensor] = None,
884
+ labels: Optional[torch.LongTensor] = None,
885
+ use_cache: Optional[bool] = None,
886
+ output_attentions: Optional[bool] = None,
887
+ output_hidden_states: Optional[bool] = None,
888
+ return_dict: Optional[bool] = None,
889
+ chunk_head: Optional[bool] = True,
890
+ ) -> Union[Tuple, CausalLMOutputWithPast]:
891
+ r"""
892
+ Args:
893
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
894
+ Labels for computing the masked language modeling loss. Indices should either be in `[0, ...,
895
+ config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
896
+ (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
897
+
898
+ Returns:
899
+
900
+ Example:
901
+
902
+ ```python
903
+ >>> from transformers import AutoTokenizer, LlamaForCausalLM
904
+
905
+ >>> model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
906
+ >>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)
907
+
908
+ >>> prompt = "Hey, are you conscious? Can you talk to me?"
909
+ >>> inputs = tokenizer(prompt, return_tensors="pt")
910
+
911
+ >>> # Generate
912
+ >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
913
+ >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
914
+ "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
915
+ ```"""
916
+ # reset moe loss
917
+ MOE_MANAGER.reset_loss()
918
+
919
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
920
+ output_hidden_states = (
921
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
922
+ )
923
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
924
+
925
+ # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
926
+ outputs = self.model(
927
+ input_ids=input_ids,
928
+ attention_mask=attention_mask,
929
+ position_ids=position_ids,
930
+ past_key_values=past_key_values,
931
+ inputs_embeds=inputs_embeds,
932
+ use_cache=use_cache,
933
+ output_attentions=output_attentions,
934
+ output_hidden_states=output_hidden_states,
935
+ return_dict=return_dict,
936
+ )
937
+
938
+ hidden_states = outputs[0]
939
+ if self.pretraining_tp > 1:
940
+ lm_head_slices = self.lm_head.weight.split(self.vocab_size // self.pretraining_tp, dim=0)
941
+ logits = [F.linear(hidden_states, lm_head_slices[i]) for i in range(self.pretraining_tp)]
942
+ logits = torch.cat(logits, dim=-1)
943
+
944
+ loss = None
945
+ # if no training, just do forward
946
+ if labels is None:
947
+ logits = self.lm_head(hidden_states)
948
+ logits = logits.float()
949
+ # the vocab size for openmoe is 30w+
950
+ # which causes great activation memory in training, up to 20G for one sequence
951
+ # so we use chunk and checkpoint to reduce memory
952
+ else:
953
+ if chunk_head == True:
954
+
955
+ def create_custom_forward(module):
956
+ def custom_forward(*inputs):
957
+ logits = module(inputs[0])
958
+ logits = logits.float()
959
+ # Shift so that tokens < n predict n
960
+ shift_logits = logits[..., :-1, :].contiguous().float()
961
+ shift_labels = inputs[1][..., 1:].contiguous()
962
+ # Flatten the tokens
963
+ loss = self._calculate_loss(shift_logits, shift_labels)
964
+ return loss
965
+
966
+ return custom_forward
967
+
968
+ aux_loss, z_loss = self._calculate_router_loss()
969
+ loss = aux_loss + z_loss
970
+ for batch_idx in range(hidden_states.shape[0]):
971
+ loss = loss + torch.utils.checkpoint.checkpoint(
972
+ create_custom_forward(self.lm_head),
973
+ hidden_states[batch_idx : batch_idx + 1, :],
974
+ labels[batch_idx : batch_idx + 1, :],
975
+ )
976
+ logits = None
977
+ else:
978
+ logits = self.lm_head(hidden_states)
979
+ logits = logits.float()
980
+ # Shift so that tokens < n predict n
981
+ shift_logits = logits[..., :-1, :].contiguous()
982
+ shift_labels = labels[..., 1:].contiguous()
983
+ # Flatten the tokens
984
+ aux_loss, z_loss = self._calculate_router_loss()
985
+ loss = aux_loss + z_loss
986
+ loss = loss + self._calculate_loss(shift_logits, shift_labels)
987
+
988
+ if not return_dict:
989
+ output = (logits,) + outputs[1:]
990
+ return (loss,) + output if loss is not None else output
991
+
992
+ return CausalLMOutputWithPast(
993
+ loss=loss,
994
+ logits=logits,
995
+ past_key_values=outputs.past_key_values,
996
+ hidden_states=outputs.hidden_states,
997
+ attentions=outputs.attentions,
998
+ )
999
+
1000
+ def prepare_inputs_for_generation(
1001
+ self, input_ids, past_key_values=None, attention_mask=None, inputs_embeds=None, **kwargs
1002
+ ):
1003
+ if past_key_values:
1004
+ input_ids = input_ids[:, -1:]
1005
+
1006
+ position_ids = kwargs.get("position_ids", None)
1007
+ if attention_mask is not None and position_ids is None:
1008
+ # create position_ids on the fly for batch generation
1009
+ position_ids = attention_mask.long().cumsum(-1) - 1
1010
+ position_ids.masked_fill_(attention_mask == 0, 1)
1011
+ if past_key_values:
1012
+ position_ids = position_ids[:, -1].unsqueeze(-1)
1013
+
1014
+ # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
1015
+ if inputs_embeds is not None and past_key_values is None:
1016
+ model_inputs = {"inputs_embeds": inputs_embeds}
1017
+ else:
1018
+ model_inputs = {"input_ids": input_ids}
1019
+
1020
+ model_inputs.update(
1021
+ {
1022
+ "position_ids": position_ids,
1023
+ "past_key_values": past_key_values,
1024
+ "use_cache": kwargs.get("use_cache"),
1025
+ "attention_mask": attention_mask,
1026
+ }
1027
+ )
1028
+ return model_inputs
1029
+
1030
+ @staticmethod
1031
+ def _reorder_cache(past_key_values, beam_idx):
1032
+ reordered_past = ()
1033
+ for layer_past in past_key_values:
1034
+ reordered_past += (
1035
+ tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
1036
+ )
1037
+ return reordered_past
1038
+
1039
+ def _calculate_router_loss(self, aux_loss: list = None, z_loss: list = None):
1040
+ if aux_loss is None or z_loss is None:
1041
+ aux_loss, z_loss = MOE_MANAGER.get_loss()
1042
+ assert len(aux_loss) == len(z_loss) == self.config.num_hidden_layers // self.config.moe_layer_interval
1043
+ aux_loss = self.config.router_aux_loss_factor * sum(aux_loss) / len(aux_loss)
1044
+ z_loss = self.config.router_z_loss_factor * sum(z_loss) / len(z_loss)
1045
+ return aux_loss, z_loss
1046
+
1047
+ def _calculate_loss(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
1048
+ """Compute cross entropy and entropy for log probs and targets.
1049
+
1050
+ Args:
1051
+ logits: [batch, length, num_classes] float array.
1052
+ targets: categorical targets [batch, length] int array.
1053
+
1054
+ Returns:
1055
+ Tuple of scalar loss.
1056
+ """
1057
+ if len(logits.shape) != len(targets.shape) + 1:
1058
+ raise ValueError(
1059
+ "Incorrect shapes. Got shape %s logits and %s targets" % (str(logits.shape), str(targets.shape))
1060
+ )
1061
+ vocab_size = logits.shape[-1]
1062
+ confidence = 1.0 - self.config.label_smoothing
1063
+ low_confidence = (1.0 - confidence) / (vocab_size - 1)
1064
+ normalizing_constant = -(
1065
+ confidence * math.log(confidence) + (vocab_size - 1) * low_confidence * math.log(low_confidence + 1e-20)
1066
+ )
1067
+
1068
+ # one hot
1069
+ soft_targets = targets[..., None] == torch.arange(vocab_size, device=targets.device).reshape(
1070
+ (1,) * len(targets.shape) + (-1,)
1071
+ )
1072
+ soft_targets = torch.where(
1073
+ soft_targets, torch.full_like(soft_targets, confidence), torch.full_like(soft_targets, low_confidence)
1074
+ )
1075
+ soft_targets = soft_targets.to(torch.float32)
1076
+
1077
+ # cross entropy
1078
+ total_loss = ZLossCrossEntropy.apply(logits, soft_targets, self.config.z_loss_factor)
1079
+ total_loss = total_loss - normalizing_constant
1080
+ total_loss = torch.mean(torch.sum(total_loss, dim=-1), dim=0)
1081
+ return total_loss
1082
+
1083
+
1084
+ class ZLossCrossEntropy(torch.autograd.Function):
1085
+ """Computes cross entropy loss with stable custom gradient.
1086
+
1087
+ Computes a stabilized-gradient version of:
1088
+ -jnp.sum(targets * nn.log_softmax(logits), axis=-1)
1089
+
1090
+ If z_loss > 0, then an auxiliary loss equal to z_loss*log(z)^2
1091
+ will be added to the cross entropy loss (z = softmax normalization constant).
1092
+ The two uses of z_loss are:
1093
+ 1. To keep the logits from drifting too far from zero, which can cause
1094
+ unacceptable roundoff errors in bfloat16.
1095
+ 2. To encourage the logits to be normalized log-probabilities.
1096
+
1097
+ Args:
1098
+ logits: [batch, length, num_classes] float array.
1099
+ targets: categorical one-hot targets [batch, length, num_classes] float
1100
+ array.
1101
+ z_loss: coefficient for auxilliary z-loss loss term.
1102
+
1103
+ Returns:
1104
+ tuple with the total loss and the z_loss, both
1105
+ float arrays with shape [batch, length].
1106
+ """
1107
+
1108
+ @staticmethod
1109
+ def forward(ctx, logits, targets, z_loss):
1110
+ max_logit = torch.max(logits, dim=-1, keepdim=True)[0]
1111
+ shifted = logits - max_logit
1112
+ exp_shifted = torch.exp(shifted)
1113
+ sum_exp = torch.sum(exp_shifted, axis=-1, keepdims=True)
1114
+ sum_exp_log = torch.log(sum_exp)
1115
+ log_softmax = shifted - sum_exp_log
1116
+ loss = -torch.sum(targets * log_softmax, axis=-1)
1117
+ # Add auxilliary z-loss term.
1118
+ log_z = torch.squeeze(sum_exp_log + max_logit, axis=-1)
1119
+ total_z_loss = z_loss * torch.square(log_z)
1120
+ loss += total_z_loss
1121
+ ctx.z_loss = z_loss
1122
+ ctx.save_for_backward(logits, targets, exp_shifted, sum_exp, log_softmax, log_z)
1123
+ return loss
1124
+
1125
+ @staticmethod
1126
+ def backward(ctx, *grad_outputs):
1127
+ assert len(grad_outputs) == 1
1128
+ g = grad_outputs[0]
1129
+ z_loss = ctx.z_loss
1130
+ logits, targets, exp_shifted, sum_exp, log_softmax, log_z = ctx.saved_tensors
1131
+ # z-loss term adds the (2 * z_loss * log_z) factor.
1132
+ deriv = (1 + 2 * z_loss * log_z).unsqueeze(-1) * exp_shifted / sum_exp - targets
1133
+ g_logits = g.unsqueeze(-1) * deriv
1134
+ g_targets = -g.unsqueeze(-1) * log_softmax
1135
+
1136
+ return (
1137
+ g_logits.to(logits.dtype),
1138
+ g_targets.to(targets.dtype),
1139
+ None,
1140
+ )
special_tokens_map.json ADDED
@@ -0,0 +1,308 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "additional_special_tokens": [
3
+ "<extra_id_0>",
4
+ "<extra_id_1>",
5
+ "<extra_id_2>",
6
+ "<extra_id_3>",
7
+ "<extra_id_4>",
8
+ "<extra_id_5>",
9
+ "<extra_id_6>",
10
+ "<extra_id_7>",
11
+ "<extra_id_8>",
12
+ "<extra_id_9>",
13
+ "<extra_id_10>",
14
+ "<extra_id_11>",
15
+ "<extra_id_12>",
16
+ "<extra_id_13>",
17
+ "<extra_id_14>",
18
+ "<extra_id_15>",
19
+ "<extra_id_16>",
20
+ "<extra_id_17>",
21
+ "<extra_id_18>",
22
+ "<extra_id_19>",
23
+ "<extra_id_20>",
24
+ "<extra_id_21>",
25
+ "<extra_id_22>",
26
+ "<extra_id_23>",
27
+ "<extra_id_24>",
28
+ "<extra_id_25>",
29
+ "<extra_id_26>",
30
+ "<extra_id_27>",
31
+ "<extra_id_28>",
32
+ "<extra_id_29>",
33
+ "<extra_id_30>",
34
+ "<extra_id_31>",
35
+ "<extra_id_32>",
36
+ "<extra_id_33>",
37
+ "<extra_id_34>",
38
+ "<extra_id_35>",
39
+ "<extra_id_36>",
40
+ "<extra_id_37>",
41
+ "<extra_id_38>",
42
+ "<extra_id_39>",
43
+ "<extra_id_40>",
44
+ "<extra_id_41>",
45
+ "<extra_id_42>",
46
+ "<extra_id_43>",
47
+ "<extra_id_44>",
48
+ "<extra_id_45>",
49
+ "<extra_id_46>",
50
+ "<extra_id_47>",
51
+ "<extra_id_48>",
52
+ "<extra_id_49>",
53
+ "<extra_id_50>",
54
+ "<extra_id_51>",
55
+ "<extra_id_52>",
56
+ "<extra_id_53>",
57
+ "<extra_id_54>",
58
+ "<extra_id_55>",
59
+ "<extra_id_56>",
60
+ "<extra_id_57>",
61
+ "<extra_id_58>",
62
+ "<extra_id_59>",
63
+ "<extra_id_60>",
64
+ "<extra_id_61>",
65
+ "<extra_id_62>",
66
+ "<extra_id_63>",
67
+ "<extra_id_64>",
68
+ "<extra_id_65>",
69
+ "<extra_id_66>",
70
+ "<extra_id_67>",
71
+ "<extra_id_68>",
72
+ "<extra_id_69>",
73
+ "<extra_id_70>",
74
+ "<extra_id_71>",
75
+ "<extra_id_72>",
76
+ "<extra_id_73>",
77
+ "<extra_id_74>",
78
+ "<extra_id_75>",
79
+ "<extra_id_76>",
80
+ "<extra_id_77>",
81
+ "<extra_id_78>",
82
+ "<extra_id_79>",
83
+ "<extra_id_80>",
84
+ "<extra_id_81>",
85
+ "<extra_id_82>",
86
+ "<extra_id_83>",
87
+ "<extra_id_84>",
88
+ "<extra_id_85>",
89
+ "<extra_id_86>",
90
+ "<extra_id_87>",
91
+ "<extra_id_88>",
92
+ "<extra_id_89>",
93
+ "<extra_id_90>",
94
+ "<extra_id_91>",
95
+ "<extra_id_92>",
96
+ "<extra_id_93>",
97
+ "<extra_id_94>",
98
+ "<extra_id_95>",
99
+ "<extra_id_96>",
100
+ "<extra_id_97>",
101
+ "<extra_id_98>",
102
+ "<extra_id_99>",
103
+ "<extra_id_100>",
104
+ "<extra_id_101>",
105
+ "<extra_id_102>",
106
+ "<extra_id_103>",
107
+ "<extra_id_104>",
108
+ "<extra_id_105>",
109
+ "<extra_id_106>",
110
+ "<extra_id_107>",
111
+ "<extra_id_108>",
112
+ "<extra_id_109>",
113
+ "<extra_id_110>",
114
+ "<extra_id_111>",
115
+ "<extra_id_112>",
116
+ "<extra_id_113>",
117
+ "<extra_id_114>",
118
+ "<extra_id_115>",
119
+ "<extra_id_116>",
120
+ "<extra_id_117>",
121
+ "<extra_id_118>",
122
+ "<extra_id_119>",
123
+ "<extra_id_120>",
124
+ "<extra_id_121>",
125
+ "<extra_id_122>",
126
+ "<extra_id_123>",
127
+ "<extra_id_124>",
128
+ "<extra_id_125>",
129
+ "<extra_id_126>",
130
+ "<extra_id_127>",
131
+ "<extra_id_128>",
132
+ "<extra_id_129>",
133
+ "<extra_id_130>",
134
+ "<extra_id_131>",
135
+ "<extra_id_132>",
136
+ "<extra_id_133>",
137
+ "<extra_id_134>",
138
+ "<extra_id_135>",
139
+ "<extra_id_136>",
140
+ "<extra_id_137>",
141
+ "<extra_id_138>",
142
+ "<extra_id_139>",
143
+ "<extra_id_140>",
144
+ "<extra_id_141>",
145
+ "<extra_id_142>",
146
+ "<extra_id_143>",
147
+ "<extra_id_144>",
148
+ "<extra_id_145>",
149
+ "<extra_id_146>",
150
+ "<extra_id_147>",
151
+ "<extra_id_148>",
152
+ "<extra_id_149>",
153
+ "<extra_id_150>",
154
+ "<extra_id_151>",
155
+ "<extra_id_152>",
156
+ "<extra_id_153>",
157
+ "<extra_id_154>",
158
+ "<extra_id_155>",
159
+ "<extra_id_156>",
160
+ "<extra_id_157>",
161
+ "<extra_id_158>",
162
+ "<extra_id_159>",
163
+ "<extra_id_160>",
164
+ "<extra_id_161>",
165
+ "<extra_id_162>",
166
+ "<extra_id_163>",
167
+ "<extra_id_164>",
168
+ "<extra_id_165>",
169
+ "<extra_id_166>",
170
+ "<extra_id_167>",
171
+ "<extra_id_168>",
172
+ "<extra_id_169>",
173
+ "<extra_id_170>",
174
+ "<extra_id_171>",
175
+ "<extra_id_172>",
176
+ "<extra_id_173>",
177
+ "<extra_id_174>",
178
+ "<extra_id_175>",
179
+ "<extra_id_176>",
180
+ "<extra_id_177>",
181
+ "<extra_id_178>",
182
+ "<extra_id_179>",
183
+ "<extra_id_180>",
184
+ "<extra_id_181>",
185
+ "<extra_id_182>",
186
+ "<extra_id_183>",
187
+ "<extra_id_184>",
188
+ "<extra_id_185>",
189
+ "<extra_id_186>",
190
+ "<extra_id_187>",
191
+ "<extra_id_188>",
192
+ "<extra_id_189>",
193
+ "<extra_id_190>",
194
+ "<extra_id_191>",
195
+ "<extra_id_192>",
196
+ "<extra_id_193>",
197
+ "<extra_id_194>",
198
+ "<extra_id_195>",
199
+ "<extra_id_196>",
200
+ "<extra_id_197>",
201
+ "<extra_id_198>",
202
+ "<extra_id_199>",
203
+ "<extra_id_200>",
204
+ "<extra_id_201>",
205
+ "<extra_id_202>",
206
+ "<extra_id_203>",
207
+ "<extra_id_204>",
208
+ "<extra_id_205>",
209
+ "<extra_id_206>",
210
+ "<extra_id_207>",
211
+ "<extra_id_208>",
212
+ "<extra_id_209>",
213
+ "<extra_id_210>",
214
+ "<extra_id_211>",
215
+ "<extra_id_212>",
216
+ "<extra_id_213>",
217
+ "<extra_id_214>",
218
+ "<extra_id_215>",
219
+ "<extra_id_216>",
220
+ "<extra_id_217>",
221
+ "<extra_id_218>",
222
+ "<extra_id_219>",
223
+ "<extra_id_220>",
224
+ "<extra_id_221>",
225
+ "<extra_id_222>",
226
+ "<extra_id_223>",
227
+ "<extra_id_224>",
228
+ "<extra_id_225>",
229
+ "<extra_id_226>",
230
+ "<extra_id_227>",
231
+ "<extra_id_228>",
232
+ "<extra_id_229>",
233
+ "<extra_id_230>",
234
+ "<extra_id_231>",
235
+ "<extra_id_232>",
236
+ "<extra_id_233>",
237
+ "<extra_id_234>",
238
+ "<extra_id_235>",
239
+ "<extra_id_236>",
240
+ "<extra_id_237>",
241
+ "<extra_id_238>",
242
+ "<extra_id_239>",
243
+ "<extra_id_240>",
244
+ "<extra_id_241>",
245
+ "<extra_id_242>",
246
+ "<extra_id_243>",
247
+ "<extra_id_244>",
248
+ "<extra_id_245>",
249
+ "<extra_id_246>",
250
+ "<extra_id_247>",
251
+ "<extra_id_248>",
252
+ "<extra_id_249>",
253
+ "<extra_id_250>",
254
+ "<extra_id_251>",
255
+ "<extra_id_252>",
256
+ "<extra_id_253>",
257
+ "<extra_id_254>",
258
+ "<extra_id_255>",
259
+ "<extra_id_256>",
260
+ "<extra_id_257>",
261
+ "<extra_id_258>",
262
+ "<extra_id_259>",
263
+ "<extra_id_260>",
264
+ "<extra_id_261>",
265
+ "<extra_id_262>",
266
+ "<extra_id_263>",
267
+ "<extra_id_264>",
268
+ "<extra_id_265>",
269
+ "<extra_id_266>",
270
+ "<extra_id_267>",
271
+ "<extra_id_268>",
272
+ "<extra_id_269>",
273
+ "<extra_id_270>",
274
+ "<extra_id_271>",
275
+ "<extra_id_272>",
276
+ "<extra_id_273>",
277
+ "<extra_id_274>",
278
+ "<extra_id_275>",
279
+ "<extra_id_276>",
280
+ "<extra_id_277>",
281
+ "<extra_id_278>",
282
+ "<extra_id_279>",
283
+ "<extra_id_280>",
284
+ "<extra_id_281>",
285
+ "<extra_id_282>",
286
+ "<extra_id_283>",
287
+ "<extra_id_284>",
288
+ "<extra_id_285>",
289
+ "<extra_id_286>",
290
+ "<extra_id_287>",
291
+ "<extra_id_288>",
292
+ "<extra_id_289>",
293
+ "<extra_id_290>",
294
+ "<extra_id_291>",
295
+ "<extra_id_292>",
296
+ "<extra_id_293>",
297
+ "<extra_id_294>",
298
+ "<extra_id_295>",
299
+ "<extra_id_296>",
300
+ "<extra_id_297>",
301
+ "<extra_id_298>",
302
+ "<extra_id_299>"
303
+ ],
304
+ "bos_token": "<s>",
305
+ "eos_token": "</s>",
306
+ "pad_token": "<pad>",
307
+ "unk_token": "<unk>"
308
+ }
spiece.model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e3909a67b780650b35cf529ac782ad2b6b26e6d1f849d3fbb6a872905f452458
3
+ size 4548313
tokenization_openmoe.py ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from transformers import T5Tokenizer
2
+ from typing import List, Optional, Tuple, Union
3
+
4
+ class OpenMoeTokenizer(T5Tokenizer):
5
+ def __init__(self, *args, **kwargs):
6
+ super().__init__(*args, **kwargs)
7
+ self.padding_side = 'left'
8
+ self.add_bos_token = True
9
+ self.add_eos_token = False
10
+
11
+ def build_inputs_with_special_tokens(
12
+ self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
13
+ ) -> List[int]:
14
+ if self.add_eos_token:
15
+ token_ids_0 = self._add_eos_if_not_present(token_ids_0)
16
+ if self.add_bos_token:
17
+ token_ids_0 = [self.pad_token_id] + token_ids_0
18
+ if token_ids_1 is None:
19
+ return token_ids_0
20
+ else:
21
+ token_ids_1 = self._add_eos_if_not_present(token_ids_1)
22
+ return token_ids_0 + token_ids_1
tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:af904105ce1071b1202bba0059a841f4a7b85b48b6ec179c4948e3483476e0dd
3
+ size 16853013
tokenizer_config.json ADDED
@@ -0,0 +1,2757 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "<pad>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "</s>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": false
18
+ },
19
+ "2": {
20
+ "content": "<s>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "3": {
28
+ "content": "<unk>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "256000": {
36
+ "content": "<extra_id_299>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "256001": {
44
+ "content": "<extra_id_298>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "256002": {
52
+ "content": "<extra_id_297>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "256003": {
60
+ "content": "<extra_id_296>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "256004": {
68
+ "content": "<extra_id_295>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "256005": {
76
+ "content": "<extra_id_294>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "256006": {
84
+ "content": "<extra_id_293>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "256007": {
92
+ "content": "<extra_id_292>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "256008": {
100
+ "content": "<extra_id_291>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "256009": {
108
+ "content": "<extra_id_290>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "256010": {
116
+ "content": "<extra_id_289>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "256011": {
124
+ "content": "<extra_id_288>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "256012": {
132
+ "content": "<extra_id_287>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "256013": {
140
+ "content": "<extra_id_286>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "256014": {
148
+ "content": "<extra_id_285>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "256015": {
156
+ "content": "<extra_id_284>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "256016": {
164
+ "content": "<extra_id_283>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "256017": {
172
+ "content": "<extra_id_282>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "256018": {
180
+ "content": "<extra_id_281>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "256019": {
188
+ "content": "<extra_id_280>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "256020": {
196
+ "content": "<extra_id_279>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "256021": {
204
+ "content": "<extra_id_278>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "256022": {
212
+ "content": "<extra_id_277>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "256023": {
220
+ "content": "<extra_id_276>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": false
226
+ },
227
+ "256024": {
228
+ "content": "<extra_id_275>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": false
234
+ },
235
+ "256025": {
236
+ "content": "<extra_id_274>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": false
242
+ },
243
+ "256026": {
244
+ "content": "<extra_id_273>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": false
250
+ },
251
+ "256027": {
252
+ "content": "<extra_id_272>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": false
258
+ },
259
+ "256028": {
260
+ "content": "<extra_id_271>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": false
266
+ },
267
+ "256029": {
268
+ "content": "<extra_id_270>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "256030": {
276
+ "content": "<extra_id_269>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "256031": {
284
+ "content": "<extra_id_268>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "256032": {
292
+ "content": "<extra_id_267>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "256033": {
300
+ "content": "<extra_id_266>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "256034": {
308
+ "content": "<extra_id_265>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "256035": {
316
+ "content": "<extra_id_264>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "256036": {
324
+ "content": "<extra_id_263>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "256037": {
332
+ "content": "<extra_id_262>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "256038": {
340
+ "content": "<extra_id_261>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "256039": {
348
+ "content": "<extra_id_260>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "256040": {
356
+ "content": "<extra_id_259>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "256041": {
364
+ "content": "<extra_id_258>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "256042": {
372
+ "content": "<extra_id_257>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "256043": {
380
+ "content": "<extra_id_256>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "256044": {
388
+ "content": "<extra_id_255>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "256045": {
396
+ "content": "<extra_id_254>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "256046": {
404
+ "content": "<extra_id_253>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "256047": {
412
+ "content": "<extra_id_252>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "256048": {
420
+ "content": "<extra_id_251>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "256049": {
428
+ "content": "<extra_id_250>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "256050": {
436
+ "content": "<extra_id_249>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "256051": {
444
+ "content": "<extra_id_248>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "256052": {
452
+ "content": "<extra_id_247>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "256053": {
460
+ "content": "<extra_id_246>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "256054": {
468
+ "content": "<extra_id_245>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "256055": {
476
+ "content": "<extra_id_244>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "256056": {
484
+ "content": "<extra_id_243>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "256057": {
492
+ "content": "<extra_id_242>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "256058": {
500
+ "content": "<extra_id_241>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "256059": {
508
+ "content": "<extra_id_240>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "256060": {
516
+ "content": "<extra_id_239>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "256061": {
524
+ "content": "<extra_id_238>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "256062": {
532
+ "content": "<extra_id_237>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "256063": {
540
+ "content": "<extra_id_236>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "256064": {
548
+ "content": "<extra_id_235>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "256065": {
556
+ "content": "<extra_id_234>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "256066": {
564
+ "content": "<extra_id_233>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "256067": {
572
+ "content": "<extra_id_232>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "256068": {
580
+ "content": "<extra_id_231>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "256069": {
588
+ "content": "<extra_id_230>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "256070": {
596
+ "content": "<extra_id_229>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "256071": {
604
+ "content": "<extra_id_228>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "256072": {
612
+ "content": "<extra_id_227>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "256073": {
620
+ "content": "<extra_id_226>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "256074": {
628
+ "content": "<extra_id_225>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "256075": {
636
+ "content": "<extra_id_224>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "256076": {
644
+ "content": "<extra_id_223>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "256077": {
652
+ "content": "<extra_id_222>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "256078": {
660
+ "content": "<extra_id_221>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "256079": {
668
+ "content": "<extra_id_220>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "256080": {
676
+ "content": "<extra_id_219>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "256081": {
684
+ "content": "<extra_id_218>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "256082": {
692
+ "content": "<extra_id_217>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "256083": {
700
+ "content": "<extra_id_216>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "256084": {
708
+ "content": "<extra_id_215>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "256085": {
716
+ "content": "<extra_id_214>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "256086": {
724
+ "content": "<extra_id_213>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "256087": {
732
+ "content": "<extra_id_212>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "256088": {
740
+ "content": "<extra_id_211>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "256089": {
748
+ "content": "<extra_id_210>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "256090": {
756
+ "content": "<extra_id_209>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "256091": {
764
+ "content": "<extra_id_208>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "256092": {
772
+ "content": "<extra_id_207>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "256093": {
780
+ "content": "<extra_id_206>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "256094": {
788
+ "content": "<extra_id_205>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "256095": {
796
+ "content": "<extra_id_204>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "256096": {
804
+ "content": "<extra_id_203>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "256097": {
812
+ "content": "<extra_id_202>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "256098": {
820
+ "content": "<extra_id_201>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "256099": {
828
+ "content": "<extra_id_200>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "256100": {
836
+ "content": "<extra_id_199>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "256101": {
844
+ "content": "<extra_id_198>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
851
+ "256102": {
852
+ "content": "<extra_id_197>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": false
858
+ },
859
+ "256103": {
860
+ "content": "<extra_id_196>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": false
866
+ },
867
+ "256104": {
868
+ "content": "<extra_id_195>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": false
874
+ },
875
+ "256105": {
876
+ "content": "<extra_id_194>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": false
882
+ },
883
+ "256106": {
884
+ "content": "<extra_id_193>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": false
890
+ },
891
+ "256107": {
892
+ "content": "<extra_id_192>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": false
898
+ },
899
+ "256108": {
900
+ "content": "<extra_id_191>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": false
906
+ },
907
+ "256109": {
908
+ "content": "<extra_id_190>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": false
914
+ },
915
+ "256110": {
916
+ "content": "<extra_id_189>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": false
922
+ },
923
+ "256111": {
924
+ "content": "<extra_id_188>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": false
930
+ },
931
+ "256112": {
932
+ "content": "<extra_id_187>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": false
938
+ },
939
+ "256113": {
940
+ "content": "<extra_id_186>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": false
946
+ },
947
+ "256114": {
948
+ "content": "<extra_id_185>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": false
954
+ },
955
+ "256115": {
956
+ "content": "<extra_id_184>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": false
962
+ },
963
+ "256116": {
964
+ "content": "<extra_id_183>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": false
970
+ },
971
+ "256117": {
972
+ "content": "<extra_id_182>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": false
978
+ },
979
+ "256118": {
980
+ "content": "<extra_id_181>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": false
986
+ },
987
+ "256119": {
988
+ "content": "<extra_id_180>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": false
994
+ },
995
+ "256120": {
996
+ "content": "<extra_id_179>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": false
1002
+ },
1003
+ "256121": {
1004
+ "content": "<extra_id_178>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": false
1010
+ },
1011
+ "256122": {
1012
+ "content": "<extra_id_177>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": false
1018
+ },
1019
+ "256123": {
1020
+ "content": "<extra_id_176>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": false
1026
+ },
1027
+ "256124": {
1028
+ "content": "<extra_id_175>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": false
1034
+ },
1035
+ "256125": {
1036
+ "content": "<extra_id_174>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": false
1042
+ },
1043
+ "256126": {
1044
+ "content": "<extra_id_173>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": false
1050
+ },
1051
+ "256127": {
1052
+ "content": "<extra_id_172>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": false
1058
+ },
1059
+ "256128": {
1060
+ "content": "<extra_id_171>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": false
1066
+ },
1067
+ "256129": {
1068
+ "content": "<extra_id_170>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": false
1074
+ },
1075
+ "256130": {
1076
+ "content": "<extra_id_169>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": false
1082
+ },
1083
+ "256131": {
1084
+ "content": "<extra_id_168>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": false
1090
+ },
1091
+ "256132": {
1092
+ "content": "<extra_id_167>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": false
1098
+ },
1099
+ "256133": {
1100
+ "content": "<extra_id_166>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": false
1106
+ },
1107
+ "256134": {
1108
+ "content": "<extra_id_165>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": false
1114
+ },
1115
+ "256135": {
1116
+ "content": "<extra_id_164>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": false
1122
+ },
1123
+ "256136": {
1124
+ "content": "<extra_id_163>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": false
1130
+ },
1131
+ "256137": {
1132
+ "content": "<extra_id_162>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": false
1138
+ },
1139
+ "256138": {
1140
+ "content": "<extra_id_161>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": false
1146
+ },
1147
+ "256139": {
1148
+ "content": "<extra_id_160>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": false
1154
+ },
1155
+ "256140": {
1156
+ "content": "<extra_id_159>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": false
1162
+ },
1163
+ "256141": {
1164
+ "content": "<extra_id_158>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": false
1170
+ },
1171
+ "256142": {
1172
+ "content": "<extra_id_157>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": false
1178
+ },
1179
+ "256143": {
1180
+ "content": "<extra_id_156>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": false
1186
+ },
1187
+ "256144": {
1188
+ "content": "<extra_id_155>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": false
1194
+ },
1195
+ "256145": {
1196
+ "content": "<extra_id_154>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": false
1202
+ },
1203
+ "256146": {
1204
+ "content": "<extra_id_153>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": false
1210
+ },
1211
+ "256147": {
1212
+ "content": "<extra_id_152>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": false
1218
+ },
1219
+ "256148": {
1220
+ "content": "<extra_id_151>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": false
1226
+ },
1227
+ "256149": {
1228
+ "content": "<extra_id_150>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": false
1234
+ },
1235
+ "256150": {
1236
+ "content": "<extra_id_149>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": false
1242
+ },
1243
+ "256151": {
1244
+ "content": "<extra_id_148>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": false
1250
+ },
1251
+ "256152": {
1252
+ "content": "<extra_id_147>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": false
1258
+ },
1259
+ "256153": {
1260
+ "content": "<extra_id_146>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": false
1266
+ },
1267
+ "256154": {
1268
+ "content": "<extra_id_145>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": false
1274
+ },
1275
+ "256155": {
1276
+ "content": "<extra_id_144>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": false
1282
+ },
1283
+ "256156": {
1284
+ "content": "<extra_id_143>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": false
1290
+ },
1291
+ "256157": {
1292
+ "content": "<extra_id_142>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": false
1298
+ },
1299
+ "256158": {
1300
+ "content": "<extra_id_141>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": false
1306
+ },
1307
+ "256159": {
1308
+ "content": "<extra_id_140>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": false
1314
+ },
1315
+ "256160": {
1316
+ "content": "<extra_id_139>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": false
1322
+ },
1323
+ "256161": {
1324
+ "content": "<extra_id_138>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": false
1330
+ },
1331
+ "256162": {
1332
+ "content": "<extra_id_137>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": false
1338
+ },
1339
+ "256163": {
1340
+ "content": "<extra_id_136>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": false
1346
+ },
1347
+ "256164": {
1348
+ "content": "<extra_id_135>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": false
1354
+ },
1355
+ "256165": {
1356
+ "content": "<extra_id_134>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": false
1362
+ },
1363
+ "256166": {
1364
+ "content": "<extra_id_133>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": false
1370
+ },
1371
+ "256167": {
1372
+ "content": "<extra_id_132>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": false
1378
+ },
1379
+ "256168": {
1380
+ "content": "<extra_id_131>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": false
1386
+ },
1387
+ "256169": {
1388
+ "content": "<extra_id_130>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": false
1394
+ },
1395
+ "256170": {
1396
+ "content": "<extra_id_129>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": false
1402
+ },
1403
+ "256171": {
1404
+ "content": "<extra_id_128>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": false
1410
+ },
1411
+ "256172": {
1412
+ "content": "<extra_id_127>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": false
1418
+ },
1419
+ "256173": {
1420
+ "content": "<extra_id_126>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": false
1426
+ },
1427
+ "256174": {
1428
+ "content": "<extra_id_125>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": false
1434
+ },
1435
+ "256175": {
1436
+ "content": "<extra_id_124>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": false
1442
+ },
1443
+ "256176": {
1444
+ "content": "<extra_id_123>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": false
1450
+ },
1451
+ "256177": {
1452
+ "content": "<extra_id_122>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": false
1458
+ },
1459
+ "256178": {
1460
+ "content": "<extra_id_121>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": false
1466
+ },
1467
+ "256179": {
1468
+ "content": "<extra_id_120>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": false
1474
+ },
1475
+ "256180": {
1476
+ "content": "<extra_id_119>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": false
1482
+ },
1483
+ "256181": {
1484
+ "content": "<extra_id_118>",
1485
+ "lstrip": false,
1486
+ "normalized": false,
1487
+ "rstrip": false,
1488
+ "single_word": false,
1489
+ "special": false
1490
+ },
1491
+ "256182": {
1492
+ "content": "<extra_id_117>",
1493
+ "lstrip": false,
1494
+ "normalized": false,
1495
+ "rstrip": false,
1496
+ "single_word": false,
1497
+ "special": false
1498
+ },
1499
+ "256183": {
1500
+ "content": "<extra_id_116>",
1501
+ "lstrip": false,
1502
+ "normalized": false,
1503
+ "rstrip": false,
1504
+ "single_word": false,
1505
+ "special": false
1506
+ },
1507
+ "256184": {
1508
+ "content": "<extra_id_115>",
1509
+ "lstrip": false,
1510
+ "normalized": false,
1511
+ "rstrip": false,
1512
+ "single_word": false,
1513
+ "special": false
1514
+ },
1515
+ "256185": {
1516
+ "content": "<extra_id_114>",
1517
+ "lstrip": false,
1518
+ "normalized": false,
1519
+ "rstrip": false,
1520
+ "single_word": false,
1521
+ "special": false
1522
+ },
1523
+ "256186": {
1524
+ "content": "<extra_id_113>",
1525
+ "lstrip": false,
1526
+ "normalized": false,
1527
+ "rstrip": false,
1528
+ "single_word": false,
1529
+ "special": false
1530
+ },
1531
+ "256187": {
1532
+ "content": "<extra_id_112>",
1533
+ "lstrip": false,
1534
+ "normalized": false,
1535
+ "rstrip": false,
1536
+ "single_word": false,
1537
+ "special": false
1538
+ },
1539
+ "256188": {
1540
+ "content": "<extra_id_111>",
1541
+ "lstrip": false,
1542
+ "normalized": false,
1543
+ "rstrip": false,
1544
+ "single_word": false,
1545
+ "special": false
1546
+ },
1547
+ "256189": {
1548
+ "content": "<extra_id_110>",
1549
+ "lstrip": false,
1550
+ "normalized": false,
1551
+ "rstrip": false,
1552
+ "single_word": false,
1553
+ "special": false
1554
+ },
1555
+ "256190": {
1556
+ "content": "<extra_id_109>",
1557
+ "lstrip": false,
1558
+ "normalized": false,
1559
+ "rstrip": false,
1560
+ "single_word": false,
1561
+ "special": false
1562
+ },
1563
+ "256191": {
1564
+ "content": "<extra_id_108>",
1565
+ "lstrip": false,
1566
+ "normalized": false,
1567
+ "rstrip": false,
1568
+ "single_word": false,
1569
+ "special": false
1570
+ },
1571
+ "256192": {
1572
+ "content": "<extra_id_107>",
1573
+ "lstrip": false,
1574
+ "normalized": false,
1575
+ "rstrip": false,
1576
+ "single_word": false,
1577
+ "special": false
1578
+ },
1579
+ "256193": {
1580
+ "content": "<extra_id_106>",
1581
+ "lstrip": false,
1582
+ "normalized": false,
1583
+ "rstrip": false,
1584
+ "single_word": false,
1585
+ "special": false
1586
+ },
1587
+ "256194": {
1588
+ "content": "<extra_id_105>",
1589
+ "lstrip": false,
1590
+ "normalized": false,
1591
+ "rstrip": false,
1592
+ "single_word": false,
1593
+ "special": false
1594
+ },
1595
+ "256195": {
1596
+ "content": "<extra_id_104>",
1597
+ "lstrip": false,
1598
+ "normalized": false,
1599
+ "rstrip": false,
1600
+ "single_word": false,
1601
+ "special": false
1602
+ },
1603
+ "256196": {
1604
+ "content": "<extra_id_103>",
1605
+ "lstrip": false,
1606
+ "normalized": false,
1607
+ "rstrip": false,
1608
+ "single_word": false,
1609
+ "special": false
1610
+ },
1611
+ "256197": {
1612
+ "content": "<extra_id_102>",
1613
+ "lstrip": false,
1614
+ "normalized": false,
1615
+ "rstrip": false,
1616
+ "single_word": false,
1617
+ "special": false
1618
+ },
1619
+ "256198": {
1620
+ "content": "<extra_id_101>",
1621
+ "lstrip": false,
1622
+ "normalized": false,
1623
+ "rstrip": false,
1624
+ "single_word": false,
1625
+ "special": false
1626
+ },
1627
+ "256199": {
1628
+ "content": "<extra_id_100>",
1629
+ "lstrip": false,
1630
+ "normalized": false,
1631
+ "rstrip": false,
1632
+ "single_word": false,
1633
+ "special": false
1634
+ },
1635
+ "256200": {
1636
+ "content": "<extra_id_99>",
1637
+ "lstrip": false,
1638
+ "normalized": false,
1639
+ "rstrip": false,
1640
+ "single_word": false,
1641
+ "special": false
1642
+ },
1643
+ "256201": {
1644
+ "content": "<extra_id_98>",
1645
+ "lstrip": false,
1646
+ "normalized": false,
1647
+ "rstrip": false,
1648
+ "single_word": false,
1649
+ "special": false
1650
+ },
1651
+ "256202": {
1652
+ "content": "<extra_id_97>",
1653
+ "lstrip": false,
1654
+ "normalized": false,
1655
+ "rstrip": false,
1656
+ "single_word": false,
1657
+ "special": false
1658
+ },
1659
+ "256203": {
1660
+ "content": "<extra_id_96>",
1661
+ "lstrip": false,
1662
+ "normalized": false,
1663
+ "rstrip": false,
1664
+ "single_word": false,
1665
+ "special": false
1666
+ },
1667
+ "256204": {
1668
+ "content": "<extra_id_95>",
1669
+ "lstrip": false,
1670
+ "normalized": false,
1671
+ "rstrip": false,
1672
+ "single_word": false,
1673
+ "special": false
1674
+ },
1675
+ "256205": {
1676
+ "content": "<extra_id_94>",
1677
+ "lstrip": false,
1678
+ "normalized": false,
1679
+ "rstrip": false,
1680
+ "single_word": false,
1681
+ "special": false
1682
+ },
1683
+ "256206": {
1684
+ "content": "<extra_id_93>",
1685
+ "lstrip": false,
1686
+ "normalized": false,
1687
+ "rstrip": false,
1688
+ "single_word": false,
1689
+ "special": false
1690
+ },
1691
+ "256207": {
1692
+ "content": "<extra_id_92>",
1693
+ "lstrip": false,
1694
+ "normalized": false,
1695
+ "rstrip": false,
1696
+ "single_word": false,
1697
+ "special": false
1698
+ },
1699
+ "256208": {
1700
+ "content": "<extra_id_91>",
1701
+ "lstrip": false,
1702
+ "normalized": false,
1703
+ "rstrip": false,
1704
+ "single_word": false,
1705
+ "special": false
1706
+ },
1707
+ "256209": {
1708
+ "content": "<extra_id_90>",
1709
+ "lstrip": false,
1710
+ "normalized": false,
1711
+ "rstrip": false,
1712
+ "single_word": false,
1713
+ "special": false
1714
+ },
1715
+ "256210": {
1716
+ "content": "<extra_id_89>",
1717
+ "lstrip": false,
1718
+ "normalized": false,
1719
+ "rstrip": false,
1720
+ "single_word": false,
1721
+ "special": false
1722
+ },
1723
+ "256211": {
1724
+ "content": "<extra_id_88>",
1725
+ "lstrip": false,
1726
+ "normalized": false,
1727
+ "rstrip": false,
1728
+ "single_word": false,
1729
+ "special": false
1730
+ },
1731
+ "256212": {
1732
+ "content": "<extra_id_87>",
1733
+ "lstrip": false,
1734
+ "normalized": false,
1735
+ "rstrip": false,
1736
+ "single_word": false,
1737
+ "special": false
1738
+ },
1739
+ "256213": {
1740
+ "content": "<extra_id_86>",
1741
+ "lstrip": false,
1742
+ "normalized": false,
1743
+ "rstrip": false,
1744
+ "single_word": false,
1745
+ "special": false
1746
+ },
1747
+ "256214": {
1748
+ "content": "<extra_id_85>",
1749
+ "lstrip": false,
1750
+ "normalized": false,
1751
+ "rstrip": false,
1752
+ "single_word": false,
1753
+ "special": false
1754
+ },
1755
+ "256215": {
1756
+ "content": "<extra_id_84>",
1757
+ "lstrip": false,
1758
+ "normalized": false,
1759
+ "rstrip": false,
1760
+ "single_word": false,
1761
+ "special": false
1762
+ },
1763
+ "256216": {
1764
+ "content": "<extra_id_83>",
1765
+ "lstrip": false,
1766
+ "normalized": false,
1767
+ "rstrip": false,
1768
+ "single_word": false,
1769
+ "special": false
1770
+ },
1771
+ "256217": {
1772
+ "content": "<extra_id_82>",
1773
+ "lstrip": false,
1774
+ "normalized": false,
1775
+ "rstrip": false,
1776
+ "single_word": false,
1777
+ "special": false
1778
+ },
1779
+ "256218": {
1780
+ "content": "<extra_id_81>",
1781
+ "lstrip": false,
1782
+ "normalized": false,
1783
+ "rstrip": false,
1784
+ "single_word": false,
1785
+ "special": false
1786
+ },
1787
+ "256219": {
1788
+ "content": "<extra_id_80>",
1789
+ "lstrip": false,
1790
+ "normalized": false,
1791
+ "rstrip": false,
1792
+ "single_word": false,
1793
+ "special": false
1794
+ },
1795
+ "256220": {
1796
+ "content": "<extra_id_79>",
1797
+ "lstrip": false,
1798
+ "normalized": false,
1799
+ "rstrip": false,
1800
+ "single_word": false,
1801
+ "special": false
1802
+ },
1803
+ "256221": {
1804
+ "content": "<extra_id_78>",
1805
+ "lstrip": false,
1806
+ "normalized": false,
1807
+ "rstrip": false,
1808
+ "single_word": false,
1809
+ "special": false
1810
+ },
1811
+ "256222": {
1812
+ "content": "<extra_id_77>",
1813
+ "lstrip": false,
1814
+ "normalized": false,
1815
+ "rstrip": false,
1816
+ "single_word": false,
1817
+ "special": false
1818
+ },
1819
+ "256223": {
1820
+ "content": "<extra_id_76>",
1821
+ "lstrip": false,
1822
+ "normalized": false,
1823
+ "rstrip": false,
1824
+ "single_word": false,
1825
+ "special": false
1826
+ },
1827
+ "256224": {
1828
+ "content": "<extra_id_75>",
1829
+ "lstrip": false,
1830
+ "normalized": false,
1831
+ "rstrip": false,
1832
+ "single_word": false,
1833
+ "special": false
1834
+ },
1835
+ "256225": {
1836
+ "content": "<extra_id_74>",
1837
+ "lstrip": false,
1838
+ "normalized": false,
1839
+ "rstrip": false,
1840
+ "single_word": false,
1841
+ "special": false
1842
+ },
1843
+ "256226": {
1844
+ "content": "<extra_id_73>",
1845
+ "lstrip": false,
1846
+ "normalized": false,
1847
+ "rstrip": false,
1848
+ "single_word": false,
1849
+ "special": false
1850
+ },
1851
+ "256227": {
1852
+ "content": "<extra_id_72>",
1853
+ "lstrip": false,
1854
+ "normalized": false,
1855
+ "rstrip": false,
1856
+ "single_word": false,
1857
+ "special": false
1858
+ },
1859
+ "256228": {
1860
+ "content": "<extra_id_71>",
1861
+ "lstrip": false,
1862
+ "normalized": false,
1863
+ "rstrip": false,
1864
+ "single_word": false,
1865
+ "special": false
1866
+ },
1867
+ "256229": {
1868
+ "content": "<extra_id_70>",
1869
+ "lstrip": false,
1870
+ "normalized": false,
1871
+ "rstrip": false,
1872
+ "single_word": false,
1873
+ "special": false
1874
+ },
1875
+ "256230": {
1876
+ "content": "<extra_id_69>",
1877
+ "lstrip": false,
1878
+ "normalized": false,
1879
+ "rstrip": false,
1880
+ "single_word": false,
1881
+ "special": false
1882
+ },
1883
+ "256231": {
1884
+ "content": "<extra_id_68>",
1885
+ "lstrip": false,
1886
+ "normalized": false,
1887
+ "rstrip": false,
1888
+ "single_word": false,
1889
+ "special": false
1890
+ },
1891
+ "256232": {
1892
+ "content": "<extra_id_67>",
1893
+ "lstrip": false,
1894
+ "normalized": false,
1895
+ "rstrip": false,
1896
+ "single_word": false,
1897
+ "special": false
1898
+ },
1899
+ "256233": {
1900
+ "content": "<extra_id_66>",
1901
+ "lstrip": false,
1902
+ "normalized": false,
1903
+ "rstrip": false,
1904
+ "single_word": false,
1905
+ "special": false
1906
+ },
1907
+ "256234": {
1908
+ "content": "<extra_id_65>",
1909
+ "lstrip": false,
1910
+ "normalized": false,
1911
+ "rstrip": false,
1912
+ "single_word": false,
1913
+ "special": false
1914
+ },
1915
+ "256235": {
1916
+ "content": "<extra_id_64>",
1917
+ "lstrip": false,
1918
+ "normalized": false,
1919
+ "rstrip": false,
1920
+ "single_word": false,
1921
+ "special": false
1922
+ },
1923
+ "256236": {
1924
+ "content": "<extra_id_63>",
1925
+ "lstrip": false,
1926
+ "normalized": false,
1927
+ "rstrip": false,
1928
+ "single_word": false,
1929
+ "special": false
1930
+ },
1931
+ "256237": {
1932
+ "content": "<extra_id_62>",
1933
+ "lstrip": false,
1934
+ "normalized": false,
1935
+ "rstrip": false,
1936
+ "single_word": false,
1937
+ "special": false
1938
+ },
1939
+ "256238": {
1940
+ "content": "<extra_id_61>",
1941
+ "lstrip": false,
1942
+ "normalized": false,
1943
+ "rstrip": false,
1944
+ "single_word": false,
1945
+ "special": false
1946
+ },
1947
+ "256239": {
1948
+ "content": "<extra_id_60>",
1949
+ "lstrip": false,
1950
+ "normalized": false,
1951
+ "rstrip": false,
1952
+ "single_word": false,
1953
+ "special": false
1954
+ },
1955
+ "256240": {
1956
+ "content": "<extra_id_59>",
1957
+ "lstrip": false,
1958
+ "normalized": false,
1959
+ "rstrip": false,
1960
+ "single_word": false,
1961
+ "special": false
1962
+ },
1963
+ "256241": {
1964
+ "content": "<extra_id_58>",
1965
+ "lstrip": false,
1966
+ "normalized": false,
1967
+ "rstrip": false,
1968
+ "single_word": false,
1969
+ "special": false
1970
+ },
1971
+ "256242": {
1972
+ "content": "<extra_id_57>",
1973
+ "lstrip": false,
1974
+ "normalized": false,
1975
+ "rstrip": false,
1976
+ "single_word": false,
1977
+ "special": false
1978
+ },
1979
+ "256243": {
1980
+ "content": "<extra_id_56>",
1981
+ "lstrip": false,
1982
+ "normalized": false,
1983
+ "rstrip": false,
1984
+ "single_word": false,
1985
+ "special": false
1986
+ },
1987
+ "256244": {
1988
+ "content": "<extra_id_55>",
1989
+ "lstrip": false,
1990
+ "normalized": false,
1991
+ "rstrip": false,
1992
+ "single_word": false,
1993
+ "special": false
1994
+ },
1995
+ "256245": {
1996
+ "content": "<extra_id_54>",
1997
+ "lstrip": false,
1998
+ "normalized": false,
1999
+ "rstrip": false,
2000
+ "single_word": false,
2001
+ "special": false
2002
+ },
2003
+ "256246": {
2004
+ "content": "<extra_id_53>",
2005
+ "lstrip": false,
2006
+ "normalized": false,
2007
+ "rstrip": false,
2008
+ "single_word": false,
2009
+ "special": false
2010
+ },
2011
+ "256247": {
2012
+ "content": "<extra_id_52>",
2013
+ "lstrip": false,
2014
+ "normalized": false,
2015
+ "rstrip": false,
2016
+ "single_word": false,
2017
+ "special": false
2018
+ },
2019
+ "256248": {
2020
+ "content": "<extra_id_51>",
2021
+ "lstrip": false,
2022
+ "normalized": false,
2023
+ "rstrip": false,
2024
+ "single_word": false,
2025
+ "special": false
2026
+ },
2027
+ "256249": {
2028
+ "content": "<extra_id_50>",
2029
+ "lstrip": false,
2030
+ "normalized": false,
2031
+ "rstrip": false,
2032
+ "single_word": false,
2033
+ "special": false
2034
+ },
2035
+ "256250": {
2036
+ "content": "<extra_id_49>",
2037
+ "lstrip": false,
2038
+ "normalized": false,
2039
+ "rstrip": false,
2040
+ "single_word": false,
2041
+ "special": false
2042
+ },
2043
+ "256251": {
2044
+ "content": "<extra_id_48>",
2045
+ "lstrip": false,
2046
+ "normalized": false,
2047
+ "rstrip": false,
2048
+ "single_word": false,
2049
+ "special": false
2050
+ },
2051
+ "256252": {
2052
+ "content": "<extra_id_47>",
2053
+ "lstrip": false,
2054
+ "normalized": false,
2055
+ "rstrip": false,
2056
+ "single_word": false,
2057
+ "special": false
2058
+ },
2059
+ "256253": {
2060
+ "content": "<extra_id_46>",
2061
+ "lstrip": false,
2062
+ "normalized": false,
2063
+ "rstrip": false,
2064
+ "single_word": false,
2065
+ "special": false
2066
+ },
2067
+ "256254": {
2068
+ "content": "<extra_id_45>",
2069
+ "lstrip": false,
2070
+ "normalized": false,
2071
+ "rstrip": false,
2072
+ "single_word": false,
2073
+ "special": false
2074
+ },
2075
+ "256255": {
2076
+ "content": "<extra_id_44>",
2077
+ "lstrip": false,
2078
+ "normalized": false,
2079
+ "rstrip": false,
2080
+ "single_word": false,
2081
+ "special": false
2082
+ },
2083
+ "256256": {
2084
+ "content": "<extra_id_43>",
2085
+ "lstrip": false,
2086
+ "normalized": false,
2087
+ "rstrip": false,
2088
+ "single_word": false,
2089
+ "special": false
2090
+ },
2091
+ "256257": {
2092
+ "content": "<extra_id_42>",
2093
+ "lstrip": false,
2094
+ "normalized": false,
2095
+ "rstrip": false,
2096
+ "single_word": false,
2097
+ "special": false
2098
+ },
2099
+ "256258": {
2100
+ "content": "<extra_id_41>",
2101
+ "lstrip": false,
2102
+ "normalized": false,
2103
+ "rstrip": false,
2104
+ "single_word": false,
2105
+ "special": false
2106
+ },
2107
+ "256259": {
2108
+ "content": "<extra_id_40>",
2109
+ "lstrip": false,
2110
+ "normalized": false,
2111
+ "rstrip": false,
2112
+ "single_word": false,
2113
+ "special": false
2114
+ },
2115
+ "256260": {
2116
+ "content": "<extra_id_39>",
2117
+ "lstrip": false,
2118
+ "normalized": false,
2119
+ "rstrip": false,
2120
+ "single_word": false,
2121
+ "special": false
2122
+ },
2123
+ "256261": {
2124
+ "content": "<extra_id_38>",
2125
+ "lstrip": false,
2126
+ "normalized": false,
2127
+ "rstrip": false,
2128
+ "single_word": false,
2129
+ "special": false
2130
+ },
2131
+ "256262": {
2132
+ "content": "<extra_id_37>",
2133
+ "lstrip": false,
2134
+ "normalized": false,
2135
+ "rstrip": false,
2136
+ "single_word": false,
2137
+ "special": false
2138
+ },
2139
+ "256263": {
2140
+ "content": "<extra_id_36>",
2141
+ "lstrip": false,
2142
+ "normalized": false,
2143
+ "rstrip": false,
2144
+ "single_word": false,
2145
+ "special": false
2146
+ },
2147
+ "256264": {
2148
+ "content": "<extra_id_35>",
2149
+ "lstrip": false,
2150
+ "normalized": false,
2151
+ "rstrip": false,
2152
+ "single_word": false,
2153
+ "special": false
2154
+ },
2155
+ "256265": {
2156
+ "content": "<extra_id_34>",
2157
+ "lstrip": false,
2158
+ "normalized": false,
2159
+ "rstrip": false,
2160
+ "single_word": false,
2161
+ "special": false
2162
+ },
2163
+ "256266": {
2164
+ "content": "<extra_id_33>",
2165
+ "lstrip": false,
2166
+ "normalized": false,
2167
+ "rstrip": false,
2168
+ "single_word": false,
2169
+ "special": false
2170
+ },
2171
+ "256267": {
2172
+ "content": "<extra_id_32>",
2173
+ "lstrip": false,
2174
+ "normalized": false,
2175
+ "rstrip": false,
2176
+ "single_word": false,
2177
+ "special": false
2178
+ },
2179
+ "256268": {
2180
+ "content": "<extra_id_31>",
2181
+ "lstrip": false,
2182
+ "normalized": false,
2183
+ "rstrip": false,
2184
+ "single_word": false,
2185
+ "special": false
2186
+ },
2187
+ "256269": {
2188
+ "content": "<extra_id_30>",
2189
+ "lstrip": false,
2190
+ "normalized": false,
2191
+ "rstrip": false,
2192
+ "single_word": false,
2193
+ "special": false
2194
+ },
2195
+ "256270": {
2196
+ "content": "<extra_id_29>",
2197
+ "lstrip": false,
2198
+ "normalized": false,
2199
+ "rstrip": false,
2200
+ "single_word": false,
2201
+ "special": false
2202
+ },
2203
+ "256271": {
2204
+ "content": "<extra_id_28>",
2205
+ "lstrip": false,
2206
+ "normalized": false,
2207
+ "rstrip": false,
2208
+ "single_word": false,
2209
+ "special": false
2210
+ },
2211
+ "256272": {
2212
+ "content": "<extra_id_27>",
2213
+ "lstrip": false,
2214
+ "normalized": false,
2215
+ "rstrip": false,
2216
+ "single_word": false,
2217
+ "special": false
2218
+ },
2219
+ "256273": {
2220
+ "content": "<extra_id_26>",
2221
+ "lstrip": false,
2222
+ "normalized": false,
2223
+ "rstrip": false,
2224
+ "single_word": false,
2225
+ "special": false
2226
+ },
2227
+ "256274": {
2228
+ "content": "<extra_id_25>",
2229
+ "lstrip": false,
2230
+ "normalized": false,
2231
+ "rstrip": false,
2232
+ "single_word": false,
2233
+ "special": false
2234
+ },
2235
+ "256275": {
2236
+ "content": "<extra_id_24>",
2237
+ "lstrip": false,
2238
+ "normalized": false,
2239
+ "rstrip": false,
2240
+ "single_word": false,
2241
+ "special": false
2242
+ },
2243
+ "256276": {
2244
+ "content": "<extra_id_23>",
2245
+ "lstrip": false,
2246
+ "normalized": false,
2247
+ "rstrip": false,
2248
+ "single_word": false,
2249
+ "special": false
2250
+ },
2251
+ "256277": {
2252
+ "content": "<extra_id_22>",
2253
+ "lstrip": false,
2254
+ "normalized": false,
2255
+ "rstrip": false,
2256
+ "single_word": false,
2257
+ "special": false
2258
+ },
2259
+ "256278": {
2260
+ "content": "<extra_id_21>",
2261
+ "lstrip": false,
2262
+ "normalized": false,
2263
+ "rstrip": false,
2264
+ "single_word": false,
2265
+ "special": false
2266
+ },
2267
+ "256279": {
2268
+ "content": "<extra_id_20>",
2269
+ "lstrip": false,
2270
+ "normalized": false,
2271
+ "rstrip": false,
2272
+ "single_word": false,
2273
+ "special": false
2274
+ },
2275
+ "256280": {
2276
+ "content": "<extra_id_19>",
2277
+ "lstrip": false,
2278
+ "normalized": false,
2279
+ "rstrip": false,
2280
+ "single_word": false,
2281
+ "special": false
2282
+ },
2283
+ "256281": {
2284
+ "content": "<extra_id_18>",
2285
+ "lstrip": false,
2286
+ "normalized": false,
2287
+ "rstrip": false,
2288
+ "single_word": false,
2289
+ "special": false
2290
+ },
2291
+ "256282": {
2292
+ "content": "<extra_id_17>",
2293
+ "lstrip": false,
2294
+ "normalized": false,
2295
+ "rstrip": false,
2296
+ "single_word": false,
2297
+ "special": false
2298
+ },
2299
+ "256283": {
2300
+ "content": "<extra_id_16>",
2301
+ "lstrip": false,
2302
+ "normalized": false,
2303
+ "rstrip": false,
2304
+ "single_word": false,
2305
+ "special": false
2306
+ },
2307
+ "256284": {
2308
+ "content": "<extra_id_15>",
2309
+ "lstrip": false,
2310
+ "normalized": false,
2311
+ "rstrip": false,
2312
+ "single_word": false,
2313
+ "special": false
2314
+ },
2315
+ "256285": {
2316
+ "content": "<extra_id_14>",
2317
+ "lstrip": false,
2318
+ "normalized": false,
2319
+ "rstrip": false,
2320
+ "single_word": false,
2321
+ "special": false
2322
+ },
2323
+ "256286": {
2324
+ "content": "<extra_id_13>",
2325
+ "lstrip": false,
2326
+ "normalized": false,
2327
+ "rstrip": false,
2328
+ "single_word": false,
2329
+ "special": false
2330
+ },
2331
+ "256287": {
2332
+ "content": "<extra_id_12>",
2333
+ "lstrip": false,
2334
+ "normalized": false,
2335
+ "rstrip": false,
2336
+ "single_word": false,
2337
+ "special": false
2338
+ },
2339
+ "256288": {
2340
+ "content": "<extra_id_11>",
2341
+ "lstrip": false,
2342
+ "normalized": false,
2343
+ "rstrip": false,
2344
+ "single_word": false,
2345
+ "special": false
2346
+ },
2347
+ "256289": {
2348
+ "content": "<extra_id_10>",
2349
+ "lstrip": false,
2350
+ "normalized": false,
2351
+ "rstrip": false,
2352
+ "single_word": false,
2353
+ "special": false
2354
+ },
2355
+ "256290": {
2356
+ "content": "<extra_id_9>",
2357
+ "lstrip": false,
2358
+ "normalized": false,
2359
+ "rstrip": false,
2360
+ "single_word": false,
2361
+ "special": false
2362
+ },
2363
+ "256291": {
2364
+ "content": "<extra_id_8>",
2365
+ "lstrip": false,
2366
+ "normalized": false,
2367
+ "rstrip": false,
2368
+ "single_word": false,
2369
+ "special": false
2370
+ },
2371
+ "256292": {
2372
+ "content": "<extra_id_7>",
2373
+ "lstrip": false,
2374
+ "normalized": false,
2375
+ "rstrip": false,
2376
+ "single_word": false,
2377
+ "special": false
2378
+ },
2379
+ "256293": {
2380
+ "content": "<extra_id_6>",
2381
+ "lstrip": false,
2382
+ "normalized": false,
2383
+ "rstrip": false,
2384
+ "single_word": false,
2385
+ "special": false
2386
+ },
2387
+ "256294": {
2388
+ "content": "<extra_id_5>",
2389
+ "lstrip": false,
2390
+ "normalized": false,
2391
+ "rstrip": false,
2392
+ "single_word": false,
2393
+ "special": false
2394
+ },
2395
+ "256295": {
2396
+ "content": "<extra_id_4>",
2397
+ "lstrip": false,
2398
+ "normalized": false,
2399
+ "rstrip": false,
2400
+ "single_word": false,
2401
+ "special": false
2402
+ },
2403
+ "256296": {
2404
+ "content": "<extra_id_3>",
2405
+ "lstrip": false,
2406
+ "normalized": false,
2407
+ "rstrip": false,
2408
+ "single_word": false,
2409
+ "special": false
2410
+ },
2411
+ "256297": {
2412
+ "content": "<extra_id_2>",
2413
+ "lstrip": false,
2414
+ "normalized": false,
2415
+ "rstrip": false,
2416
+ "single_word": false,
2417
+ "special": false
2418
+ },
2419
+ "256298": {
2420
+ "content": "<extra_id_1>",
2421
+ "lstrip": false,
2422
+ "normalized": false,
2423
+ "rstrip": false,
2424
+ "single_word": false,
2425
+ "special": false
2426
+ },
2427
+ "256299": {
2428
+ "content": "<extra_id_0>",
2429
+ "lstrip": false,
2430
+ "normalized": false,
2431
+ "rstrip": false,
2432
+ "single_word": false,
2433
+ "special": false
2434
+ }
2435
+ },
2436
+ "additional_special_tokens": [
2437
+ "<extra_id_0>",
2438
+ "<extra_id_1>",
2439
+ "<extra_id_2>",
2440
+ "<extra_id_3>",
2441
+ "<extra_id_4>",
2442
+ "<extra_id_5>",
2443
+ "<extra_id_6>",
2444
+ "<extra_id_7>",
2445
+ "<extra_id_8>",
2446
+ "<extra_id_9>",
2447
+ "<extra_id_10>",
2448
+ "<extra_id_11>",
2449
+ "<extra_id_12>",
2450
+ "<extra_id_13>",
2451
+ "<extra_id_14>",
2452
+ "<extra_id_15>",
2453
+ "<extra_id_16>",
2454
+ "<extra_id_17>",
2455
+ "<extra_id_18>",
2456
+ "<extra_id_19>",
2457
+ "<extra_id_20>",
2458
+ "<extra_id_21>",
2459
+ "<extra_id_22>",
2460
+ "<extra_id_23>",
2461
+ "<extra_id_24>",
2462
+ "<extra_id_25>",
2463
+ "<extra_id_26>",
2464
+ "<extra_id_27>",
2465
+ "<extra_id_28>",
2466
+ "<extra_id_29>",
2467
+ "<extra_id_30>",
2468
+ "<extra_id_31>",
2469
+ "<extra_id_32>",
2470
+ "<extra_id_33>",
2471
+ "<extra_id_34>",
2472
+ "<extra_id_35>",
2473
+ "<extra_id_36>",
2474
+ "<extra_id_37>",
2475
+ "<extra_id_38>",
2476
+ "<extra_id_39>",
2477
+ "<extra_id_40>",
2478
+ "<extra_id_41>",
2479
+ "<extra_id_42>",
2480
+ "<extra_id_43>",
2481
+ "<extra_id_44>",
2482
+ "<extra_id_45>",
2483
+ "<extra_id_46>",
2484
+ "<extra_id_47>",
2485
+ "<extra_id_48>",
2486
+ "<extra_id_49>",
2487
+ "<extra_id_50>",
2488
+ "<extra_id_51>",
2489
+ "<extra_id_52>",
2490
+ "<extra_id_53>",
2491
+ "<extra_id_54>",
2492
+ "<extra_id_55>",
2493
+ "<extra_id_56>",
2494
+ "<extra_id_57>",
2495
+ "<extra_id_58>",
2496
+ "<extra_id_59>",
2497
+ "<extra_id_60>",
2498
+ "<extra_id_61>",
2499
+ "<extra_id_62>",
2500
+ "<extra_id_63>",
2501
+ "<extra_id_64>",
2502
+ "<extra_id_65>",
2503
+ "<extra_id_66>",
2504
+ "<extra_id_67>",
2505
+ "<extra_id_68>",
2506
+ "<extra_id_69>",
2507
+ "<extra_id_70>",
2508
+ "<extra_id_71>",
2509
+ "<extra_id_72>",
2510
+ "<extra_id_73>",
2511
+ "<extra_id_74>",
2512
+ "<extra_id_75>",
2513
+ "<extra_id_76>",
2514
+ "<extra_id_77>",
2515
+ "<extra_id_78>",
2516
+ "<extra_id_79>",
2517
+ "<extra_id_80>",
2518
+ "<extra_id_81>",
2519
+ "<extra_id_82>",
2520
+ "<extra_id_83>",
2521
+ "<extra_id_84>",
2522
+ "<extra_id_85>",
2523
+ "<extra_id_86>",
2524
+ "<extra_id_87>",
2525
+ "<extra_id_88>",
2526
+ "<extra_id_89>",
2527
+ "<extra_id_90>",
2528
+ "<extra_id_91>",
2529
+ "<extra_id_92>",
2530
+ "<extra_id_93>",
2531
+ "<extra_id_94>",
2532
+ "<extra_id_95>",
2533
+ "<extra_id_96>",
2534
+ "<extra_id_97>",
2535
+ "<extra_id_98>",
2536
+ "<extra_id_99>",
2537
+ "<extra_id_100>",
2538
+ "<extra_id_101>",
2539
+ "<extra_id_102>",
2540
+ "<extra_id_103>",
2541
+ "<extra_id_104>",
2542
+ "<extra_id_105>",
2543
+ "<extra_id_106>",
2544
+ "<extra_id_107>",
2545
+ "<extra_id_108>",
2546
+ "<extra_id_109>",
2547
+ "<extra_id_110>",
2548
+ "<extra_id_111>",
2549
+ "<extra_id_112>",
2550
+ "<extra_id_113>",
2551
+ "<extra_id_114>",
2552
+ "<extra_id_115>",
2553
+ "<extra_id_116>",
2554
+ "<extra_id_117>",
2555
+ "<extra_id_118>",
2556
+ "<extra_id_119>",
2557
+ "<extra_id_120>",
2558
+ "<extra_id_121>",
2559
+ "<extra_id_122>",
2560
+ "<extra_id_123>",
2561
+ "<extra_id_124>",
2562
+ "<extra_id_125>",
2563
+ "<extra_id_126>",
2564
+ "<extra_id_127>",
2565
+ "<extra_id_128>",
2566
+ "<extra_id_129>",
2567
+ "<extra_id_130>",
2568
+ "<extra_id_131>",
2569
+ "<extra_id_132>",
2570
+ "<extra_id_133>",
2571
+ "<extra_id_134>",
2572
+ "<extra_id_135>",
2573
+ "<extra_id_136>",
2574
+ "<extra_id_137>",
2575
+ "<extra_id_138>",
2576
+ "<extra_id_139>",
2577
+ "<extra_id_140>",
2578
+ "<extra_id_141>",
2579
+ "<extra_id_142>",
2580
+ "<extra_id_143>",
2581
+ "<extra_id_144>",
2582
+ "<extra_id_145>",
2583
+ "<extra_id_146>",
2584
+ "<extra_id_147>",
2585
+ "<extra_id_148>",
2586
+ "<extra_id_149>",
2587
+ "<extra_id_150>",
2588
+ "<extra_id_151>",
2589
+ "<extra_id_152>",
2590
+ "<extra_id_153>",
2591
+ "<extra_id_154>",
2592
+ "<extra_id_155>",
2593
+ "<extra_id_156>",
2594
+ "<extra_id_157>",
2595
+ "<extra_id_158>",
2596
+ "<extra_id_159>",
2597
+ "<extra_id_160>",
2598
+ "<extra_id_161>",
2599
+ "<extra_id_162>",
2600
+ "<extra_id_163>",
2601
+ "<extra_id_164>",
2602
+ "<extra_id_165>",
2603
+ "<extra_id_166>",
2604
+ "<extra_id_167>",
2605
+ "<extra_id_168>",
2606
+ "<extra_id_169>",
2607
+ "<extra_id_170>",
2608
+ "<extra_id_171>",
2609
+ "<extra_id_172>",
2610
+ "<extra_id_173>",
2611
+ "<extra_id_174>",
2612
+ "<extra_id_175>",
2613
+ "<extra_id_176>",
2614
+ "<extra_id_177>",
2615
+ "<extra_id_178>",
2616
+ "<extra_id_179>",
2617
+ "<extra_id_180>",
2618
+ "<extra_id_181>",
2619
+ "<extra_id_182>",
2620
+ "<extra_id_183>",
2621
+ "<extra_id_184>",
2622
+ "<extra_id_185>",
2623
+ "<extra_id_186>",
2624
+ "<extra_id_187>",
2625
+ "<extra_id_188>",
2626
+ "<extra_id_189>",
2627
+ "<extra_id_190>",
2628
+ "<extra_id_191>",
2629
+ "<extra_id_192>",
2630
+ "<extra_id_193>",
2631
+ "<extra_id_194>",
2632
+ "<extra_id_195>",
2633
+ "<extra_id_196>",
2634
+ "<extra_id_197>",
2635
+ "<extra_id_198>",
2636
+ "<extra_id_199>",
2637
+ "<extra_id_200>",
2638
+ "<extra_id_201>",
2639
+ "<extra_id_202>",
2640
+ "<extra_id_203>",
2641
+ "<extra_id_204>",
2642
+ "<extra_id_205>",
2643
+ "<extra_id_206>",
2644
+ "<extra_id_207>",
2645
+ "<extra_id_208>",
2646
+ "<extra_id_209>",
2647
+ "<extra_id_210>",
2648
+ "<extra_id_211>",
2649
+ "<extra_id_212>",
2650
+ "<extra_id_213>",
2651
+ "<extra_id_214>",
2652
+ "<extra_id_215>",
2653
+ "<extra_id_216>",
2654
+ "<extra_id_217>",
2655
+ "<extra_id_218>",
2656
+ "<extra_id_219>",
2657
+ "<extra_id_220>",
2658
+ "<extra_id_221>",
2659
+ "<extra_id_222>",
2660
+ "<extra_id_223>",
2661
+ "<extra_id_224>",
2662
+ "<extra_id_225>",
2663
+ "<extra_id_226>",
2664
+ "<extra_id_227>",
2665
+ "<extra_id_228>",
2666
+ "<extra_id_229>",
2667
+ "<extra_id_230>",
2668
+ "<extra_id_231>",
2669
+ "<extra_id_232>",
2670
+ "<extra_id_233>",
2671
+ "<extra_id_234>",
2672
+ "<extra_id_235>",
2673
+ "<extra_id_236>",
2674
+ "<extra_id_237>",
2675
+ "<extra_id_238>",
2676
+ "<extra_id_239>",
2677
+ "<extra_id_240>",
2678
+ "<extra_id_241>",
2679
+ "<extra_id_242>",
2680
+ "<extra_id_243>",
2681
+ "<extra_id_244>",
2682
+ "<extra_id_245>",
2683
+ "<extra_id_246>",
2684
+ "<extra_id_247>",
2685
+ "<extra_id_248>",
2686
+ "<extra_id_249>",
2687
+ "<extra_id_250>",
2688
+ "<extra_id_251>",
2689
+ "<extra_id_252>",
2690
+ "<extra_id_253>",
2691
+ "<extra_id_254>",
2692
+ "<extra_id_255>",
2693
+ "<extra_id_256>",
2694
+ "<extra_id_257>",
2695
+ "<extra_id_258>",
2696
+ "<extra_id_259>",
2697
+ "<extra_id_260>",
2698
+ "<extra_id_261>",
2699
+ "<extra_id_262>",
2700
+ "<extra_id_263>",
2701
+ "<extra_id_264>",
2702
+ "<extra_id_265>",
2703
+ "<extra_id_266>",
2704
+ "<extra_id_267>",
2705
+ "<extra_id_268>",
2706
+ "<extra_id_269>",
2707
+ "<extra_id_270>",
2708
+ "<extra_id_271>",
2709
+ "<extra_id_272>",
2710
+ "<extra_id_273>",
2711
+ "<extra_id_274>",
2712
+ "<extra_id_275>",
2713
+ "<extra_id_276>",
2714
+ "<extra_id_277>",
2715
+ "<extra_id_278>",
2716
+ "<extra_id_279>",
2717
+ "<extra_id_280>",
2718
+ "<extra_id_281>",
2719
+ "<extra_id_282>",
2720
+ "<extra_id_283>",
2721
+ "<extra_id_284>",
2722
+ "<extra_id_285>",
2723
+ "<extra_id_286>",
2724
+ "<extra_id_287>",
2725
+ "<extra_id_288>",
2726
+ "<extra_id_289>",
2727
+ "<extra_id_290>",
2728
+ "<extra_id_291>",
2729
+ "<extra_id_292>",
2730
+ "<extra_id_293>",
2731
+ "<extra_id_294>",
2732
+ "<extra_id_295>",
2733
+ "<extra_id_296>",
2734
+ "<extra_id_297>",
2735
+ "<extra_id_298>",
2736
+ "<extra_id_299>"
2737
+ ],
2738
+ "bos_token": "<s>",
2739
+ "clean_up_tokenization_spaces": true,
2740
+ "eos_token": "</s>",
2741
+ "extra_ids": 300,
2742
+ "legacy": false,
2743
+ "model_max_length": 1000000000000000019884624838656,
2744
+ "pad_token": "<pad>",
2745
+ "sp_model_kwargs": {},
2746
+ "spaces_between_special_tokens": false,
2747
+ "tokenizer_class": "OpenMoeTokenizer",
2748
+ "trust_remote_code": true,
2749
+ "unk_token": "<unk>",
2750
+ "verbose": false,
2751
+ "auto_map": {
2752
+ "AutoTokenizer": [
2753
+ "tokenization_openmoe.OpenMoeTokenizer",
2754
+ null
2755
+ ]
2756
+ }
2757
+ }