Text Generation
Transformers
PyTorch
xglm
Inference Endpoints
nutkung1 committed on
Commit 74cf9a6
1 Parent(s): 692b2c9

Upload 9 files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -0,0 +1,169 @@
+ ---
+ license: cc-by-sa-4.0
+ datasets:
+ - laion/OIG
+ - Hello-SimpleAI/HC3
+ - databricks/databricks-dolly-15k
+ language:
+ - en
+ - th
+ - ja
+ - vi
+ pipeline_tag: text-generation
+ ---
+ # Model Card for WangChanGLM 🐘 - The Multilingual Instruction-Following Model
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ WangChanGLM is a multilingual, instruction-finetuned Facebook XGLM-7.5B trained on open-source, commercially permissible datasets (LAION OIG chip2 and infill_dbpedia, DataBricks Dolly v2, OpenAI TL;DR, and Hello-SimpleAI HC3; about 400k examples), released under CC-BY SA 4.0. The models are trained to perform a subset of instruction-following tasks we found most relevant, namely: reading comprehension, brainstorming, and creative writing. We provide the weights for a model finetuned on an English-only dataset ([wangchanglm-7.5B-sft-en](https://huggingface.co/pythainlp/wangchanglm-7.5B-sft-en)) and another checkpoint further finetuned on a Google-Translated Thai dataset ([wangchanglm-7.5B-sft-enth](https://huggingface.co/pythainlp/wangchanglm-7.5B-sft-enth)). We perform Vicuna-style evaluation using both humans and ChatGPT (in our case, `gpt-3.5-turbo` since we are still on the waitlist for `gpt-4`) and observe some discrepancies between the two types of annotators. All training and evaluation code is shared under the [Apache-2.0 license](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE) on our GitHub, as well as datasets and model weights on [HuggingFace](https://huggingface.co/pythainlp). In a similar manner to [Dolly v2](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm), we use only open-source, commercially permissive pretrained models and datasets; our models are neither restricted by a non-commercial clause like models that use LLaMA as a base, nor by a non-compete clause like models that use self-instruct datasets from ChatGPT. See our live demo [here]().
+
+ - **Developed by:** [PyThaiNLP](https://www.github.com/pythainlp) and [VISTEC-depa AI Research Institute of Thailand](https://huggingface.co/airesearch)
+ - **Model type:** Finetuned [XGLM-7.5B](https://huggingface.co/facebook/xglm-7.5B)
+ - **Language(s) (NLP)**: `en`, `th`, `ja`, `vi` capabilities evaluated; theoretically all 30 languages of [XGLM-7.5B](https://huggingface.co/facebook/xglm-7.5B)
+ - **License:** [CC-BY SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm)
+ - **Blog:** [Medium](https://link.medium.com/s2MWr3ZXnzb)
+ - **Demo:** [Colab notebook](https://colab.research.google.com/github/pythainlp/WangChanGLM/blob/main/demo/WangChanGLM_v0_1_demo.ipynb)
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ Intended to be used as an instruction-following model for reading comprehension, brainstorming, and creative writing.
+
+ ### Downstream Use
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ The model can be finetuned for any typical instruction-following use case, or plugged into an application as shown in the sketch below.
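For example, one lightweight way to plug the checkpoint into a larger application is the `transformers` text-generation pipeline. The snippet below is a minimal sketch; the prompt and generation settings are illustrative assumptions, not the authors' recommended configuration.

```python
from transformers import pipeline

# Minimal sketch: serve the model through a text-generation pipeline.
generator = pipeline(
    "text-generation",
    model="pythainlp/wangchanglm-7.5B-sft-en",
    device_map="auto",  # assumes the accelerate package is installed
)
result = generator("Write three slogans for a coffee shop:", max_new_tokens=64)
print(result[0]["generated_text"])
```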
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ We do not expect the models to perform well on math problems, reasoning, and factuality. We intentionally filtered out training examples from these use cases.
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ We noticed limitations similar to other finetuned instruction followers, such as math problems, reasoning, and factuality. Even though the models do not perform at a level we expect to be abused, they do contain undesirable biases and toxicity and should be further optimized for your particular use cases.
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "pythainlp/wangchanglm-7.5B-sft-en"
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     return_dict=True,
+     load_in_8bit=True,
+     device_map="auto",
+     torch_dtype=torch.float16,
+     offload_folder="./",
+     low_cpu_mem_usage=True,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ text = "เล่นหุ้นยังไงให้รวย"  # "How do I trade stocks and get rich?"
+ batch = tokenizer(text, return_tensors="pt")
+
+ max_gen_len = 512
+ top_p = 0.95
+ temperature = 0.9
+ exclude_ids = [tokenizer.pad_token_id]  # placeholder: token ids to suppress at the start of generation
+
+ with torch.cuda.amp.autocast():
+     output_tokens = model.generate(
+         input_ids=batch["input_ids"],
+         max_new_tokens=max_gen_len,
+         begin_suppress_tokens=exclude_ids,
+         no_repeat_ngram_size=2,
+         # oasst k50 sampling preset
+         top_k=50,
+         top_p=top_p,
+         typical_p=1.0,
+         temperature=temperature,
+         # alternative oasst typical3 preset:
+         # typical_p=0.3,
+         # temperature=0.8,
+         # repetition_penalty=1.2,
+     )
+ print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
+ ```
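Note: `load_in_8bit=True` requires the `bitsandbytes` and `accelerate` packages. If 8-bit quantization is not available in your environment, a plain float16 load is a workable fallback; the sketch below assumes a GPU with roughly 16 GB of free memory for the 7.5B-parameter weights.

```python
import torch
from transformers import AutoModelForCausalLM

# Fallback sketch: load the checkpoint in float16 without 8-bit quantization.
model = AutoModelForCausalLM.from_pretrained(
    "pythainlp/wangchanglm-7.5B-sft-en",
    torch_dtype=torch.float16,
    device_map="auto",
)
```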
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ Finetuning datasets are sourced from [LAION OIG chip2 and infill_dbpedia](https://huggingface.co/datasets/laion/OIG) ([Apache-2.0](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE)), [DataBricks Dolly v2](https://github.com/databrickslabs/dolly) ([Apache-2.0](https://github.com/pythainlp/wangchanglm/blob/main/LICENSE)), [OpenAI TL;DR](https://github.com/openai/summarize-from-feedback) ([MIT](https://opensource.org/license/mit/)), and [Hello-SimpleAI HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3) ([CC-BY SA](https://creativecommons.org/licenses/by-sa/4.0/)).
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing
+
+ See [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm).
+
+ #### Training Hyperparameters
+
+ - **Training regime:** LoRA with 4 GPUs; a minimal configuration sketch follows below. See more details at [pythainlp/wangchanglm](https://www.github.com/pythainlp/wangchanglm).
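As a rough illustration of such a setup, here is a minimal LoRA configuration sketch using the `peft` library; the rank, scaling factor, and target modules are assumed values for illustration, not the authors' published hyperparameters.

```python
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Minimal LoRA sketch; r, lora_alpha, and target_modules are assumptions.
base = AutoModelForCausalLM.from_pretrained(
    "facebook/xglm-7.5B", torch_dtype=torch.float16
)
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # assumed adapter rank
    lora_alpha=32,                        # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # XGLM attention projections
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```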
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ We performed automatic evaluation in the style of [Vicuna](https://vicuna.lmsys.org/) and human evaluation. See more details in our [blog]().
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Experiments were conducted using private infrastructure, which has a carbon efficiency of 0.432 kgCO2eq/kWh. A cumulative 500 hours of computation was performed on hardware of type Tesla V100-SXM2-32GB (TDP of 300W). Total emissions are estimated to be 64.8 kgCO2eq, of which 0 percent was directly offset. Estimations were conducted using the [MachineLearning Impact calculator](https://mlco2.github.io/impact#compute).
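The figure can be reproduced directly from the numbers above: energy in kWh multiplied by the carbon efficiency.

```python
# 500 h on a 300 W (0.3 kW) GPU at 0.432 kgCO2eq/kWh
hours = 500
tdp_kw = 0.3
carbon_efficiency = 0.432  # kgCO2eq per kWh
print(hours * tdp_kw * carbon_efficiency)  # 64.8 kgCO2eq
```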
+
+ ## Citation
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ ```
+ @software{charin_polpanumas_2023_7878101,
+   author    = {Charin Polpanumas and
+                Wannaphong Phatthiyaphaibun and
+                Patomporn Payoungkhamdee and
+                Peerat Limkonchotiwat and
+                Lalita Lowphansirikul and
+                Can Udomcharoenchaikit and
+                Titipat Achakulwisut and
+                Ekapol Chuangsuwanich and
+                Sarana Nutanong},
+   title     = {{WangChanGLM🐘 — The Multilingual Instruction-Following Model}},
+   month     = apr,
+   year      = 2023,
+   publisher = {Zenodo},
+   version   = {v0.1},
+   doi       = {10.5281/zenodo.7878101},
+   url       = {https://doi.org/10.5281/zenodo.7878101}
+ }
+ ```
+
+ ## Model Card Contact
+
+ [PyThaiNLP](https://github.com/pythainlp)
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "_name_or_path": "facebook/xglm-7.5B",
+   "activation_dropout": 0.0,
+   "activation_function": "gelu",
+   "architectures": [
+     "XGLMForCausalLM"
+   ],
+   "attention_dropout": 0.1,
+   "attention_heads": 32,
+   "bos_token_id": 0,
+   "d_model": 4096,
+   "decoder_start_token_id": 2,
+   "dropout": 0.1,
+   "eos_token_id": 2,
+   "ffn_dim": 16384,
+   "init_std": 0.02,
+   "layerdrop": 0.0,
+   "max_position_embeddings": 2048,
+   "model_type": "xglm",
+   "num_layers": 32,
+   "pad_token_id": 1,
+   "scale_embedding": true,
+   "torch_dtype": "float16",
+   "transformers_version": "4.28.0.dev0",
+   "use_cache": true,
+   "vocab_size": 256008
+ }
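The configuration shows an unmodified XGLM-7.5B architecture (32 layers, 32 attention heads, hidden size 4096, 256k vocabulary). As a quick sanity check it can be inspected programmatically; a sketch, assuming network access to the Hub:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("pythainlp/wangchanglm-7.5B-sft-en")
print(config.model_type, config.num_layers, config.d_model)  # xglm 32 4096
```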
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "decoder_start_token_id": 2,
+   "eos_token_id": 2,
+   "pad_token_id": 1,
+   "transformers_version": "4.28.0.dev0"
+ }
gitattributes.txt ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,523 @@
+ {
+   "metadata": {
+     "total_size": 17082761216
+   },
+   "weight_map": {
+     "lm_head.weight": "pytorch_model-00002-of-00002.bin",
+     "model.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layer_norm.bias": "pytorch_model-00002-of-00002.bin",
+     "model.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.0.fc1.bias": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.fc1.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.fc2.bias": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.fc2.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.final_layer_norm.bias": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.final_layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.k_proj.bias": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.out_proj.bias": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.out_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.q_proj.bias": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.v_proj.bias": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn_layer_norm.bias": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn_layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+     ...
+     "model.layers.9.self_attn_layer_norm.weight": "pytorch_model-00001-of-00002.bin"
+   }
+ }

The elided weight_map entries follow the same pattern for layers 1-31: each layer maps its fc1, fc2, final_layer_norm, self_attn.{k,q,v,out}_proj, and self_attn_layer_norm weights and biases to one of the two shards. Layers 0-18 plus the self-attention weights of layer 19 resolve to pytorch_model-00001-of-00002.bin; the remaining layer-19 weights and layers 20-31 resolve to pytorch_model-00002-of-00002.bin.
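For reference, the `total_size` field converts to a human-readable figure for what `from_pretrained` will download across the two shards:

```python
total_size = 17_082_761_216  # bytes, from the "metadata" block above
print(f"{total_size / 1024**3:.1f} GiB")  # ~15.9 GiB across two .bin shards
```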
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c49dc7e82c10227af764e518924cf2f9d50c00462750d184fa74697bba65eef8
+ size 4920706
special_tokens_map.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "additional_special_tokens": [
+     "<madeupword0>",
+     "<madeupword1>",
+     "<madeupword2>",
+     "<madeupword3>",
+     "<madeupword4>",
+     "<madeupword5>",
+     "<madeupword6>"
+   ],
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00d163342a36b3ad1ea2f5f608e6bb2b2ff29bd453a41c4f52525a7ebc7c4b6a
+ size 17210041
tokenizer_config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "additional_special_tokens": [
+     "<madeupword0>",
+     "<madeupword1>",
+     "<madeupword2>",
+     "<madeupword3>",
+     "<madeupword4>",
+     "<madeupword5>",
+     "<madeupword6>"
+   ],
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "sp_model_kwargs": {},
+   "special_tokens_map_file": "hf_models/xglm-564M/special_tokens_map.json",
+   "tokenizer_class": "XGLMTokenizer",
+   "unk_token": "<unk>"
+ }
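The tokenizer is the stock `XGLMTokenizer` with XGLM's seven `<madeupword>` placeholder special tokens. A quick way to verify the special-token setup after loading, as a sketch:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("pythainlp/wangchanglm-7.5B-sft-en")
print(tok.bos_token, tok.eos_token, tok.pad_token, tok.unk_token)  # <s> </s> <pad> <unk>
print(tok.additional_special_tokens[:3])  # ['<madeupword0>', '<madeupword1>', '<madeupword2>']
```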