ahmetmete

ArthurZ (HF Staff) committed on Dec 6, 2023

Commit

e18a373

0 Parent(s):

Duplicate from google/flan-t5-xl

Co-authored-by: Arthur Zucker <ArthurZ@users.noreply.huggingface.co>

Files changed (20)

.gitattributes +33 -0
README.md +276 -0
config.json +58 -0
flax_model-00001-of-00002.msgpack +3 -0
flax_model-00002-of-00002.msgpack +3 -0
flax_model.msgpack.index.json +565 -0
generation_config.json +7 -0
model-00001-of-00002.safetensors +3 -0
model-00002-of-00002.safetensors +3 -0
model.safetensors.index.json +567 -0
pytorch_model-00001-of-00002.bin +3 -0
pytorch_model-00002-of-00002.bin +3 -0
pytorch_model.bin.index.json +567 -0
special_tokens_map.json +107 -0
spiece.model +3 -0
tf_model-00001-of-00002.h5 +3 -0
tf_model-00002-of-00002.h5 +3 -0
tf_model.h5.index.json +565 -0
tokenizer.json +0 -0
tokenizer_config.json +113 -0

.gitattributes ADDED

	@@ -0,0 +1,33 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
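
A rough way to see which files in this commit these patterns route through Git LFS is to match file names against them. The sketch below is an approximation only: it uses Python's `fnmatch` on the base name rather than Git's own attribute matching, it covers just a subset of the patterns, and `is_lfs_tracked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Subset of the patterns listed in the .gitattributes above (assumption:
# base-name glob matching is close enough to Git's rules for these cases).
LFS_PATTERNS = ["*.bin", "*.h5", "*.model", "*.msgpack", "*.safetensors"]


def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the file's base name matches any LFS glob pattern."""
    basename = filename.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


# A few file names taken from the "Files changed" list of this commit.
files = [
    "config.json",
    "spiece.model",
    "model-00001-of-00002.safetensors",
    "flax_model.msgpack.index.json",
]
for name in files:
    print(name, is_lfs_tracked(name))
```

Under these assumptions the weight shards and `spiece.model` match, while the `*.index.json` files do not. That is consistent with the index files appearing as ordinary line diffs and the shards as 3-line pointer files.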

README.md ADDED

	@@ -0,0 +1,276 @@

+---
+language:
+- en
+- fr
+- ro
+- de
+- multilingual
+widget:
+- text: "Translate to German:  My name is Arthur"
+  example_title: "Translation"
+- text: "Please answer to the following question. Who is going to be the next Ballon d'or?"
+  example_title: "Question Answering"
+- text: "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering."
+  example_title: "Logical reasoning"
+- text: "Please answer the following question. What is the boiling point of Nitrogen?"
+  example_title: "Scientific knowledge"
+- text: "Answer the following yes/no question. Can you write a whole Haiku in a single tweet?"
+  example_title: "Yes/no question"
+- text: "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?"
+  example_title: "Reasoning task"
+- text: "Q: ( False or not False or False ) is? A: Let's think step by step"
+  example_title: "Boolean Expressions"
+- text: "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?"
+  example_title: "Math reasoning"
+- text: "Premise:  At my age you will probably have learnt one lesson. Hypothesis:  It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?"
+  example_title: "Premise and hypothesis"
+tags:
+- text2text-generation
+datasets:
+- svakulenk0/qrecc
+- taskmaster2
+- djaym7/wiki_dialog
+- deepmind/code_contests
+- lambada
+- gsm8k
+- aqua_rat
+- esnli
+- quasc
+- qed
+license: apache-2.0
+---
+# Model Card for FLAN-T5 XL
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/flan2_architecture.jpg"
+alt="drawing" width="600"/>
+#  Table of Contents
+0. [TL;DR](#tldr)
+1. [Model Details](#model-details)
+2. [Usage](#usage)
+3. [Uses](#uses)
+4. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
+5. [Training Details](#training-details)
+6. [Evaluation](#evaluation)
+7. [Environmental Impact](#environmental-impact)
+8. [Citation](#citation)
+# TL;DR
+If you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks, also covering more languages.
+As mentioned in the first few lines of the abstract:
+> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.
+**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copied from the [T5 model card](https://huggingface.co/t5-large).
+# Model Details
+## Model Description
+- **Model type:** Language model
+- **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
+- **License:** Apache 2.0
+- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)
+- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)
+- **Resources for more information:**
+  - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)
+  - [GitHub Repo](https://github.com/google-research/t5x)
+  - [Hugging Face FLAN-T5 Docs (Similar to T5)](https://huggingface.co/docs/transformers/model_doc/t5)
+# Usage
+Find below some example scripts on how to use the model in `transformers`:
+## Using the PyTorch model
+### Running the model on a CPU
+<details>
+<summary> Click to expand </summary>
+```python
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
+model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl")
+input_text = "translate English to German: How old are you?"
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+</details>
+### Running the model on a GPU
+<details>
+<summary> Click to expand </summary>
+```python
+# pip install accelerate
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
+model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto")
+input_text = "translate English to German: How old are you?"
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+</details>
+### Running the model on a GPU using different precisions
+#### FP16
+<details>
+<summary> Click to expand </summary>
+```python
+# pip install accelerate
+import torch
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
+model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto", torch_dtype=torch.float16)
+input_text = "translate English to German: How old are you?"
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+</details>
+#### INT8
+<details>
+<summary> Click to expand </summary>
+```python
+# pip install bitsandbytes accelerate
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
+model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto", load_in_8bit=True)
+input_text = "translate English to German: How old are you?"
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
+outputs = model.generate(input_ids)
+print(tokenizer.decode(outputs[0]))
+```
+</details>
+# Uses
+## Direct Use and Downstream Use
+The authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that:
+> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models
+See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.
+## Out-of-Scope Use
+More information needed.
+# Bias, Risks, and Limitations
+The information in this section is copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):
+> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.
+## Ethical considerations and risks
+> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.
+## Known Limitations
+> Flan-T5 has not been tested in real world applications.
+## Sensitive Use
+> Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
+# Training Details
+## Training Data
+The model was trained on a mixture of tasks that includes the tasks described in the table below (from the original paper, figure 2):
+![table.png](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)
+## Training Procedure
+According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
+> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.
+The model has been trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+# Evaluation
+## Testing Data, Factors & Metrics
+The authors evaluated the model on various tasks (1836 in total) covering several languages. See the table below for some quantitative evaluation:
+![image.png](https://s3.amazonaws.com/moonup/production/uploads/1668072995230-62441d1d9fdefb55a0b7d12c.png)
+For full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).
+## Results
+For full results for FLAN-T5-XL, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf), Table 3.
+# Environmental Impact
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4  | Number of chips ≥ 4.
+- **Hours used:** More information needed
+- **Cloud Provider:** GCP
+- **Compute Region:** More information needed
+- **Carbon Emitted:** More information needed
+# Citation
+**BibTeX:**
+```bibtex
+@misc{https://doi.org/10.48550/arxiv.2210.11416,
+  doi = {10.48550/ARXIV.2210.11416},
+  url = {https://arxiv.org/abs/2210.11416},
+  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
+  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences},
+  title = {Scaling Instruction-Finetuned Language Models},
+  publisher = {arXiv},
+  year = {2022},
+  copyright = {Creative Commons Attribution 4.0 International}
+}
+```

config.json ADDED

	@@ -0,0 +1,58 @@

+{
+  "architectures": [
+    "T5ForConditionalGeneration"
+  ],
+  "d_ff": 5120,
+  "d_kv": 64,
+  "d_model": 2048,
+  "decoder_start_token_id": 0,
+  "dropout_rate": 0.1,
+  "eos_token_id": 1,
+  "feed_forward_proj": "gated-gelu",
+  "initializer_factor": 1.0,
+  "is_encoder_decoder": true,
+  "layer_norm_epsilon": 1e-06,
+  "model_type": "t5",
+  "n_positions": 512,
+  "num_decoder_layers": 24,
+  "num_heads": 32,
+  "num_layers": 24,
+  "output_past": true,
+  "pad_token_id": 0,
+  "relative_attention_max_distance": 128,
+  "relative_attention_num_buckets": 32,
+  "task_specific_params": {
+    "summarization": {
+      "early_stopping": true,
+      "length_penalty": 2.0,
+      "max_length": 200,
+      "min_length": 30,
+      "no_repeat_ngram_size": 3,
+      "num_beams": 4,
+      "prefix": "summarize: "
+    },
+    "translation_en_to_de": {
+      "early_stopping": true,
+      "max_length": 300,
+      "num_beams": 4,
+      "prefix": "translate English to German: "
+    },
+    "translation_en_to_fr": {
+      "early_stopping": true,
+      "max_length": 300,
+      "num_beams": 4,
+      "prefix": "translate English to French: "
+    },
+    "translation_en_to_ro": {
+      "early_stopping": true,
+      "max_length": 300,
+      "num_beams": 4,
+      "prefix": "translate English to Romanian: "
+    }
+  },
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.24.0.dev0",
+  "use_cache": true,
+  "vocab_size": 32128
+}
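
The shape fields in this config are enough to estimate the parameter count. The sketch below assumes the usual T5 v1.1 layout: bias-free projections, a gated feed-forward block with `wi_0`, `wi_1` and `wo`, one scale-only layer norm per sub-layer plus a final one per stack, a relative attention bias table only in the first self-attention block of each stack, and a separate output head because `tie_word_embeddings` is false. `count_t5_params` is a hypothetical helper, not part of `transformers`.

```python
def count_t5_params(cfg: dict) -> int:
    """Estimate the parameter count of a gated-FFN T5 model from its config."""
    d_model, d_ff = cfg["d_model"], cfg["d_ff"]
    inner = cfg["num_heads"] * cfg["d_kv"]   # total attention width
    attn = 4 * d_model * inner               # q, k, v, o projections (no biases)
    ff = 3 * d_model * d_ff                  # wi_0, wi_1, wo (gated-gelu)
    norm = d_model                           # layer norm has a scale vector only
    rel_bias = cfg["relative_attention_num_buckets"] * cfg["num_heads"]

    enc_layer = attn + ff + 2 * norm         # self-attention + feed-forward
    dec_layer = 2 * attn + ff + 3 * norm     # self-attn + cross-attn + feed-forward

    encoder = cfg["num_layers"] * enc_layer + rel_bias + norm
    decoder = cfg["num_decoder_layers"] * dec_layer + rel_bias + norm
    embed = cfg["vocab_size"] * d_model
    head = 0 if cfg["tie_word_embeddings"] else embed
    return encoder + decoder + embed + head


# Values copied from the config.json shown above.
config = {
    "d_ff": 5120, "d_kv": 64, "d_model": 2048,
    "num_heads": 32, "num_layers": 24, "num_decoder_layers": 24,
    "relative_attention_num_buckets": 32,
    "tie_word_embeddings": False, "vocab_size": 32128,
}

n_params = count_t5_params(config)
print(f"parameters: {n_params:,}")
print(f"float32 bytes: {n_params * 4:,}")
```

With these values the estimate is 2,849,757,184 parameters. At 4 bytes each (`torch_dtype` is `float32`) that is 11,399,028,736 bytes, which equals the `total_size` recorded in `flax_model.msgpack.index.json`.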

flax_model-00001-of-00002.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:32309ed3c4c2c2dd040f9f2adc116f9cbb1ad180fe323d7d8b31153cb85ff8a0
+size 9969726387

flax_model-00002-of-00002.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e4ab8e7cf8238bc7eff23d1f6f47a7f67cda1c506e0f4b041329625f636e6d6b
+size 1429326464
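
Each of these three-line stubs is a Git LFS pointer rather than the weights themselves: a spec version, a SHA-256 object id, and the byte size of the real file. A minimal sketch of reading one follows, with `parse_lfs_pointer` as a hypothetical helper and the pointer text copied from the two Flax shards above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


shard_1 = """version https://git-lfs.github.com/spec/v1
oid sha256:32309ed3c4c2c2dd040f9f2adc116f9cbb1ad180fe323d7d8b31153cb85ff8a0
size 9969726387
"""
shard_2 = """version https://git-lfs.github.com/spec/v1
oid sha256:e4ab8e7cf8238bc7eff23d1f6f47a7f67cda1c506e0f4b041329625f636e6d6b
size 1429326464
"""

pointers = [parse_lfs_pointer(shard_1), parse_lfs_pointer(shard_2)]
total_bytes = sum(p["size"] for p in pointers)
print(f"total shard size: {total_bytes:,} bytes")
```

The two shard sizes sum to 11,399,052,851 bytes, slightly more than the 11,399,028,736 bytes given as `total_size` in the Flax index file. The small difference is presumably serialization overhead in the msgpack files.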

flax_model.msgpack.index.json ADDED

	@@ -0,0 +1,565 @@

+{
+  "metadata": {
+    "total_size": 11399028736
+  },
+  "weight_map": {
+    "decoder/block/0/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/0/SelfAttention/relative_attention_bias/embedding": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/0/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/1/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/10/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/11/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/12/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/13/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/14/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/15/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/16/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/17/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/18/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/19/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/19/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/19/layer/2/DenseReluDense/wo/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/19/layer/2/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/2/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/2/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/20/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/0/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/1/EncDecAttention/k/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/1/EncDecAttention/o/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/1/EncDecAttention/q/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/1/EncDecAttention/v/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/1/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/2/DenseReluDense/wo/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/20/layer/2/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/0/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/1/EncDecAttention/k/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/1/EncDecAttention/o/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/1/EncDecAttention/q/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/1/EncDecAttention/v/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/1/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/2/DenseReluDense/wo/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/21/layer/2/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/0/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/1/EncDecAttention/k/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/1/EncDecAttention/o/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/1/EncDecAttention/q/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/1/EncDecAttention/v/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/1/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/2/DenseReluDense/wo/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/22/layer/2/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/0/SelfAttention/k/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/0/SelfAttention/o/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/0/SelfAttention/q/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/0/SelfAttention/v/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/0/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/1/EncDecAttention/k/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/1/EncDecAttention/o/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/1/EncDecAttention/q/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/1/EncDecAttention/v/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/1/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/2/DenseReluDense/wo/kernel": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/23/layer/2/layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "decoder/block/3/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/3/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/4/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/5/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/6/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/7/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/8/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/1/EncDecAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/1/EncDecAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/1/EncDecAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/1/EncDecAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/2/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/2/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/2/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "decoder/block/9/layer/2/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "decoder/final_layer_norm/weight": "flax_model-00002-of-00002.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/relative_attention_bias/embedding": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/0/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/1/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/10/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/11/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/12/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/13/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/14/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/15/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/16/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/17/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/18/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/19/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/2/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/20/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/21/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/22/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/23/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/3/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/4/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/5/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/6/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/7/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/8/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/0/SelfAttention/k/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/0/SelfAttention/o/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/0/SelfAttention/q/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/0/SelfAttention/v/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/0/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/1/DenseReluDense/wi_0/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/1/DenseReluDense/wi_1/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/1/DenseReluDense/wo/kernel": "flax_model-00001-of-00002.msgpack",
+    "encoder/block/9/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "encoder/final_layer_norm/weight": "flax_model-00001-of-00002.msgpack",
+    "lm_head/kernel": "flax_model-00002-of-00002.msgpack",
+    "shared/embedding": "flax_model-00001-of-00002.msgpack"
+  }
+}
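
An index file like the one above maps each parameter name to the shard that stores it. As a minimal sketch of how that mapping might be inspected, the snippet below inverts it into a per-shard list of parameter names. The `group_by_shard` helper is hypothetical (it is not part of any library), and the embedded JSON holds only a few entries copied from the diff rather than the full file.

```python
import json
from collections import defaultdict

# A few entries copied from the index above; the real file has many more.
index_text = """
{
  "weight_map": {
    "encoder/block/9/layer/1/layer_norm/weight": "flax_model-00001-of-00002.msgpack",
    "encoder/final_layer_norm/weight": "flax_model-00001-of-00002.msgpack",
    "lm_head/kernel": "flax_model-00002-of-00002.msgpack",
    "shared/embedding": "flax_model-00001-of-00002.msgpack"
  }
}
"""


def group_by_shard(weight_map):
    """Invert a parameter -> shard mapping into shard -> sorted parameter names.

    Hypothetical helper for illustration only.
    """
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


shards = group_by_shard(json.loads(index_text)["weight_map"])
for shard, names in sorted(shards.items()):
    # Print each shard file with the number of parameters it holds in this excerpt.
    print(shard, len(names))
```

In this excerpt, `lm_head/kernel` is the only listed entry stored in the second shard.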

generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "decoder_start_token_id": 0,
+  "eos_token_id": 1,
+  "pad_token_id": 0,
+  "transformers_version": "4.27.0.dev0"
+}
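
The generation config above is plain JSON, so it can be read with the standard library alone. The sketch below parses a copy of the file's contents (with the diff's leading `+` markers removed) and collects the special-token ids. The variable names are illustrative assumptions, not part of the repository.

```python
import json

# Contents of generation_config.json as shown in the diff, without the "+" prefixes.
config_text = """
{
  "_from_model_config": true,
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.27.0.dev0"
}
"""

config = json.loads(config_text)

# Keep only the keys that describe special-token ids.
token_ids = {key: value for key, value in config.items() if key.endswith("_token_id")}

# In this file the decoder start token and the padding token share the same id.
starts_with_pad = config["decoder_start_token_id"] == config["pad_token_id"]

print(token_ids)
print("decoder starts with pad token:", starts_with_pad)
```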

model-00001-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:99196ddfbe886e8ef860f52de979df64890edfc792c3d94ce0502991f347dd18
+size 9449619912

model-00002-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c0c677ddeb21009b6efd97146f37fc3a0396707fb5e63ade7aff64884dce9806
+size 1949477672
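
The two safetensors entries above are Git LFS pointer files: three `key value` lines giving the spec version, the object's SHA-256 id, and its size in bytes. As a rough sketch, the snippet below parses both pointers and sums the shard sizes. `parse_lfs_pointer` is a hypothetical helper written for this example, and the pointer text is copied from the diff with the leading `+` markers removed.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a small dict.

    Hypothetical helper for illustration; it assumes the three-line
    "version / oid / size" layout shown in the diff.
    """
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the diff above.
pointers = {
    "model-00001-of-00002.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:99196ddfbe886e8ef860f52de979df64890edfc792c3d94ce0502991f347dd18\n"
        "size 9449619912\n"
    ),
    "model-00002-of-00002.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:c0c677ddeb21009b6efd97146f37fc3a0396707fb5e63ade7aff64884dce9806\n"
        "size 1949477672\n"
    ),
}

parsed = {name: parse_lfs_pointer(text) for name, text in pointers.items()}

# Combined on-disk size of both safetensors shards, in bytes.
total_bytes = sum(info["size"] for info in parsed.values())

for name, info in parsed.items():
    print(name, info["algorithm"], info["size"])
print("total bytes:", total_bytes)
```

The pointer records only the identity and size of the stored object; the tensor data itself lives in LFS storage.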

model.safetensors.index.json ADDED

	@@ -0,0 +1,567 @@

+{
+    "metadata": {
+        "total_size": 11925413888
+    },
+    "weight_map": {
+        "decoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.0.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.1.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.10.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.11.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.12.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.13.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.14.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.15.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.16.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.17.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.17.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.17.layer.2.DenseReluDense.wo.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.17.layer.2.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.0.SelfAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.0.SelfAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.0.SelfAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.0.SelfAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.0.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.1.EncDecAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.1.EncDecAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.1.EncDecAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.1.EncDecAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.1.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.2.DenseReluDense.wo.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.18.layer.2.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.0.SelfAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.0.SelfAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.0.SelfAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.0.SelfAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.0.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.1.EncDecAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.1.EncDecAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.1.EncDecAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.1.EncDecAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.1.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.2.DenseReluDense.wo.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.19.layer.2.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.2.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.2.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.20.layer.0.SelfAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.0.SelfAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.0.SelfAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.0.SelfAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.0.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.1.EncDecAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.1.EncDecAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.1.EncDecAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.1.EncDecAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.1.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.2.DenseReluDense.wo.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.20.layer.2.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.0.SelfAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.0.SelfAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.0.SelfAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.0.SelfAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.0.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.1.EncDecAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.1.EncDecAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.1.EncDecAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.1.EncDecAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.1.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.2.DenseReluDense.wo.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.21.layer.2.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.0.SelfAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.0.SelfAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.0.SelfAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.0.SelfAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.0.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.1.EncDecAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.1.EncDecAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.1.EncDecAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.1.EncDecAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.1.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.2.DenseReluDense.wo.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.22.layer.2.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.0.SelfAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.0.SelfAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.0.SelfAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.0.SelfAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.0.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.1.EncDecAttention.k.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.1.EncDecAttention.o.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.1.EncDecAttention.q.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.1.EncDecAttention.v.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.1.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.2.DenseReluDense.wi_0.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.2.DenseReluDense.wi_1.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.2.DenseReluDense.wo.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.23.layer.2.layer_norm.weight": "model-00002-of-00002.safetensors",
+        "decoder.block.3.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.3.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.4.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.5.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.6.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.7.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.8.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.1.EncDecAttention.k.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.1.EncDecAttention.o.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.1.EncDecAttention.q.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.1.EncDecAttention.v.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.2.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.2.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.2.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "decoder.block.9.layer.2.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "decoder.embed_tokens.weight": "model-00001-of-00002.safetensors",
+        "decoder.final_layer_norm.weight": "model-00002-of-00002.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.0.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.1.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.10.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.11.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.12.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.13.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.14.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.15.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.16.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.17.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.18.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.19.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.2.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.20.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.21.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.22.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.23.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.3.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.4.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.5.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.6.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.7.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.8.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.0.SelfAttention.k.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.0.SelfAttention.o.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.0.SelfAttention.q.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.0.SelfAttention.v.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.0.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.1.DenseReluDense.wo.weight": "model-00001-of-00002.safetensors",
+        "encoder.block.9.layer.1.layer_norm.weight": "model-00001-of-00002.safetensors",
+        "encoder.embed_tokens.weight": "model-00001-of-00002.safetensors",
+        "encoder.final_layer_norm.weight": "model-00001-of-00002.safetensors",
+        "lm_head.weight": "model-00002-of-00002.safetensors",
+        "shared.weight": "model-00001-of-00002.safetensors"
+    }
+}

pytorch_model-00001-of-00002.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c43a96461e48c0218de822525f8ecd12a8318ed01364cb1ca595e9a238fa49d2
+size 9449717937

pytorch_model-00002-of-00002.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:60d971db8f7f7c6ccd566267e27ba7915d77dded5b3d755fedbb74ba157c5eb1
+size 1949494999

pytorch_model.bin.index.json ADDED

	@@ -0,0 +1,567 @@

+{
+  "metadata": {
+    "total_size": 11925413888
+  },
+  "weight_map": {
+    "decoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.0.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.1.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.10.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.11.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.12.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.13.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.14.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.15.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.16.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.17.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.17.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.17.layer.2.DenseReluDense.wo.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.17.layer.2.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.0.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.1.EncDecAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.1.EncDecAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.1.EncDecAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.1.EncDecAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.1.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.2.DenseReluDense.wo.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.18.layer.2.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.0.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.1.EncDecAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.1.EncDecAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.1.EncDecAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.1.EncDecAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.1.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.2.DenseReluDense.wo.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.19.layer.2.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.2.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.2.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.20.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.0.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.1.EncDecAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.1.EncDecAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.1.EncDecAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.1.EncDecAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.1.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.2.DenseReluDense.wo.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.20.layer.2.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.0.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.1.EncDecAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.1.EncDecAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.1.EncDecAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.1.EncDecAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.1.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.2.DenseReluDense.wo.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.21.layer.2.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.0.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.1.EncDecAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.1.EncDecAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.1.EncDecAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.1.EncDecAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.1.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.2.DenseReluDense.wo.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.22.layer.2.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.0.SelfAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.0.SelfAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.0.SelfAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.0.SelfAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.0.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.1.EncDecAttention.k.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.1.EncDecAttention.o.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.1.EncDecAttention.q.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.1.EncDecAttention.v.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.1.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.2.DenseReluDense.wo.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.23.layer.2.layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "decoder.block.3.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.3.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.4.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.5.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.6.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.7.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.8.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.1.EncDecAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.1.EncDecAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.1.EncDecAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.1.EncDecAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.2.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.2.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.2.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.block.9.layer.2.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
+    "decoder.final_layer_norm.weight": "pytorch_model-00002-of-00002.bin",
+    "encoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.0.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.1.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.10.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.11.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.12.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.13.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.14.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.15.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.16.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.17.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.18.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.19.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.2.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.20.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.21.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.22.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.23.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.3.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.4.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.5.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.6.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.7.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.8.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.0.SelfAttention.k.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.0.SelfAttention.o.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.0.SelfAttention.q.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.0.SelfAttention.v.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.0.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.1.DenseReluDense.wo.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.block.9.layer.1.layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
+    "encoder.final_layer_norm.weight": "pytorch_model-00001-of-00002.bin",
+    "lm_head.weight": "pytorch_model-00002-of-00002.bin",
+    "shared.weight": "pytorch_model-00001-of-00002.bin"
+  }
+}
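
The `weight_map` above assigns each parameter name to the shard file that stores it. As a minimal sketch of how such an index could be inverted to see which parameters live in which shard, the helper below groups names by file. The function name `group_by_shard` and the three-entry `sample_index` are illustrative assumptions, not part of this commit; the entries are copied from the end of the index above.

```python
from collections import defaultdict


def group_by_shard(index):
    """Invert a sharded-checkpoint index: shard filename -> list of parameter names.

    `index` is assumed to be the parsed JSON of a *.index.json file,
    i.e. a dict with a "weight_map" mapping parameter name -> shard filename.
    """
    shards = defaultdict(list)
    for name, filename in index["weight_map"].items():
        shards[filename].append(name)
    return dict(shards)


# Illustrative subset of the index shown above (not the full file).
sample_index = {
    "weight_map": {
        "encoder.final_layer_norm.weight": "pytorch_model-00001-of-00002.bin",
        "lm_head.weight": "pytorch_model-00002-of-00002.bin",
        "shared.weight": "pytorch_model-00001-of-00002.bin",
    }
}

grouped = group_by_shard(sample_index)
for filename, names in sorted(grouped.items()):
    print(filename, len(names))
```

For the real file, `sample_index` would be replaced by `json.load` applied to `pytorch_model.bin.index.json`.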

special_tokens_map.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}
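
The 100 `<extra_id_N>` entries above follow a regular pattern, so the same mapping can be rebuilt programmatically. This is a sketch for checking the file's contents, not the way the file was actually produced; `sentinel_tokens` is a hypothetical helper name.

```python
import json


def sentinel_tokens(count=100):
    """Return the sentinel token strings <extra_id_0> ... <extra_id_{count-1}>."""
    return [f"<extra_id_{i}>" for i in range(count)]


# Rebuild the same structure as special_tokens_map.json shown above.
special_tokens_map = {
    "additional_special_tokens": sentinel_tokens(),
    "eos_token": "</s>",
    "pad_token": "<pad>",
    "unk_token": "<unk>",
}

print(json.dumps(special_tokens_map, indent=2)[:80])
```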

spiece.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
+size 791656
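
The three added lines above are a Git LFS pointer: the repository stores this small text stub while the actual `spiece.model` bytes live in LFS storage. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied from the diff with the leading `+` markers removed.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
size 791656
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```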

tf_model-00001-of-00002.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:22e2ed0ca1e11ec66d512f02decb37fe5faf35778f675b436502bf0edc302922
+size 9970701536

tf_model-00002-of-00002.h5 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:62647226929ac416abc356c3a78cf31fc78a0d11f7519d27f1d8eab3c04e43fc
+size 1429448928
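
The two TensorFlow shard sizes from the LFS pointers above can be compared with the `total_size` recorded in `tf_model.h5.index.json`. The sketch below only does the arithmetic. Reading the difference as file-container overhead, on the basis that `total_size` counts tensor bytes only, is an assumption and is not stated anywhere in this commit.

```python
# Sizes in bytes, copied from the LFS pointers in this diff.
shard_sizes = {
    "tf_model-00001-of-00002.h5": 9970701536,
    "tf_model-00002-of-00002.h5": 1429448928,
}

# "total_size" from the metadata block of tf_model.h5.index.json.
index_total_size = 11399028736

files_total = sum(shard_sizes.values())
# Assumption: any positive difference is non-tensor data stored in the files.
overhead = files_total - index_total_size

print(f"files on disk: {files_total} bytes")
print(f"index total:   {index_total_size} bytes")
print(f"difference:    {overhead} bytes")
```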

tf_model.h5.index.json ADDED

	@@ -0,0 +1,565 @@

+{
+  "metadata": {
+    "total_size": 11399028736
+  },
+  "weight_map": {
+    "shared/shared/embeddings:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/relative_attention_bias/embeddings:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._0/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._1/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._10/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._11/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._12/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._13/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._14/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._15/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._16/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._17/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._18/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._19/layer_._2/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._2/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._20/layer_._2/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._21/layer_._2/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._22/layer_._2/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._0/SelfAttention/k/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._0/SelfAttention/o/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._0/SelfAttention/q/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._0/SelfAttention/v/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._0/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._1/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._23/layer_._2/layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._3/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._4/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._5/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._6/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._7/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._8/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._1/EncDecAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._2/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._2/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._2/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/block_._9/layer_._2/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/decoder/final_layer_norm/weight:0": "tf_model-00002-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/relative_attention_bias/embeddings:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._0/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._1/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._10/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._11/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._12/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._13/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._14/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._15/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._16/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._17/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._18/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._19/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._2/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._20/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._21/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._22/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._23/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._3/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._4/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._5/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._6/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._7/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._8/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/k/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/o/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/q/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._0/SelfAttention/v/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._0/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._1/DenseReluDense/wi_0/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._1/DenseReluDense/wi_1/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._1/DenseReluDense/wo/kernel:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/block_._9/layer_._1/layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/encoder/final_layer_norm/weight:0": "tf_model-00001-of-00002.h5",
+    "tft5_for_conditional_generation/lm_head/kernel:0": "tf_model-00002-of-00002.h5"
+  }
+}
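
The index above maps each TensorFlow variable name to the shard file that stores it. The following is a minimal sketch of how such a mapping could be grouped by shard. It assumes the mapping sits under the usual `weight_map` key of the index file, and the two-entry dictionary is only an excerpt copied from the diff. The helper name `group_by_shard` is hypothetical.

```python
from collections import defaultdict

# Excerpt of the mapping shown in the diff; the real index lists every variable.
weight_map = {
    "tft5_for_conditional_generation/encoder/final_layer_norm/weight:0": "tf_model-00001-of-00002.h5",
    "tft5_for_conditional_generation/lm_head/kernel:0": "tf_model-00002-of-00002.h5",
}


def group_by_shard(weight_map):
    """Invert a name -> shard mapping into shard -> list of variable names."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return dict(shards)


shards = group_by_shard(weight_map)
for shard, names in sorted(shards.items()):
    # Print each shard file with the number of variables it holds.
    print(shard, len(names))
```

With the full index loaded through `json.load`, the same function would show that the encoder blocks live in the first shard while `lm_head` is stored in the second.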

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,113 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "extra_ids": 100,
+  "model_max_length": 512,
+  "name_or_path": "google/t5-v1_1-small",
+  "pad_token": "<pad>",
+  "sp_model_kwargs": {},
+  "special_tokens_map_file": "/home/arthur_huggingface_co/.cache/huggingface/hub/models--google--t5-v1_1-small/snapshots/fb7e6cba609f7bab11c614294bc04f82f613c7b1/special_tokens_map.json",
+  "tokenizer_class": "T5Tokenizer",
+  "unk_token": "<unk>"
+}
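
The `additional_special_tokens` list above follows a regular pattern: one `<extra_id_N>` sentinel for each value counted by `extra_ids`. As a rough sketch, the list could be regenerated and checked against that count as shown below. The inline JSON is a trimmed copy of the fields from the diff, and the helper name `sentinel_tokens` is hypothetical, not part of any library.

```python
import json

# Trimmed copy of the scalar fields from tokenizer_config.json shown in the diff.
config = json.loads(
    '{"extra_ids": 100, "eos_token": "</s>", "pad_token": "<pad>", "unk_token": "<unk>"}'
)


def sentinel_tokens(count):
    """Build the sentinel token strings <extra_id_0> ... <extra_id_{count-1}>."""
    return [f"<extra_id_{i}>" for i in range(count)]


tokens = sentinel_tokens(config["extra_ids"])

# The generated list should have one entry per extra id, in ascending order.
assert len(tokens) == config["extra_ids"]
print(tokens[0], tokens[-1])
```

A check like this can flag a config where `extra_ids` and the explicit token list have drifted apart.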