younesbelkada committed
Commit e347481
1 Parent(s): 8440fd6

add first files

README.md ADDED
@@ -0,0 +1,183 @@
---
language:
- en
- fr
- ro
- de
datasets:
- c4
tags:
- summarization
- translation

license: apache-2.0
inference: false
---

# Model Card for T5 11B

![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)

# Table of Contents

1. [Model Details](#model-details)
2. [Uses](#uses)
3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
4. [Training Details](#training-details)
5. [Evaluation](#evaluation)
6. [Environmental Impact](#environmental-impact)
7. [Citation](#citation)
8. [Model Card Authors](#model-card-authors)
9. [How To Get Started With the Model](#how-to-get-started-with-the-model)

# Model Details

## Model Description

The developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> With T5, we propose reframing all NLP tasks into a unified text-to-text format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.

T5-11B is the checkpoint with 11 billion parameters.

- **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See the [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)
- **Model type:** Language model
- **Language(s) (NLP):** English, French, Romanian, German
- **License:** Apache 2.0
- **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)
- **Resources for more information:**
  - [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)
  - [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
  - [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)
  - [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)

# Uses

## Direct Use and Downstream Use

The developers write in a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.

See the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.

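For illustration, here is a minimal usage sketch of the task-prefix interface (it assumes the `transformers` library and enough memory to load the 11B checkpoint; see the disclaimer at the end of this card):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-11b")
model = T5ForConditionalGeneration.from_pretrained("t5-11b")

# Tasks are selected purely through a natural-language prefix on the input text.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
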
## Out-of-Scope Use

More information needed.

# Bias, Risks, and Limitations

More information needed.

## Recommendations

More information needed.

# Training Details

## Training Data

The model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.

The model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**.
The following datasets were used for (1.) and (2.):

1. **Datasets used for the unsupervised denoising objective** (a span-corruption sketch follows this list):

  - [C4](https://huggingface.co/datasets/c4)
  - [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)

2. **Datasets used for the supervised text-to-text language modeling objective**

  - Sentence acceptability judgment
    - CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)
  - Sentiment analysis
    - SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)
  - Paraphrasing/sentence similarity
    - MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)
    - STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)
    - QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)
  - Natural language inference
    - MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)
    - QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)
    - RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9)
    - CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)
  - Sentence completion
    - COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)
  - Word sense disambiguation
    - WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)
  - Question answering
    - MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)
    - ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)
    - BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)

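To make the denoising objective concrete, here is a minimal sketch of T5's span-corruption format using sentinel tokens, in the style of the Hugging Face T5 docs (the exact corruption rates and span lengths used in pre-training are described in the paper):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-11b")
model = T5ForConditionalGeneration.from_pretrained("t5-11b")  # loading 11B weights is expensive; see the disclaimer below

# Dropped-out spans in the input are replaced by sentinel tokens
# (<extra_id_0>, <extra_id_1>, ...); the target reconstructs only the
# dropped spans, each preceded by its sentinel.
input_ids = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt").input_ids
labels = tokenizer("<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt").input_ids

loss = model(input_ids=input_ids, labels=labels).loss
```
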
## Training Procedure

In their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write:

> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.

The framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.

# Evaluation

## Testing Data, Factors & Metrics

The developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.

## Results

For full results for T5-11B, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** Google Cloud TPU Pods
- **Hours used:** More information needed
- **Cloud Provider:** GCP
- **Compute Region:** More information needed
- **Carbon Emitted:** More information needed

# Citation

**BibTeX:**

```bibtex
@article{2020t5,
  author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {140},
  pages   = {1-67},
  url     = {http://jmlr.org/papers/v21/20-074.html}
}
```

**APA:**
- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.

# Model Card Authors

This model card was written by the team at Hugging Face.

# How to Get Started with the Model

## Disclaimer

**Before `transformers` v3.5.0**, due to its immense size, `t5-11b` required some special treatment.
If you're using transformers `<= v3.4.0`, `t5-11b` should be loaded with the flag `use_cdn` set to `False` as follows:

```python
import transformers

t5 = transformers.T5ForConditionalGeneration.from_pretrained('t5-11b', use_cdn=False)
```

Secondly, a single GPU will most likely not have enough memory to even load the model, as the weights alone amount to over 40 GB.
- Model parallelism can be used to overcome this problem, as explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
- DeepSpeed's ZeRO-Offload is another approach, as explained in this [post](https://github.com/huggingface/transformers/issues/9996).

See the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more context.

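With recent `transformers` versions, the sharded checkpoint in this repository can instead be loaded with automatic device placement. A minimal sketch, assuming `transformers >= 4.20` with `accelerate` installed and enough combined GPU/CPU memory:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-11b")
# device_map="auto" spreads the 15 weight shards across the available
# devices; half precision roughly halves the ~45 GB fp32 footprint.
model = T5ForConditionalGeneration.from_pretrained(
    "t5-11b",
    device_map="auto",
    torch_dtype=torch.float16,
)
```
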
config.json ADDED
@@ -0,0 +1,51 @@
{
  "architectures": [
    "T5WithLMHeadModel"
  ],
  "d_ff": 65536,
  "d_kv": 128,
  "d_model": 1024,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_heads": 128,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to German: "
    },
    "translation_en_to_fr": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to French: "
    },
    "translation_en_to_ro": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to Romanian: "
    }
  },
  "vocab_size": 32128
}
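
The `task_specific_params` block above records the default prompt prefix and generation settings for each supported task. A minimal sketch of applying the summarization defaults by hand (`model` and `tokenizer` are assumed to be an already-loaded `t5-11b` pair, as in the model card):

```python
# Pull the summarization defaults straight from the model config.
params = model.config.task_specific_params["summarization"]

text = params["prefix"] + "studies have shown that owning a dog is good for you ..."
inputs = tokenizer(text, return_tensors="pt")
summary_ids = model.generate(
    **inputs,
    num_beams=params["num_beams"],                        # 4
    max_length=params["max_length"],                      # 200
    min_length=params["min_length"],                      # 30
    length_penalty=params["length_penalty"],              # 2.0
    no_repeat_ngram_size=params["no_repeat_ngram_size"],  # 3
    early_stopping=params["early_stopping"],              # True
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```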
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,517 @@
{
  "metadata": {
    "total_size": 45229301760
  },
  "weight_map": {
    "decoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.SelfAttention.o.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.SelfAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.SelfAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.1.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.2.DenseReluDense.wi.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.2.DenseReluDense.wo.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.2.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.SelfAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.SelfAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.SelfAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.SelfAttention.v.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.EncDecAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.EncDecAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.EncDecAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.EncDecAttention.v.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.2.DenseReluDense.wi.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.2.DenseReluDense.wo.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.2.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.10.layer.0.SelfAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.0.SelfAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.0.SelfAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.0.SelfAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.0.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.EncDecAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.EncDecAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.EncDecAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.EncDecAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.2.DenseReluDense.wi.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.2.DenseReluDense.wo.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.2.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.11.layer.0.SelfAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.0.SelfAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.0.SelfAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.0.SelfAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.0.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.EncDecAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.EncDecAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.EncDecAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.EncDecAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.2.DenseReluDense.wi.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.2.DenseReluDense.wo.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.2.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.SelfAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.SelfAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.SelfAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.SelfAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.EncDecAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.EncDecAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.EncDecAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.EncDecAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.2.DenseReluDense.wi.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.2.DenseReluDense.wo.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.2.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.SelfAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.SelfAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.SelfAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.SelfAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.1.EncDecAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.1.EncDecAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.13.layer.1.EncDecAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.1.EncDecAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.1.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.13.layer.2.DenseReluDense.wi.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.13.layer.2.DenseReluDense.wo.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.13.layer.2.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.SelfAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.SelfAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.SelfAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.SelfAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.EncDecAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.EncDecAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.EncDecAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.EncDecAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.2.DenseReluDense.wi.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.2.DenseReluDense.wo.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.2.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.SelfAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.SelfAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.SelfAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.SelfAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.EncDecAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.EncDecAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.EncDecAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.EncDecAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.2.DenseReluDense.wi.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.2.DenseReluDense.wo.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.2.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.16.layer.0.SelfAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.16.layer.0.SelfAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.0.SelfAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.16.layer.0.SelfAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.16.layer.0.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.EncDecAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.EncDecAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.EncDecAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.EncDecAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.2.DenseReluDense.wi.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.2.DenseReluDense.wo.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.2.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.SelfAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.SelfAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.SelfAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.SelfAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.EncDecAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.EncDecAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.EncDecAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.EncDecAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.2.DenseReluDense.wi.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.2.DenseReluDense.wo.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.2.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.SelfAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.SelfAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.SelfAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.SelfAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.EncDecAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.EncDecAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.EncDecAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.EncDecAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.2.DenseReluDense.wi.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.2.DenseReluDense.wo.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.18.layer.2.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.SelfAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.SelfAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.SelfAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.SelfAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.EncDecAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.EncDecAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.EncDecAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.EncDecAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.2.DenseReluDense.wi.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.2.DenseReluDense.wo.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.2.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.2.layer.0.SelfAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.0.SelfAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.0.SelfAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.0.SelfAttention.v.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.0.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.EncDecAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.EncDecAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.EncDecAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.EncDecAttention.v.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.2.DenseReluDense.wi.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.2.DenseReluDense.wo.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.2.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.20.layer.0.SelfAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.0.SelfAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.0.SelfAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.0.SelfAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.0.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.EncDecAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.EncDecAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.EncDecAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.EncDecAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.2.DenseReluDense.wi.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.2.DenseReluDense.wo.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.2.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.SelfAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.SelfAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.SelfAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.SelfAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.1.EncDecAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.1.EncDecAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.1.EncDecAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.1.EncDecAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.1.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.2.DenseReluDense.wi.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.2.DenseReluDense.wo.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.2.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.SelfAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.SelfAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.SelfAttention.q.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.SelfAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.EncDecAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.EncDecAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.EncDecAttention.q.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.EncDecAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.2.DenseReluDense.wi.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.2.DenseReluDense.wo.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.2.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.SelfAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.SelfAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.SelfAttention.q.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.SelfAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.EncDecAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.EncDecAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.EncDecAttention.q.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.EncDecAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.2.DenseReluDense.wi.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.2.DenseReluDense.wo.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.2.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.3.layer.0.SelfAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.3.layer.0.SelfAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.0.SelfAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.3.layer.0.SelfAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.0.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.EncDecAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.EncDecAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.EncDecAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.EncDecAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.2.DenseReluDense.wi.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.2.DenseReluDense.wo.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.2.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.SelfAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.SelfAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.SelfAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.SelfAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.EncDecAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.EncDecAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.EncDecAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.EncDecAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.2.DenseReluDense.wi.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.2.DenseReluDense.wo.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.2.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.SelfAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.SelfAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.SelfAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.SelfAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.EncDecAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.EncDecAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.EncDecAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.EncDecAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.2.DenseReluDense.wi.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.5.layer.2.DenseReluDense.wo.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.5.layer.2.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.SelfAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.SelfAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.SelfAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.SelfAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.EncDecAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.EncDecAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.EncDecAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.EncDecAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.2.DenseReluDense.wi.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.2.DenseReluDense.wo.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.2.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.SelfAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.SelfAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.SelfAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.SelfAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.EncDecAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.EncDecAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.EncDecAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.EncDecAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.2.DenseReluDense.wi.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.2.DenseReluDense.wo.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.2.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.SelfAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.SelfAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.SelfAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.SelfAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.1.EncDecAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.1.EncDecAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.1.EncDecAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.1.EncDecAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.1.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.2.DenseReluDense.wi.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.2.DenseReluDense.wo.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.2.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.SelfAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.SelfAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.SelfAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.SelfAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.EncDecAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.EncDecAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.EncDecAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.EncDecAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.2.DenseReluDense.wi.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.2.DenseReluDense.wo.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.2.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.final_layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.o.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.q.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.v.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.1.DenseReluDense.wi.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.1.DenseReluDense.wo.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.1.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.SelfAttention.k.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.SelfAttention.o.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.SelfAttention.q.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.SelfAttention.v.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.1.DenseReluDense.wi.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.1.DenseReluDense.wo.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.1.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.10.layer.0.SelfAttention.k.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.0.SelfAttention.q.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.0.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.1.DenseReluDense.wi.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.1.DenseReluDense.wo.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.1.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.SelfAttention.k.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.SelfAttention.q.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.1.DenseReluDense.wi.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.1.DenseReluDense.wo.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.1.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.SelfAttention.k.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.SelfAttention.q.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.12.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.12.layer.1.DenseReluDense.wo.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.12.layer.1.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.SelfAttention.k.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.SelfAttention.o.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.SelfAttention.q.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.SelfAttention.v.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.1.DenseReluDense.wo.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.1.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.SelfAttention.k.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.SelfAttention.o.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.SelfAttention.q.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.SelfAttention.v.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.1.DenseReluDense.wo.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.1.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.SelfAttention.k.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.SelfAttention.o.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.SelfAttention.q.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.SelfAttention.v.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.1.DenseReluDense.wo.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.1.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.SelfAttention.k.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.SelfAttention.o.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.SelfAttention.q.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.SelfAttention.v.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.16.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.SelfAttention.k.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.SelfAttention.o.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.SelfAttention.q.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.SelfAttention.v.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.1.DenseReluDense.wi.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.SelfAttention.k.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.SelfAttention.o.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.SelfAttention.q.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.SelfAttention.v.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.1.DenseReluDense.wi.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.SelfAttention.k.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.SelfAttention.o.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.SelfAttention.q.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.SelfAttention.v.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.1.DenseReluDense.wi.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.2.layer.0.SelfAttention.k.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.0.SelfAttention.o.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.0.SelfAttention.q.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.0.SelfAttention.v.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.0.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.1.DenseReluDense.wi.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.1.DenseReluDense.wo.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.1.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.20.layer.0.SelfAttention.k.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.0.SelfAttention.o.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.0.SelfAttention.q.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.0.SelfAttention.v.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.0.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.1.DenseReluDense.wi.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.21.layer.0.SelfAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.0.SelfAttention.o.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.0.SelfAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.0.SelfAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.0.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.1.DenseReluDense.wi.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.1.DenseReluDense.wo.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.1.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.SelfAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.SelfAttention.o.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.SelfAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.SelfAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.1.DenseReluDense.wi.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.1.DenseReluDense.wo.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.1.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.SelfAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.SelfAttention.o.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.SelfAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.SelfAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.1.DenseReluDense.wi.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.1.DenseReluDense.wo.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.1.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.3.layer.0.SelfAttention.k.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.0.SelfAttention.o.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.0.SelfAttention.q.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.0.SelfAttention.v.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.0.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.1.DenseReluDense.wi.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.1.DenseReluDense.wo.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.1.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.4.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.0.SelfAttention.o.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.0.SelfAttention.v.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.0.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.1.DenseReluDense.wi.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.1.DenseReluDense.wo.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.1.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.SelfAttention.o.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.SelfAttention.v.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.1.DenseReluDense.wi.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.1.DenseReluDense.wo.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.1.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.SelfAttention.o.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.SelfAttention.v.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.1.DenseReluDense.wi.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.1.DenseReluDense.wo.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.1.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.SelfAttention.o.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.SelfAttention.v.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.1.DenseReluDense.wi.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.1.DenseReluDense.wo.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.1.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.8.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.8.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.8.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.0.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.1.DenseReluDense.wi.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.1.DenseReluDense.wo.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.1.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.SelfAttention.k.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.SelfAttention.q.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.1.DenseReluDense.wi.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.1.DenseReluDense.wo.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.1.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.final_layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "shared.weight": "pytorch_model_00001-of-00015.bin"
  }
}
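
The index above maps every parameter name to one of the 15 weight shards; the `metadata.total_size` field gives the combined byte size (about 45 GB). A minimal sketch of querying it directly, assuming the file has been downloaded locally:

```python
import json

with open("pytorch_model.bin.index.json") as f:
    index = json.load(f)

print(index["metadata"]["total_size"])       # 45229301760 bytes (~45 GB)
print(index["weight_map"]["shared.weight"])  # pytorch_model_00001-of-00015.bin
print(index["weight_map"]["decoder.final_layer_norm.weight"])  # pytorch_model_00015-of-00015.bin
```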
pytorch_model_00001-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:933b8cd428298f520558c2f1f6572762f45a1215feaee9e0a17d4980c6c4b419
size 1676445631
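
Each `.bin` entry in this commit is a Git LFS pointer rather than the weights themselves: `oid` is the SHA-256 digest of the actual file and `size` its byte count. A minimal sketch of parsing such a pointer (an illustrative helper, not part of the repository):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a git-lfs pointer file into its key/value fields."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:933b8cd428298f520558c2f1f6572762f45a1215feaee9e0a17d4980c6c4b419
size 1676445631"""

fields = parse_lfs_pointer(pointer)
print(fields["size"])  # 1676445631 bytes (~1.7 GB shard)
```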
pytorch_model_00002-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6c66455cbe43c2bae0010584285eb3d39b9e2c24ed8d4ade02a992273c544d77
size 1677748159
pytorch_model_00003-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f45e56b051d63a3a321e3fbbf7134ed70e3d84cf445aa203569e1cb4f6e74b4
size 1677748159
pytorch_model_00004-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a3eeac9612fa4860a5edfee26099a00cf0172297f973698cf95a000cefd1bc3c
size 1744859071
pytorch_model_00005-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff9cc2595345de164df497989f77220b1ef8a1e7e2c4af260bf936409e03bc22
size 1744859071
pytorch_model_00006-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9aa56ffe8ef5e5f04f90de24b5c4af2bab966623ab63b498eccb07e07d7f56ab
size 1442875327
pytorch_model_00007-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8685e462c245c4fdaad67d09be46dfdf486c594684330b1f4bc2e5e274e27c53
size 1442875327
pytorch_model_00008-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:845669bd0bbb1c6453e01a960cd823581aec71734f182637fab450637abdd7f4
size 1275094975
pytorch_model_00009-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47dea77791ccb36e411f36e76b4b47878fe03f9ac98de67ef81a1e07d295b3d0
size 1476421567
pytorch_model_00010-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec25daf44e8fd600a2e1a23e3e11d9d1b7f7ecf671e65e198e796612e8268520
size 1476421567
pytorch_model_00011-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:18fadba0d3ca13ef19dc343088366926b038e93a912752070f03407736c97b4d
size 1308647423
pytorch_model_00012-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f61d3ccfe0e37cf5abee04e40fb54defb15bcd1fc8d99390e37d52af909ed7e
size 1476421631
pytorch_model_00013-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e87f0d5998d9cc309fcda325084b32975a489e403bea04278488e4746d14424
size 1375758335
pytorch_model_00014-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5f35e54a4f47e92426c745011a0e75d3913e588a37282af1e113f5a7affc835b
size 1375758335
pytorch_model_00015-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1201c712889c87135cdae9701284baff3b502d52edf600ef65b513b0cde1613a
size 1442869183
spiece.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
size 791656
tokenizer.json ADDED
The diff for this file is too large to render.