Commit 4b63098 by LoneStriker (1 parent: 56594fe)

Upload folder using huggingface_hub

LICENSE.md ADDED
@@ -0,0 +1,138 @@
1
+ AI2 ImpACT License – Low Risk Artifacts
2
+ ==========================================
3
+
4
+ ### Version 1.0
5
+
6
+ This AI2 ImpACT License for Low Risk Artifacts (“LR Agreement”) is between The Allen Institute for Artificial Intelligence (“AI2”) and any individual or entity who purchases, downloads, installs, logs into, accesses or otherwise uses any ”low risk” Artifacts that refer to this LR Agreement, including any other person or entity that an individual purports to represent, be the agent of, or otherwise act on behalf of (collectively, “You”).
7
+
8
+ By clicking or taking similar action to accept this LR Agreement, or by accessing or using any Artifacts licensed under this LR Agreement, You agree to all the terms and conditions herein. You represent and warrant that You are at least 18 years old and have the full legal right and authority to enter into this LR Agreement and bind any employer or entity that You are acting on behalf of. If You do not agree or have the requisite authority, You have no right to access or use any Artifacts and must immediately cease any existing use.
9
+
10
+ A human-friendly summary of the legal text can be found [here](http://allenai.org/impact-license).
11
+
12
+ 1. **DEFINITIONS**.
13
+
14
+ 1. “**Artifact(s)**” means collectively and individually, anything used to build or apply a Model that is licensed by AI2 pursuant to this LR Agreement, such as weights or Data and any Derivatives of the original Artifact.
15
+
16
+ 2. “**Code**” means a text listing of commands to be compiled or assembled into an executable computer program.
17
+
18
+ 3. “**Data**” means the datasets created and/or compiled by AI2 to pretrain, train or fine-tune a Model.
19
+
20
+ 4. “**Data Derivatives**” means
21
+
22
+ 1. all modifications of the Data, and/or
23
+
24
+ 2. all derivative works created from the Data that are considered copyrighted works under U.S. copyright laws.
25
+
26
+ 5. “**Derivatives**” means anything that is based on or derived from any Artifact within the meaning of applicable U.S. copyright laws; including specifically and without limitation Model Derivatives and Data Derivatives
27
+
28
+ 6. “**Distribute**” or “**Distribution**” means any transmission, reproduction, publication, public display, or other sharing of the Artifacts to a Third party by any means, including as a hosted service made available by electronic or other remote means - e.g. API-based or web access.
29
+
30
+ 7. “**Model**” means the algorithm, weights and/or parameters used to produce the desired outcome, whether a machine learning algorithm or a deeper neural network.
31
+
32
+ 8. “**Model Derivatives**” means
33
+
34
+ 1. all modifications to the Model; and/or
35
+
36
+ 2. any other model which is created or initialized by transfer of patterns of the weights, parameters, activations or output of the Model, to the other model, in order to cause the other model to perform similarly to the Model, including - but not limited to - distillation methods entailing the use of intermediate data representations or methods based on the generation of synthetic data by the Model for training the other model.
37
+
38
+ 9. “**Term**” means the period of time starting from the date You access or use any Artifacts until this LR Agreement is terminated in accordance with **Section 4**.
39
+
40
+ 10. “**Third Party**” means any party other than You or AI2.
41
+
42
+ 11. “**Use-Based Restrictions**” means the specified restricted use cases set forth in **Exhibit A**.
43
+
44
+ 2. **LICENSE**. Subject to Your compliance with the requirements in this LR Agreement together with all applicable laws, AI2 grants to You a worldwide, non-exclusive, non-transferable, royalty-free license to use, install, and create Derivatives strictly in accordance with the requirements and restrictions as set forth below.
45
+
46
+ 1. **Distribution**. You may Distribute any Artifacts and Your Derivatives, provided that:
47
+
48
+ 1. You flow down and include the Use-Based Restrictions as an enforceable provision in any type of license or legal agreement governing downstream use and/or Distribution;
49
+
50
+ 2. You cause any of Your Derivatives to carry a prominent notice stating that You changed the original Artifact and how the Artifact was modified; and
51
+
52
+ 3. You retain all applicable copyright, patent, trademark, and attribution notices included with the Artifact.
53
+
54
+ 2. **Attribution**. Together with any copies of the Artifacts or Derivatives that You Distribute, You must provide (i) a copy of this LR Agreement; and (ii) the following attribution notice: _“\[Artifact\] is licensed under the AI2 ImpACT License for Medium Risk Artifacts, © \[year\] The Allen Institute for Artificial Intelligence.”_
55
+
56
+ 3. **Derivative Impact Reports**. AI2 seeks to encourage transparency around Derivatives through the use of Derivative Impact Reports, available [here](http://allenai.org/impact-license). Before releasing a Model Derivative or Data Derivative, You will complete a Derivative Impact Report and will publish, post, or make available the results of the Derivative Impact Report to the general public without imposing any direct or indirect restrictions, conditions or barriers to access, such as a paywall, fee, subscription, account, or requirement to submit personal information. You agree that AI2 may publish, post, or make available the information in Your Derivative Impact Report for review by the general public.
57
+
58
+ 1. You agree to maintain the transparency and accuracy of information regarding Your Derivatives in good faith and will update the Derivative Impact Report whenever a material change has occurred in any of the reporting categories therein.
59
+
60
+ 2. You acknowledge that Derivative Impact Reports are not intended to penalize any good faith disclosures about Derivatives. Accordingly, if You initiate or participate in any lawsuit or other legal action against a Third Party based on information in such Third Party’s Derivative Impact Report, then this LR Agreement will terminate immediately as of the date such lawsuit or legal action is filed or commenced.
61
+
62
+ 4. **Use-Based Restrictions**. You will not use any Artifacts or Derivatives in connection with any Use-Based Restrictions, including without limitation, creating any content with, finetuning, updating, running, training, evaluating and/or reparametrizing a Model.
63
+
64
+ 5. **No Circumvention**. You acknowledge that the purpose of the license granted herein is to facilitate transparency and responsible development of AI technology. Accordingly, You will not directly or indirectly circumvent the requirements in this **Section 2**, nor assist or enable any Third Party to do so.
65
+
66
+ 6. **Revocable License**. The license granted to You is revocable. To the maximum extent permitted by law, AI2 reserves the right to suspend, restrict, or terminate (remotely or otherwise) Your access, use or Distribution of any Artifacts and/or Derivatives not expressly permitted herein.
67
+
68
+ 3. **INTELLECTUAL PROPERTY RIGHTS.**
69
+
70
+ 1. AI2 and its licensors retain all right, title and interest in and to the Artifacts, including all patent, copyright, trademark, and trade secret rights, whether such rights are registered or unregistered, and wherever in the world those rights may exist. You will not commit any act or omission that contradicts or is inconsistent with AI2’s rights, nor permit or induce any Third Party to do the same. Other than the license granted in **Section 2** and as provided in **Section 3(b)**, all rights are expressly reserved by AI2.
71
+
72
+ 2. Subject to Your compliance with this LR Agreement, You will own any Derivatives You create. However, if Your use or Distribution of any Derivative is in breach of this LR Agreement, You will transfer and assign all right, title, and interest in and to such Derivative to AI2 and execute any related documentation as required by AI2.
73
+
74
+ 4. **TERM AND TERMINATION**. AI2 may terminate this LR Agreement by written notice at any time if You materially breach any of Your obligations herein and fail to cure to AI2’s satisfaction within thirty (30) days after such notice.
75
+
76
+ 1. Upon any termination of this LR Agreement, the license granted in **Section 2** will automatically terminate as of the termination date and You will:
77
+
78
+ 1. Cease all use of the Artifacts and immediately delete all copies in Your possession or control; and
79
+
80
+ 2. Cease all use and Distribution of any Derivatives and promptly provide AI2 with any other information regarding Your Derivatives as requested by AI2, including any documentation to assign Your Derivatives to AI2 pursuant to **Section 3(b)**.
81
+
82
+ 2. Additionally, if AI2 terminates this LR Agreement due to Your breach of **Section 2(d)** (Use-Based Restrictions), You authorize AI2 to post a prominent notice stating that You violated the Use-Based Restrictions of this LR Agreement and that Your rights to use the Artifacts and Derivatives were terminated by AI2.
83
+
84
+ 3. All terms and provisions that are reasonably interpreted to survive termination of this LR Agreement to fulfill its essential purpose will survive, including **Sections 3-8**.
85
+
86
+ 5. **DISCLAIMER**. AI2 PROVIDES THE ARTIFACTS ON AN “AS-IS” BASIS, AND AI2 DISCLAIMS ALL EXPRESS AND IMPLIED WARRANTIES OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING AN ARTIFACT, OR CREATING OR DISTRIBUTING ANY DERIVATIVES, AND YOU ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR EXERCISE OF PERMISSIONS UNDER THIS LR AGREEMENT.
87
+
88
+ 6. **LIMITATION OF LIABILITY**. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAWS, IN NO EVENT WILL AI2 BE LIABLE TO YOU OR ANY THIRD PARTY FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER (INCLUDING BUT NOT LIMITED TO DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE, INTEROPERABILITY OR MALFUNCTION, OR ANY OTHER LOSS) ARISING FROM OR RELATED TO THIS LR AGREEMENT, INCLUDING WITHOUT LIMITATION, ANY USE OR INABILITY TO USE ANY ARTIFACTS , WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR ANY OTHER LEGAL THEORY, EVEN IF YOU OR ANY THIRD PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
89
+
90
+ 7. **INDEMNIFICATION FOR THIRD-PARTY CLAIMS**. You will defend and indemnify AI2 and its officers, directors, employees, and agents from and against any and all Third Party claims, lawsuits, and proceedings that arise or result from:
91
+
92
+ 1. Your material, uncured breach of this LR Agreement; and/or
93
+
94
+ 2. Your failure or alleged failure to comply with applicable laws or any violation of a Third Party’s rights in connection with Your use of the Artifacts or Your Derivatives.
95
+
96
+ 8. **MISCELLANEOUS**.
97
+
98
+ 1. **Consideration**. The Artifacts are provided to You by AI2 subject to Your continued compliance with the terms and conditions of this LR Agreement.
99
+
100
+ 2. **Relationship**. This LR Agreement and the parties’ relationship hereunder is non-exclusive, and neither party is restricted or limited in any way from entering into the same or similar arrangements with Third Parties. Nothing in this LR Agreement will be deemed or construed to create any employment, franchise, joint venture, partnership, agency or other such similar relationship between You and AI2.
101
+
102
+ 3. **No Waiver; Equitable Remedies**. Any delay or failure of AI2 to enforce its rights or any provision of this LR Agreement will not be a waiver unless specifically issued in writing by AI2. Any term that is held to be invalid or enforceable will not affect any other terms in this LR Agreement, which will remain in full force and effect. You acknowledge that if You breach this LR Agreement, it may cause irreparable harm to AI2, and You agree that AI2 may seek injunctive relief against You in addition to any other legal and equitable remedies.
103
+
104
+ 4. **Export Control**: You will not violate any applicable U.S. and non-U.S. export control and trade sanctions laws (“Export Laws”) or directly or indirectly export, re-export, provide, or otherwise transfer any Artifacts or Derivatives
105
+
106
+ 1. to any individual, entity, or country prohibited by Export Laws;
107
+
108
+ 2. to anyone on U.S. or non-U.S. government restricted parties lists; or
109
+
110
+ 3. for any purpose prohibited by Export Laws, including nuclear, chemical or biological weapons, or missile technology applications.
111
+
112
+ 5. **Governing Law**. This LR Agreement will be governed by the laws of the State of Washington, U.S.A. without regard to its choice of laws or conflict of laws rules.
113
+
114
+ 6. **Entire Agreement**. Except as otherwise specifically set forth herein, this LR Agreement and any documents or policies that are incorporated or made part of this LR Agreement by reference contain the entire agreement between You and AI2 regarding the subject matter herein.
115
+
116
+ 7. **Modifications**. AI2 may revise and update the terms of this LR Agreement from time to time and will post such updates to its website at [http://allenai.org/impact-license](http://allenai.org/impact-license). UNLESS OTHERWISE STATED IN THE AMENDED VERSION OF THIS LR AGREEMENT, ANY CHANGES TO THIS LR AGREEMENT WILL APPLY IMMEDIATELY UPON POSTING. While AI2 is not obligated to provide You with notice of any changes, any amendments to this LR Agreement will not apply retroactively to events that occurred prior to such changes. Your continued use or Distribution of the Artifact(s) and/or Your Derivatives will constitute Your agreement to the terms of the updated LR Agreement.
117
+
118
+ For any questions regarding this LR Agreement, please contact [ai2impact@allenai.org](mailto:ai2impact@allenai.org).
119
+
120
+ **EXHIBIT A**
121
+
122
+ **USE-BASED RESTRICTIONS**
123
+
124
+ 1. **EXPECTATIONS**. AI2 expects that You will not use, or cause or assist others to use, any Artifacts or Derivatives in connection with any academic dishonesty, including submitting any informational content or output of a Model as Your own work in any academic setting.
125
+
126
+ 2. **RESTRICTIONS**. You will not, and will not permit, assist, or cause any Third Party to use, modify, copy, reproduce, incorporate, create Derivatives of, or Distribute any Artifacts or Your Derivatives, in whole or in part, for:
127
+
128
+ 1. military weapons purposes or in the service of nuclear proliferation or nuclear weapons technology;
129
+
130
+ 2. purposes of military surveillance, including any research or development relating to military surveillance;
131
+
132
+ 3. purposes of generating or disseminating information or content, in any context (e.g. posts, articles, tweets, chatbots or other kinds of automated bots) without expressly and intelligibly disclaiming that the text is machine generated;
133
+
134
+ 4. purposes of ‘real time’ remote biometric processing or identification systems in publicly accessible spaces for the purpose of law enforcement;
135
+
136
+ 5. fully automated decision-making without a human in the loop; and/or
137
+
138
+ 6. purposes of the predictive administration of justice, law enforcement, immigration, or asylum processes, such as predicting an individual will commit fraud/crime (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
README.md ADDED
@@ -0,0 +1,136 @@
+ ---
+ model-index:
+ - name: tulu-2-dpo-70b
+   results: []
+ datasets:
+ - HuggingFaceH4/ultrafeedback_binarized
+ - allenai/tulu-v2-sft-mixture
+ language:
+ - en
+ base_model: meta-llama/Llama-2-70b-hf
+ ---
+
+
+ <img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-v2/Tulu%20V2%20banner.png" alt="TuluV2 banner" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+
+
+ # Model Card for Tulu V2 DPO 70B
+
+ Tulu is a series of language models that are trained to act as helpful assistants.
+ Tulu V2 DPO 70B is a fine-tuned version of Llama 2 that was trained on a mix of publicly available, synthetic and human datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).
+ This model is a strong alternative to Llama 2 70b Chat.
+
+ For more details, read the paper: [Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2](https://arxiv.org/abs/2311.10702).
+
+
+ ## Model description
+
+ - **Model type:** The flagship model of a suite of instruction- and RLHF-tuned chat models trained on a mix of publicly available, synthetic and human-created datasets.
+ - **Language(s) (NLP):** Primarily English
+ - **License:** [AI2 ImpACT](https://allenai.org/impact-license) Low-risk license.
+ - **Finetuned from model:** [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/allenai/open-instruct
+ - **DPO Recipe:** The DPO recipe is from the [Zephyr Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) model
+ - **Model Family:** Other models and the dataset are found in the [Tulu V2 collection](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
+
+ ## Performance
+
+ | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
+ |-------------|-----|----|---------------|--------------|
+ | **Tulu-v2-7b** 🐪 | **7B** | **SFT** | **6.30** | **73.9** |
+ | **Tulu-v2-dpo-7b** 🐪 | **7B** | **DPO** | **6.29** | **85.1** |
+ | **Tulu-v2-13b** 🐪 | **13B** | **SFT** | **6.70** | **78.9** |
+ | **Tulu-v2-dpo-13b** 🐪 | **13B** | **DPO** | **7.00** | **89.5** |
+ | **Tulu-v2-70b** 🐪 | **70B** | **SFT** | **7.49** | **86.6** |
+ | **Tulu-v2-dpo-70b** 🐪 | **70B** | **DPO** | **7.89** | **95.1** |
+
+ ## Input Format
+
+ The model is trained to use the following format (note the newlines):
+ ```
+ <|user|>
+ Your message here!
+ <|assistant|>
+ ```
+
+ For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`; this can affect generation quality quite a bit.**
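
As a minimal sketch of the format described above, the helper below wraps a single user message in the `<|user|>` / `<|assistant|>` layout, including the trailing newline after `<|assistant|>`. The function name `format_tulu_prompt` is hypothetical and only covers the single-turn case shown in the card; multi-turn handling is not specified here and is left out.

```python
def format_tulu_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn format shown in the model card.

    The name and signature are illustrative assumptions, not part of any library.
    """
    # Each tag sits on its own line, and the prompt ends with a newline
    # after <|assistant|>, as the card recommends.
    return f"<|user|>\n{user_message}\n<|assistant|>\n"


prompt = format_tulu_prompt("Your message here!")
print(repr(prompt))
```

The resulting string can be passed as-is to whatever generation call is in use; printing it with `repr` makes the newlines visible when checking the layout.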
+
+
+ ## Intended uses & limitations
+
+ The model was initially fine-tuned on a filtered and preprocessed version of the [Tulu V2 mix dataset](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture), which contains a diverse range of human-created instructions and synthetic dialogues generated primarily by other LLMs.
+ We then further aligned the model with a [Jax DPO trainer](https://github.com/hamishivi/EasyLM/blob/main/EasyLM/models/llama/llama_train_dpo.py) built on [EasyLM](https://github.com/young-geng/EasyLM) on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4.
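
To make the alignment step more concrete, here is a rough sketch of the per-example DPO loss as defined in the DPO paper linked above: the negative log-sigmoid of the scaled difference between the policy-versus-reference log-ratios of the chosen and rejected completions. This is plain Python for illustration, not the Jax trainer used for this model; the function name and the `beta=0.1` default are assumptions, since the card does not state the value of beta.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta is a hypothetical default; the model card does not report it.
    """
    # Implicit rewards: scaled log-ratio of policy to reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy matches the reference, the loss is log(2).
print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
```

The loss falls below log(2) once the policy assigns relatively more probability to the chosen completion than the reference model does, which is the behavior the preference data rewards.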
+
+
+ <!-- You can find the datasets used for training Tulu V2 [here]()
+
+ Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
+
+ ```python
+ # Install transformers from source - only needed for versions <= v4.34
+ # pip install git+https://github.com/huggingface/transformers.git
+ # pip install accelerate
+
+ import torch
+ from transformers import pipeline
+
+ pipe = pipeline("text-generation", model="allenai/tulu-2-dpo-70b", torch_dtype=torch.bfloat16, device_map="auto")
+
+ # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
+ messages = [
+     {
+         "role": "system",
+         "content": "You are a friendly chatbot who always responds in the style of a pirate",
+     },
+     {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
+ ]
+ prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ # <|system|>
+ # You are a friendly chatbot who always responds in the style of a pirate.</s>
+ # <|user|>
+ # How many helicopters can a human eat in one sitting?</s>
+ # <|assistant|>
+ # Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!
+ ```-->
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ The Tulu models have not been aligned to generate safe completions within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
+ The size and composition of the corpus used to train the base Llama 2 models are also unknown; however, it is likely to have included a mix of Web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.
+
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during DPO training:
+ - learning_rate: 5e-07
+ - total_train_batch_size: 32
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 3.0
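
The `linear` scheduler with a warmup ratio of 0.1 can be read as follows; this is a sketch under the common convention of a linear ramp from zero to the peak rate over the first 10% of steps, then a linear decay to zero. The actual trainer may differ in details, and both the function name and the `total_steps=1000` value in the usage line are hypothetical.

```python
def linear_schedule_lr(step, total_steps, peak_lr=5e-07, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumes warmup starts at 0 and decay ends at 0; the card does not confirm this.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr across the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Linear decay from peak_lr at the end of warmup down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps=1000))
```

With these assumptions the rate peaks at `5e-07` exactly when warmup ends and is half the peak both midway through warmup and midway through decay.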
+
+
+ ## Citation
+
+ If you find Tulu 2 useful in your work, please cite it with:
+
+ ```
+ @misc{ivison2023camels,
+       title={Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2},
+       author={Hamish Ivison and Yizhong Wang and Valentina Pyatkin and Nathan Lambert and Matthew Peters and Pradeep Dasigi and Joel Jang and David Wadden and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
+       year={2023},
+       eprint={2311.10702},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL}
+ }
+ ```
+
+ *Model card adapted from [Zephyr Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/README.md)*
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 8192,
+   "initializer_range": 0.02,
+   "intermediate_size": 28672,
+   "max_position_embeddings": 8192,
+   "model_type": "llama",
+   "num_attention_heads": 64,
+   "num_hidden_layers": 80,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.33.2",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
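
As a sanity check on these values, the sketch below derives the head dimension, the grouped-query sharing factor, and an approximate parameter count from the config fields. It assumes the standard Llama layer layout (bias-free q/k/v/o projections, a three-matrix gated MLP, two RMSNorm weights per layer, and one final norm); the function name `llama_param_count` is hypothetical.

```python
config = {
    "hidden_size": 8192,
    "intermediate_size": 28672,
    "num_attention_heads": 64,
    "num_hidden_layers": 80,
    "num_key_value_heads": 8,
    "vocab_size": 32000,
    "tie_word_embeddings": False,
}


def llama_param_count(cfg):
    """Estimate parameters, assuming a standard bias-free Llama layout."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]
    attention = 2 * h * h + 2 * h * kv_dim   # q_proj and o_proj, plus k_proj and v_proj
    mlp = 3 * h * cfg["intermediate_size"]   # gate, up and down projections
    norms = 2 * h                            # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    # The trailing h accounts for the final RMSNorm before the LM head.
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + h


head_dim = config["hidden_size"] // config["num_attention_heads"]
queries_per_kv_head = config["num_attention_heads"] // config["num_key_value_heads"]
print(head_dim, queries_per_kv_head, llama_param_count(config))
```

Under these assumptions each head is 128-dimensional, eight query heads share each key/value head, and the total comes to roughly 69 billion parameters, in line with the "70B" label.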
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.33.2"
+ }
huggingface-metadata.txt ADDED
@@ -0,0 +1,20 @@
+ url: https://huggingface.co/allenai/tulu-2-dpo-70b
+ branch: main
+ download date: 2023-11-23 09:56:36
+ sha256sum:
+ ef61332cc45d00e0ee90dc55cf4313ac990cc0f35ca9b10b4b8c6a3cde1ba881 pytorch_model-00001-of-00015.bin
+ be0b80c5a3e3f130564bda815da6f92e241fddcd1b4a244b8a8e3aa2efab4a76 pytorch_model-00002-of-00015.bin
+ 1b9031ce6e0840ff1c0d4b85bdc21b64288c8fdec1d0a067b61d56c4b8656c83 pytorch_model-00003-of-00015.bin
+ 4b99c0b586ceed6d51b90f10530ffe50887ca35266f259f079cd207959afc5a5 pytorch_model-00004-of-00015.bin
+ b120b9326676b2d9e4f582b28ab7d1e64ed9baa77792999b9e587c7928cd05b7 pytorch_model-00005-of-00015.bin
+ 1ae9cb0b19ab4b24cba1f27210d7da74fab6b515b2c682c0ae7915e8b9068a50 pytorch_model-00006-of-00015.bin
+ 9f41d3b6914ef195710aa00e26fd51d78814389e69b8d2e81cae38a4f75450af pytorch_model-00007-of-00015.bin
+ 8c7c9627b1c8cf54182c1cb4ad091ce7475d5576d2817e93d383d00459b535d2 pytorch_model-00008-of-00015.bin
+ 5ad98b6251d0e2197466af1190320a22b7f13d553df4b35a1435d6663496b77f pytorch_model-00009-of-00015.bin
+ 0cd1c518d0c216e79b0fbac1d34440e1e9178d0f0c41cd149366413ddc570dd1 pytorch_model-00010-of-00015.bin
+ 8b38b400e5dea28d1ccc557edc580156bd2248fd11e2115d5181920f2020ad34 pytorch_model-00011-of-00015.bin
+ 9f5e94bc69c5a574f935e5ca837cec9356f2764261cf830261c8e5a52de18bc8 pytorch_model-00012-of-00015.bin
+ bd58d7211c44baf2eef3d436c1450ce938a3d33e33fbab1169cb655e09f7fe90 pytorch_model-00013-of-00015.bin
+ b82a7e14cd5f0fa871ddae6c3b14c08587f905c61e0e23d181b51752d7f76f55 pytorch_model-00014-of-00015.bin
+ f3ba33e8a67917d568c1e43f6f92038c3dd147c6e4b0a3da2069c4144efd18c1 pytorch_model-00015-of-00015.bin
+ 9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347 tokenizer.model
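
One way to use the checksum listing above is to recompute each file's SHA-256 digest locally and compare it with the recorded value. The sketch below uses only the standard-library `hashlib` module; the helper names `sha256_of_file` and `parse_checksums` are hypothetical, and the parser assumes each entry is a 64-character hex digest followed by a filename, as in the listing.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte shards need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Map filename -> expected digest from '<digest> <filename>' lines."""
    expected = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip header lines such as 'url:' or 'branch:' that are not checksum entries.
        if len(fields) == 2 and len(fields[0]) == 64:
            expected[fields[1]] = fields[0]
    return expected
```

A downloaded shard passes the check when `sha256_of_file(path)` equals the digest that `parse_checksums` returns for its filename.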
llama2_license.txt ADDED
@@ -0,0 +1,125 @@
1
+ LLAMA 2 COMMUNITY LICENSE AGREEMENT
2
+ Llama 2 Version Release Date: July 18, 2023
3
+
4
+ "Agreement" means the terms and conditions for use, reproduction, distribution and
5
+ modification of the Llama Materials set forth herein.
6
+
7
+ "Documentation" means the specifications, manuals and documentation
8
+ accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-
9
+ libraries/llama-downloads/.
10
+
11
+ "Licensee" or "you" means you, or your employer or any other person or entity (if
12
+ you are entering into this Agreement on such person or entity's behalf), of the age
13
+ required under applicable laws, rules or regulations to provide legal consent and that
14
+ has legal authority to bind your employer or such other person or entity if you are
15
+ entering in this Agreement on their behalf.
16
+
17
+ "Llama 2" means the foundational large language models and software and
18
+ algorithms, including machine-learning model code, trained model weights,
19
+ inference-enabling code, training-enabling code, fine-tuning enabling code and other
20
+ elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-
21
+ libraries/llama-downloads/.
22
+
23
+ "Llama Materials" means, collectively, Meta's proprietary Llama 2 and
24
+ Documentation (and any portion thereof) made available under this Agreement.
25
+
26
+ "Meta" or "we" means Meta Platforms Ireland Limited (if you are located in or, if you
27
+ are an entity, your principal place of business is in the EEA or Switzerland) and Meta
28
+ Platforms, Inc. (if you are located outside of the EEA or Switzerland).
29
+
30
+ By clicking "I Accept" below or by using or distributing any portion or element of the
31
+ Llama Materials, you agree to be bound by this Agreement.
32
+
33
+ 1. License Rights and Redistribution.
34
+
35
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-
36
+ transferable and royalty-free limited license under Meta's intellectual property or
37
+ other rights owned by Meta embodied in the Llama Materials to use, reproduce,
38
+ distribute, copy, create derivative works of, and make modifications to the Llama
39
+ Materials.
40
+
41
+ b. Redistribution and Use.
42
+
43
+ i. If you distribute or make the Llama Materials, or any derivative works
44
+ thereof, available to a third party, you shall provide a copy of this Agreement to such
45
+ third party.
46
+ ii. If you receive Llama Materials, or any derivative works thereof, from
47
+ a Licensee as part of an integrated end user product, then Section 2 of this
48
+ Agreement will not apply to you.
49
+
50
+ iii. You must retain in all copies of the Llama Materials that you
51
+ distribute the following attribution notice within a "Notice" text file distributed as a
52
+ part of such copies: "Llama 2 is licensed under the LLAMA 2 Community License,
53
+ Copyright (c) Meta Platforms, Inc. All Rights Reserved."
54
+
55
+ iv. Your use of the Llama Materials must comply with applicable laws
56
+ and regulations (including trade compliance laws and regulations) and adhere to the
57
+ Acceptable Use Policy for the Llama Materials (available at
58
+ https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into
59
+ this Agreement.
60
+
61
+ v. You will not use the Llama Materials or any output or results of the
62
+ Llama Materials to improve any other large language model (excluding Llama 2 or
63
+ derivative works thereof).
64
+
65
+ 2. Additional Commercial Terms. If, on the Llama 2 version release date, the
+ monthly active users of the products or services made available by or for Licensee,
+ or Licensee's affiliates, is greater than 700 million monthly active users in the
+ preceding calendar month, you must request a license from Meta, which Meta may
+ grant to you in its sole discretion, and you are not authorized to exercise any of the
+ rights under this Agreement unless or until Meta otherwise expressly grants you
+ such rights.
+
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE
+ LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE
+ PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
+ EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY
+ WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR
+ FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE
+ FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING
+ THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR
+ USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
+
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE
+ LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT,
+ NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS
+ AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL,
+ CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN
+ IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF
+ ANY OF THE FOREGOING.
+
+ 5. Intellectual Property.
+
+ a. No trademark licenses are granted under this Agreement, and in
+ connection with the Llama Materials, neither Meta nor Licensee may use any name
+ or mark owned by or associated with the other or any of its affiliates, except as
+ required for reasonable and customary use in describing and redistributing the
+ Llama Materials.
+
+ b. Subject to Meta's ownership of Llama Materials and derivatives made by or
+ for Meta, with respect to any derivative works and modifications of the Llama
+ Materials that are made by you, as between you and Meta, you are and will be the
+ owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Meta or any entity
+ (including a cross-claim or counterclaim in a lawsuit) alleging that the Llama
+ Materials or Llama 2 outputs or results, or any portion of any of the foregoing,
+ constitutes infringement of intellectual property or other rights owned or licensable
+ by you, then any licenses granted to you under this Agreement shall terminate as of
+ the date such litigation or claim is filed or instituted. You will indemnify and hold
+ harmless Meta from and against any claim by any third party arising out of or related
+ to your use or distribution of the Llama Materials.
+
+ 6. Term and Termination. The term of this Agreement will commence upon your
+ acceptance of this Agreement or access to the Llama Materials and will continue in
+ full force and effect until terminated in accordance with the terms and conditions
+ herein. Meta may terminate this Agreement if you are in breach of any term or
+ condition of this Agreement. Upon termination of this Agreement, you shall delete
+ and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the
+ termination of this Agreement.
+
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and
+ construed under the laws of the State of California without regard to choice of law
+ principles, and the UN Convention on Contracts for the International Sale of Goods
+ does not apply to this Agreement. The courts of California shall have exclusive
+ jurisdiction of any dispute arising out of this Agreement.
notice.txt ADDED
@@ -0,0 +1,2 @@
+ Llama 2 is licensed under the LLAMA 2 Community License,
+ Copyright (c) Meta Platforms, Inc. All Rights Reserved.
output-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab0760a2d203601ae8ff26f96f82ab26ea42f8eb02b109fa2dc2dd91383a1949
+ size 8584096696
output-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a3ef729feadf66f8ad4f7f54c8c2d5eb93372e10903c799276ef0d0592c8368
+ size 8589726960
output-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af79ccc2d251dda35625bf39aa1ffd496cfc1ad45209500e810b6c6eb9ec30a5
+ size 4124572800
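The three safetensors entries above are Git LFS pointer files: each records a spec version, a SHA-256 object id, and the size in bytes of the real file. As a rough sketch of how such pointers can be read, the Python below parses the three pointers from this commit and sums their sizes. The helper name `parse_lfs_pointer` and the `POINTERS` dictionary are illustrative assumptions and are not part of the repository.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the three hunks in this commit.
POINTERS = {
    "output-00001-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:ab0760a2d203601ae8ff26f96f82ab26ea42f8eb02b109fa2dc2dd91383a1949\n"
        "size 8584096696\n"
    ),
    "output-00002-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:1a3ef729feadf66f8ad4f7f54c8c2d5eb93372e10903c799276ef0d0592c8368\n"
        "size 8589726960\n"
    ),
    "output-00003-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:af79ccc2d251dda35625bf39aa1ffd496cfc1ad45209500e810b6c6eb9ec30a5\n"
        "size 4124572800\n"
    ),
}

parsed = {name: parse_lfs_pointer(text) for name, text in POINTERS.items()}
total_bytes = sum(p["size"] for p in parsed.values())

for name, info in parsed.items():
    print(f"{name}: {info['size']} bytes, {info['hash_algo']} {info['digest'][:12]}...")
print(f"total: {total_bytes} bytes")
```

The digest can later be compared against a locally computed SHA-256 of the downloaded file to check integrity.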
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,730 @@
+ {
+ "metadata": {
+ "total_size": 137953296384
+ },
+ "weight_map": {
+ "lm_head.weight": "pytorch_model-00015-of-00015.bin",
+ "model.embed_tokens.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.28.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.28.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.28.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
+ "model.layers.29.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.29.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.29.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.30.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.30.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.30.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.34.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.34.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.34.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
+ "model.layers.35.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.35.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.35.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
+ "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
+ "model.layers.40.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.40.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.40.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.40.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.40.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.40.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.40.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.40.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.40.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.41.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.42.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.43.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.44.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
+ "model.layers.44.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
361
+ "model.layers.44.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
362
+ "model.layers.44.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
363
+ "model.layers.44.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
364
+ "model.layers.44.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
365
+ "model.layers.44.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
366
+ "model.layers.44.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
367
+ "model.layers.44.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
368
+ "model.layers.45.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
369
+ "model.layers.45.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
370
+ "model.layers.45.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
371
+ "model.layers.45.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
372
+ "model.layers.45.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
373
+ "model.layers.45.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
374
+ "model.layers.45.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
375
+ "model.layers.45.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
376
+ "model.layers.45.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
377
+ "model.layers.46.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
378
+ "model.layers.46.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
379
+ "model.layers.46.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
380
+ "model.layers.46.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
381
+ "model.layers.46.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
382
+ "model.layers.46.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
383
+ "model.layers.46.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
384
+ "model.layers.46.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
385
+ "model.layers.46.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
386
+ "model.layers.47.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
387
+ "model.layers.47.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
388
+ "model.layers.47.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
389
+ "model.layers.47.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
390
+ "model.layers.47.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
391
+ "model.layers.47.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
392
+ "model.layers.47.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
393
+ "model.layers.47.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
394
+ "model.layers.47.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
395
+ "model.layers.48.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
396
+ "model.layers.48.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
397
+ "model.layers.48.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
398
+ "model.layers.48.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
399
+ "model.layers.48.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
400
+ "model.layers.48.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
401
+ "model.layers.48.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
402
+ "model.layers.48.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
403
+ "model.layers.48.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
404
+ "model.layers.49.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
405
+ "model.layers.49.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
406
+ "model.layers.49.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
407
+ "model.layers.49.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
408
+ "model.layers.49.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
409
+ "model.layers.49.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
410
+ "model.layers.49.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
411
+ "model.layers.49.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
412
+ "model.layers.49.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
413
+ "model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
414
+ "model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
415
+ "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
416
+ "model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
417
+ "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
418
+ "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
419
+ "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
420
+ "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
421
+ "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
422
+ "model.layers.50.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
423
+ "model.layers.50.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
424
+ "model.layers.50.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
425
+ "model.layers.50.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
426
+ "model.layers.50.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
427
+ "model.layers.50.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
428
+ "model.layers.50.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
429
+ "model.layers.50.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
430
+ "model.layers.50.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
431
+ "model.layers.51.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
432
+ "model.layers.51.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
433
+ "model.layers.51.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
434
+ "model.layers.51.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
435
+ "model.layers.51.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
436
+ "model.layers.51.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
437
+ "model.layers.51.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
438
+ "model.layers.51.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
439
+ "model.layers.51.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
440
+ "model.layers.52.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
441
+ "model.layers.52.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
442
+ "model.layers.52.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
443
+ "model.layers.52.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
444
+ "model.layers.52.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
445
+ "model.layers.52.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
446
+ "model.layers.52.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
447
+ "model.layers.52.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
448
+ "model.layers.52.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
449
+ "model.layers.53.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
450
+ "model.layers.53.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
451
+ "model.layers.53.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
452
+ "model.layers.53.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
453
+ "model.layers.53.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
454
+ "model.layers.53.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
455
+ "model.layers.53.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
456
+ "model.layers.53.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
457
+ "model.layers.53.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
458
+ "model.layers.54.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
459
+ "model.layers.54.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
460
+ "model.layers.54.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
461
+ "model.layers.54.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
462
+ "model.layers.54.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
463
+ "model.layers.54.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
464
+ "model.layers.54.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
465
+ "model.layers.54.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
466
+ "model.layers.54.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
467
+ "model.layers.55.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
468
+ "model.layers.55.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
469
+ "model.layers.55.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
470
+ "model.layers.55.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
471
+ "model.layers.55.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
472
+ "model.layers.55.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
473
+ "model.layers.55.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
474
+ "model.layers.55.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
475
+ "model.layers.55.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
476
+ "model.layers.56.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
477
+ "model.layers.56.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
478
+ "model.layers.56.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
479
+ "model.layers.56.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
480
+ "model.layers.56.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
481
+ "model.layers.56.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
482
+ "model.layers.56.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
483
+ "model.layers.56.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
484
+ "model.layers.56.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
485
+ "model.layers.57.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
486
+ "model.layers.57.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
487
+ "model.layers.57.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
488
+ "model.layers.57.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
489
+ "model.layers.57.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
490
+ "model.layers.57.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
491
+ "model.layers.57.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
492
+ "model.layers.57.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
493
+ "model.layers.57.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
494
+ "model.layers.58.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
495
+ "model.layers.58.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
496
+ "model.layers.58.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
497
+ "model.layers.58.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
498
+ "model.layers.58.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
499
+ "model.layers.58.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
500
+ "model.layers.58.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
501
+ "model.layers.58.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
502
+ "model.layers.58.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
503
+ "model.layers.59.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
504
+ "model.layers.59.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
505
+ "model.layers.59.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
506
+ "model.layers.59.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
507
+ "model.layers.59.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
508
+ "model.layers.59.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
509
+ "model.layers.59.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
510
+ "model.layers.59.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
511
+ "model.layers.59.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
512
+ "model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
513
+ "model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
514
+ "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
515
+ "model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
516
+ "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
517
+ "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
518
+ "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
519
+ "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
520
+ "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
521
+ "model.layers.60.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
522
+ "model.layers.60.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
523
+ "model.layers.60.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
524
+ "model.layers.60.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
525
+ "model.layers.60.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
526
+ "model.layers.60.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
527
+ "model.layers.60.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
528
+ "model.layers.60.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
529
+ "model.layers.60.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
530
+ "model.layers.61.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
531
+ "model.layers.61.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
532
+ "model.layers.61.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
533
+ "model.layers.61.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
534
+ "model.layers.61.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
535
+ "model.layers.61.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
536
+ "model.layers.61.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
537
+ "model.layers.61.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
538
+ "model.layers.61.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
539
+ "model.layers.62.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
540
+ "model.layers.62.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
541
+ "model.layers.62.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
542
+ "model.layers.62.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
543
+ "model.layers.62.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
544
+ "model.layers.62.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
545
+ "model.layers.62.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
546
+ "model.layers.62.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
547
+ "model.layers.62.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
548
+ "model.layers.63.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
549
+ "model.layers.63.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
550
+ "model.layers.63.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
551
+ "model.layers.63.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
552
+ "model.layers.63.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
553
+ "model.layers.63.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
554
+ "model.layers.63.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
555
+ "model.layers.63.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
556
+ "model.layers.63.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
557
+ "model.layers.64.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
558
+ "model.layers.64.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
559
+ "model.layers.64.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
560
+ "model.layers.64.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
561
+ "model.layers.64.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
562
+ "model.layers.64.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
563
+ "model.layers.64.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
564
+ "model.layers.64.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
565
+ "model.layers.64.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
566
+ "model.layers.65.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
567
+ "model.layers.65.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
568
+ "model.layers.65.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
569
+ "model.layers.65.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
570
+ "model.layers.65.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
571
+ "model.layers.65.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
572
+ "model.layers.65.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
573
+ "model.layers.65.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
574
+ "model.layers.65.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
575
+ "model.layers.66.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
576
+ "model.layers.66.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
577
+ "model.layers.66.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
578
+ "model.layers.66.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
579
+ "model.layers.66.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
580
+ "model.layers.66.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
581
+ "model.layers.66.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
582
+ "model.layers.66.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
583
+ "model.layers.66.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
584
+ "model.layers.67.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
585
+ "model.layers.67.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
586
+ "model.layers.67.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
587
+ "model.layers.67.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
588
+ "model.layers.67.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
589
+ "model.layers.67.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
590
+ "model.layers.67.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
591
+ "model.layers.67.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
592
+ "model.layers.67.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
593
+ "model.layers.68.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
594
+ "model.layers.68.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
595
+ "model.layers.68.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
596
+ "model.layers.68.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
597
+ "model.layers.68.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
598
+ "model.layers.68.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
599
+ "model.layers.68.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
600
+ "model.layers.68.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
601
+ "model.layers.68.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
602
+ "model.layers.69.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
603
+ "model.layers.69.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
604
+ "model.layers.69.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
605
+ "model.layers.69.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
606
+ "model.layers.69.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
607
+ "model.layers.69.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
608
+ "model.layers.69.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
609
+ "model.layers.69.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
610
+ "model.layers.69.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
611
+ "model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
612
+ "model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
613
+ "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
614
+ "model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
615
+ "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
616
+ "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
617
+ "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
618
+ "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
619
+ "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
620
+ "model.layers.70.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
621
+ "model.layers.70.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
622
+ "model.layers.70.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
623
+ "model.layers.70.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
624
+ "model.layers.70.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
625
+ "model.layers.70.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
626
+ "model.layers.70.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
627
+ "model.layers.70.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
628
+ "model.layers.70.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
629
+ "model.layers.71.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
630
+ "model.layers.71.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
631
+ "model.layers.71.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
632
+ "model.layers.71.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
633
+ "model.layers.71.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
634
+ "model.layers.71.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
635
+ "model.layers.71.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
636
+ "model.layers.71.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
637
+ "model.layers.71.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
638
+ "model.layers.72.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
639
+ "model.layers.72.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
640
+ "model.layers.72.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
641
+ "model.layers.72.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
642
+ "model.layers.72.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
643
+ "model.layers.72.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
644
+ "model.layers.72.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
645
+ "model.layers.72.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
646
+ "model.layers.72.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
647
+ "model.layers.73.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
648
+ "model.layers.73.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
649
+ "model.layers.73.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
650
+ "model.layers.73.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
651
+ "model.layers.73.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
652
+ "model.layers.73.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
653
+ "model.layers.73.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
654
+ "model.layers.73.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
655
+ "model.layers.73.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
656
+ "model.layers.74.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
657
+ "model.layers.74.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
658
+ "model.layers.74.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
659
+ "model.layers.74.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
660
+ "model.layers.74.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
661
+ "model.layers.74.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
662
+ "model.layers.74.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
663
+ "model.layers.74.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
664
+ "model.layers.74.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
665
+ "model.layers.75.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
666
+ "model.layers.75.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
667
+ "model.layers.75.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
668
+ "model.layers.75.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
669
+ "model.layers.75.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
670
+ "model.layers.75.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
671
+ "model.layers.75.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
672
+ "model.layers.75.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
673
+ "model.layers.75.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
674
+ "model.layers.76.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
675
+ "model.layers.76.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
676
+ "model.layers.76.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
677
+ "model.layers.76.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
678
+ "model.layers.76.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
679
+ "model.layers.76.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
680
+ "model.layers.76.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
681
+ "model.layers.76.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
682
+ "model.layers.76.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
683
+ "model.layers.77.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
684
+ "model.layers.77.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
685
+ "model.layers.77.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
686
+ "model.layers.77.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
687
+ "model.layers.77.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
688
+ "model.layers.77.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
689
+ "model.layers.77.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
690
+ "model.layers.77.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
691
+ "model.layers.77.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
692
+ "model.layers.78.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.78.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.78.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.78.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.78.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.78.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.78.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.78.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.78.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.79.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
+ "model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
+ "model.norm.weight": "pytorch_model-00014-of-00015.bin"
+ }
+ }
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": {"content": "<s>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false}, "eos_token": {"content": "</s>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false}, "unk_token": {"content": "<unk>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false}}
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "add_bos_token":true,
+ "add_eos_token":false,
+ "model_max_length":2048,
+ "pad_token":null,
+ "sp_model_kwargs":{
+
+ },
+ "tokenizer_class":"LlamaTokenizer",
+ "clean_up_tokenization_spaces":false,
+ "bos_token":{
+ "__type":"AddedToken",
+ "content":"<s>",
+ "lstrip":false,
+ "normalized":true,
+ "rstrip":false,
+ "single_word":false
+ },
+ "eos_token":{
+ "__type":"AddedToken",
+ "content":"</s>",
+ "lstrip":false,
+ "normalized":true,
+ "rstrip":false,
+ "single_word":false
+ },
+ "unk_token":{
+ "__type":"AddedToken",
+ "content":"<unk>",
+ "lstrip":false,
+ "normalized":true,
+ "rstrip":false,
+ "single_word":false
+ }
+ }
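
The tokenizer settings in the diff above can be inspected with the Python standard library alone. The following is a minimal sketch, not part of the commit: it assumes an abridged copy of the JSON is embedded inline rather than read from disk, and the names `CONFIG_TEXT`, `SPECIAL_KEYS`, and `special_token_contents` are hypothetical helpers introduced only for illustration.

```python
import json

# Abridged copy of the tokenizer_config.json fields shown in the diff above
# (assumption: embedded inline so the sketch runs without the repository).
CONFIG_TEXT = """
{
  "add_bos_token": true,
  "add_eos_token": false,
  "model_max_length": 2048,
  "pad_token": null,
  "tokenizer_class": "LlamaTokenizer",
  "bos_token": {"__type": "AddedToken", "content": "<s>"},
  "eos_token": {"__type": "AddedToken", "content": "</s>"},
  "unk_token": {"__type": "AddedToken", "content": "<unk>"}
}
"""

# Listed explicitly: a suffix match on "_token" would also catch the
# boolean flags "add_bos_token" and "add_eos_token".
SPECIAL_KEYS = ("bos_token", "eos_token", "unk_token", "pad_token")


def special_token_contents(config):
    """Return each special token's string content, or None when it is unset."""
    out = {}
    for key in SPECIAL_KEYS:
        value = config.get(key)
        # Tokens are stored either as AddedToken-style objects or as null.
        out[key] = value["content"] if isinstance(value, dict) else value
    return out


config = json.loads(CONFIG_TEXT)
tokens = special_token_contents(config)
print(tokens)
print("max length:", config["model_max_length"])
```

Note that `pad_token` is `null` in this configuration, so code that pads batches would typically need to choose a padding token itself.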