LoneStriker committed eccd9a6
Parent(s): 280d718
Upload folder using huggingface_hub
- LICENSE.md +138 -0
- README.md +136 -0
- config.json +25 -0
- generation_config.json +6 -0
- huggingface-metadata.txt +20 -0
- llama2_license.txt +125 -0
- notice.txt +2 -0
- output-00001-of-00004.safetensors +3 -0
- output-00002-of-00004.safetensors +3 -0
- output-00003-of-00004.safetensors +3 -0
- output-00004-of-00004.safetensors +3 -0
- pytorch_model.bin.index.json +730 -0
- special_tokens_map.json +1 -0
- tokenizer.model +3 -0
- tokenizer_config.json +35 -0
LICENSE.md
ADDED
@@ -0,0 +1,138 @@
AI2 ImpACT License – Low Risk Artifacts
==========================================

### Version 1.0

This AI2 ImpACT License for Low Risk Artifacts (“LR Agreement”) is between The Allen Institute for Artificial Intelligence (“AI2”) and any individual or entity who purchases, downloads, installs, logs into, accesses or otherwise uses any “low risk” Artifacts that refer to this LR Agreement, including any other person or entity that an individual purports to represent, be the agent of, or otherwise act on behalf of (collectively, “You”).

By clicking or taking similar action to accept this LR Agreement, or by accessing or using any Artifacts licensed under this LR Agreement, You agree to all the terms and conditions herein. You represent and warrant that You are at least 18 years old and have the full legal right and authority to enter into this LR Agreement and bind any employer or entity that You are acting on behalf of. If You do not agree or have the requisite authority, You have no right to access or use any Artifacts and must immediately cease any existing use.

A human-friendly summary of the legal text can be found [here](http://allenai.org/impact-license).

1. **DEFINITIONS**.

    1. “**Artifact(s)**” means collectively and individually, anything used to build or apply a Model that is licensed by AI2 pursuant to this LR Agreement, such as weights or Data and any Derivatives of the original Artifact.

    2. “**Code**” means a text listing of commands to be compiled or assembled into an executable computer program.

    3. “**Data**” means the datasets created and/or compiled by AI2 to pretrain, train or fine-tune a Model.

    4. “**Data Derivatives**” means

        1. all modifications of the Data, and/or

        2. all derivative works created from the Data that are considered copyrighted works under U.S. copyright laws.

    5. “**Derivatives**” means anything that is based on or derived from any Artifact within the meaning of applicable U.S. copyright laws, including specifically and without limitation Model Derivatives and Data Derivatives.

    6. “**Distribute**” or “**Distribution**” means any transmission, reproduction, publication, public display, or other sharing of the Artifacts to a Third Party by any means, including as a hosted service made available by electronic or other remote means - e.g. API-based or web access.

    7. “**Model**” means the algorithm, weights and/or parameters used to produce the desired outcome, whether a machine learning algorithm or a deeper neural network.

    8. “**Model Derivatives**” means

        1. all modifications to the Model; and/or

        2. any other model which is created or initialized by transfer of patterns of the weights, parameters, activations or output of the Model, to the other model, in order to cause the other model to perform similarly to the Model, including - but not limited to - distillation methods entailing the use of intermediate data representations or methods based on the generation of synthetic data by the Model for training the other model.

    9. “**Term**” means the period of time starting from the date You access or use any Artifacts until this LR Agreement is terminated in accordance with **Section 4**.

    10. “**Third Party**” means any party other than You or AI2.

    11. “**Use-Based Restrictions**” means the specified restricted use cases set forth in **Exhibit A**.

2. **LICENSE**. Subject to Your compliance with the requirements in this LR Agreement together with all applicable laws, AI2 grants to You a worldwide, non-exclusive, non-transferable, royalty-free license to use, install, and create Derivatives strictly in accordance with the requirements and restrictions as set forth below.

    1. **Distribution**. You may Distribute any Artifacts and Your Derivatives, provided that:

        1. You flow down and include the Use-Based Restrictions as an enforceable provision in any type of license or legal agreement governing downstream use and/or Distribution;

        2. You cause any of Your Derivatives to carry a prominent notice stating that You changed the original Artifact and how the Artifact was modified; and

        3. You retain all applicable copyright, patent, trademark, and attribution notices included with the Artifact.

    2. **Attribution**. Together with any copies of the Artifacts or Derivatives that You Distribute, You must provide (i) a copy of this LR Agreement; and (ii) the following attribution notice: _“\[Artifact\] is licensed under the AI2 ImpACT License for Low Risk Artifacts, © \[year\] The Allen Institute for Artificial Intelligence.”_

    3. **Derivative Impact Reports**. AI2 seeks to encourage transparency around Derivatives through the use of Derivative Impact Reports, available [here](http://allenai.org/impact-license). Before releasing a Model Derivative or Data Derivative, You will complete a Derivative Impact Report and will publish, post, or make available the results of the Derivative Impact Report to the general public without imposing any direct or indirect restrictions, conditions or barriers to access, such as a paywall, fee, subscription, account, or requirement to submit personal information. You agree that AI2 may publish, post, or make available the information in Your Derivative Impact Report for review by the general public.

        1. You agree to maintain the transparency and accuracy of information regarding Your Derivatives in good faith and will update the Derivative Impact Report whenever a material change has occurred in any of the reporting categories therein.

        2. You acknowledge that Derivative Impact Reports are not intended to penalize any good faith disclosures about Derivatives. Accordingly, if You initiate or participate in any lawsuit or other legal action against a Third Party based on information in such Third Party’s Derivative Impact Report, then this LR Agreement will terminate immediately as of the date such lawsuit or legal action is filed or commenced.

    4. **Use-Based Restrictions**. You will not use any Artifacts or Derivatives in connection with any Use-Based Restrictions, including without limitation, creating any content with, finetuning, updating, running, training, evaluating and/or reparametrizing a Model.

    5. **No Circumvention**. You acknowledge that the purpose of the license granted herein is to facilitate transparency and responsible development of AI technology. Accordingly, You will not directly or indirectly circumvent the requirements in this **Section 2**, nor assist or enable any Third Party to do so.

    6. **Revocable License**. The license granted to You is revocable. To the maximum extent permitted by law, AI2 reserves the right to suspend, restrict, or terminate (remotely or otherwise) Your access, use or Distribution of any Artifacts and/or Derivatives not expressly permitted herein.

3. **INTELLECTUAL PROPERTY RIGHTS.**

    1. AI2 and its licensors retain all right, title and interest in and to the Artifacts, including all patent, copyright, trademark, and trade secret rights, whether such rights are registered or unregistered, and wherever in the world those rights may exist. You will not commit any act or omission that contradicts or is inconsistent with AI2’s rights, nor permit or induce any Third Party to do the same. Other than the license granted in **Section 2** and as provided in **Section 3(b)**, all rights are expressly reserved by AI2.

    2. Subject to Your compliance with this LR Agreement, You will own any Derivatives You create. However, if Your use or Distribution of any Derivative is in breach of this LR Agreement, You will transfer and assign all right, title, and interest in and to such Derivative to AI2 and execute any related documentation as required by AI2.

4. **TERM AND TERMINATION**. AI2 may terminate this LR Agreement by written notice at any time if You materially breach any of Your obligations herein and fail to cure to AI2’s satisfaction within thirty (30) days after such notice.

    1. Upon any termination of this LR Agreement, the license granted in **Section 2** will automatically terminate as of the termination date and You will:

        1. Cease all use of the Artifacts and immediately delete all copies in Your possession or control; and

        2. Cease all use and Distribution of any Derivatives and promptly provide AI2 with any other information regarding Your Derivatives as requested by AI2, including any documentation to assign Your Derivatives to AI2 pursuant to **Section 3(b)**.

    2. Additionally, if AI2 terminates this LR Agreement due to Your breach of **Section 2(d)** (Use-Based Restrictions), You authorize AI2 to post a prominent notice stating that You violated the Use-Based Restrictions of this LR Agreement and that Your rights to use the Artifacts and Derivatives were terminated by AI2.

    3. All terms and provisions that are reasonably interpreted to survive termination of this LR Agreement to fulfill its essential purpose will survive, including **Sections 3-8**.

5. **DISCLAIMER**. AI2 PROVIDES THE ARTIFACTS ON AN “AS-IS” BASIS, AND AI2 DISCLAIMS ALL EXPRESS AND IMPLIED WARRANTIES OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING AN ARTIFACT, OR CREATING OR DISTRIBUTING ANY DERIVATIVES, AND YOU ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR EXERCISE OF PERMISSIONS UNDER THIS LR AGREEMENT.

6. **LIMITATION OF LIABILITY**. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAWS, IN NO EVENT WILL AI2 BE LIABLE TO YOU OR ANY THIRD PARTY FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER (INCLUDING BUT NOT LIMITED TO DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE, INTEROPERABILITY OR MALFUNCTION, OR ANY OTHER LOSS) ARISING FROM OR RELATED TO THIS LR AGREEMENT, INCLUDING WITHOUT LIMITATION, ANY USE OR INABILITY TO USE ANY ARTIFACTS, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR ANY OTHER LEGAL THEORY, EVEN IF YOU OR ANY THIRD PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. **INDEMNIFICATION FOR THIRD-PARTY CLAIMS**. You will defend and indemnify AI2 and its officers, directors, employees, and agents from and against any and all Third Party claims, lawsuits, and proceedings that arise or result from:

    1. Your material, uncured breach of this LR Agreement; and/or

    2. Your failure or alleged failure to comply with applicable laws or any violation of a Third Party’s rights in connection with Your use of the Artifacts or Your Derivatives.

8. **MISCELLANEOUS**.

    1. **Consideration**. The Artifacts are provided to You by AI2 subject to Your continued compliance with the terms and conditions of this LR Agreement.

    2. **Relationship**. This LR Agreement and the parties’ relationship hereunder is non-exclusive, and neither party is restricted or limited in any way from entering into the same or similar arrangements with Third Parties. Nothing in this LR Agreement will be deemed or construed to create any employment, franchise, joint venture, partnership, agency or other such similar relationship between You and AI2.

    3. **No Waiver; Equitable Remedies**. Any delay or failure of AI2 to enforce its rights or any provision of this LR Agreement will not be a waiver unless specifically issued in writing by AI2. Any term that is held to be invalid or unenforceable will not affect any other terms in this LR Agreement, which will remain in full force and effect. You acknowledge that if You breach this LR Agreement, it may cause irreparable harm to AI2, and You agree that AI2 may seek injunctive relief against You in addition to any other legal and equitable remedies.

    4. **Export Control**. You will not violate any applicable U.S. and non-U.S. export control and trade sanctions laws (“Export Laws”) or directly or indirectly export, re-export, provide, or otherwise transfer any Artifacts or Derivatives

        1. to any individual, entity, or country prohibited by Export Laws;

        2. to anyone on U.S. or non-U.S. government restricted parties lists; or

        3. for any purpose prohibited by Export Laws, including nuclear, chemical or biological weapons, or missile technology applications.

    5. **Governing Law**. This LR Agreement will be governed by the laws of the State of Washington, U.S.A. without regard to its choice of laws or conflict of laws rules.

    6. **Entire Agreement**. Except as otherwise specifically set forth herein, this LR Agreement and any documents or policies that are incorporated or made part of this LR Agreement by reference contain the entire agreement between You and AI2 regarding the subject matter herein.

    7. **Modifications**. AI2 may revise and update the terms of this LR Agreement from time to time and will post such updates to its website at [http://allenai.org/impact-license](http://allenai.org/impact-license). UNLESS OTHERWISE STATED IN THE AMENDED VERSION OF THIS LR AGREEMENT, ANY CHANGES TO THIS LR AGREEMENT WILL APPLY IMMEDIATELY UPON POSTING. While AI2 is not obligated to provide You with notice of any changes, any amendments to this LR Agreement will not apply retroactively to events that occurred prior to such changes. Your continued use or Distribution of the Artifact(s) and/or Your Derivatives will constitute Your agreement to the terms of the updated LR Agreement.

For any questions regarding this LR Agreement, please contact [ai2impact@allenai.org](mailto:ai2impact@allenai.org).

**EXHIBIT A**

**USE-BASED RESTRICTIONS**

1. **EXPECTATIONS**. AI2 expects that You will not use, or cause or assist others to use, any Artifacts or Derivatives in connection with any academic dishonesty, including submitting any informational content or output of a Model as Your own work in any academic setting.

2. **RESTRICTIONS**. You will not, and will not permit, assist, or cause any Third Party to use, modify, copy, reproduce, incorporate, create Derivatives of, or Distribute any Artifacts or Your Derivatives, in whole or in part, for:

    1. military weapons purposes or in the service of nuclear proliferation or nuclear weapons technology;

    2. purposes of military surveillance, including any research or development relating to military surveillance;

    3. purposes of generating or disseminating information or content, in any context (e.g. posts, articles, tweets, chatbots or other kinds of automated bots) without expressly and intelligibly disclaiming that the text is machine generated;

    4. purposes of ‘real time’ remote biometric processing or identification systems in publicly accessible spaces for the purpose of law enforcement;

    5. fully automated decision-making without a human in the loop; and/or

    6. purposes of the predictive administration of justice, law enforcement, immigration, or asylum processes, such as predicting that an individual will commit fraud/crime (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
README.md
ADDED
@@ -0,0 +1,136 @@
---
model-index:
- name: tulu-2-dpo-70b
  results: []
datasets:
- HuggingFaceH4/ultrafeedback_binarized
- allenai/tulu-v2-sft-mixture
language:
- en
base_model: meta-llama/Llama-2-70b-hf
---

<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-v2/Tulu%20V2%20banner.png" alt="TuluV2 banner" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

# Model Card for Tulu V2 DPO 70B

Tulu is a series of language models that are trained to act as helpful assistants.
Tulu V2 DPO 70B is a fine-tuned version of Llama 2 that was trained on a mix of publicly available, synthetic and human datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).
This model is a strong alternative to Llama 2 70b Chat.

For more details, read the paper: [Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2](https://arxiv.org/abs/2311.10702).

## Model description

- **Model type:** The flagship model of a suite of instruction and RLHF tuned chat models on a mix of publicly available, synthetic and human-created datasets.
- **Language(s) (NLP):** Primarily English
- **License:** [AI2 ImpACT](https://allenai.org/impact-license) Low-risk license.
- **Finetuned from model:** [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)

### Model Sources

- **Repository:** https://github.com/allenai/open-instruct
- **DPO Recipe:** The DPO recipe is from the [Zephyr Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) model
- **Model Family:** Other models and the dataset are found in the [Tulu V2 collection](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).

## Performance

| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
|-------------|-----|----|---------------|--------------|
| **Tulu-v2-7b** 🐪 | **7B** | **SFT** | **6.30** | **73.9** |
| **Tulu-v2-dpo-7b** 🐪 | **7B** | **DPO** | **6.29** | **85.1** |
| **Tulu-v2-13b** 🐪 | **13B** | **SFT** | **6.70** | **78.9** |
| **Tulu-v2-dpo-13b** 🐪 | **13B** | **DPO** | **7.00** | **89.5** |
| **Tulu-v2-70b** 🐪 | **70B** | **SFT** | **7.49** | **86.6** |
| **Tulu-v2-dpo-70b** 🐪 | **70B** | **DPO** | **7.89** | **95.1** |

## Input Format

The model is trained to use the following format (note the newlines):
```
<|user|>
Your message here!
<|assistant|>
```

For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`; this can affect generation quality quite a bit.**
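The format above can be produced with plain string formatting. A minimal sketch (the `build_tulu_prompt` helper below is our own, not part of the repo; the tokenizer's built-in chat template renders the same layout):

```python
# Minimal sketch: build a Tulu-style prompt by hand (no transformers dependency).
def build_tulu_prompt(messages):
    """Render a list of {role, content} dicts into the <|user|>/<|assistant|> format."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}\n")
    # End with the assistant tag plus a trailing newline -- the card notes this
    # final newline matters for generation quality.
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = build_tulu_prompt([{"role": "user", "content": "Your message here!"}])
print(prompt)
# <|user|>
# Your message here!
# <|assistant|>
```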
## Intended uses & limitations

The model was initially fine-tuned on a filtered and preprocessed version of the [Tulu V2 mix dataset](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture), which contains a diverse range of human-created instructions and synthetic dialogues generated primarily by other LLMs.
We then further aligned the model with a [Jax DPO trainer](https://github.com/hamishivi/EasyLM/blob/main/EasyLM/models/llama/llama_train_dpo.py) built on [EasyLM](https://github.com/young-geng/EasyLM) on the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, which contains 64k prompts and model completions that are ranked by GPT-4.

<!-- You can find the datasets used for training Tulu V2 [here]()

Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:

```python
# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/tulu-2-dpo-70b", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# Ah, me hearty matey! But yer question be a puzzler! A human cannot eat a helicopter in one sitting, as helicopters are not edible. They be made of metal, plastic, and other materials, not food!
```-->

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

The Tulu models have not been aligned to generate safe completions within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
The size and composition of the corpus used to train the base Llama 2 models is unknown, but it is likely to have included a mix of Web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.

### Training hyperparameters

The following hyperparameters were used during DPO training:
- learning_rate: 5e-07
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
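For reference, the DPO objective from the cited paper reduces to a logistic loss on log-probability ratios between the policy and a frozen reference model. A minimal sketch (the function name and the `beta=0.1` default are our assumptions; the hyperparameter list above does not state beta):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    x = beta * (pi_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-x)))  # -log sigmoid(x)

# When the policy still matches the reference, the loss starts at -log(0.5) = log 2.
print(round(dpo_loss(-1.0, -1.0, -1.0, -1.0), 4))  # 0.6931
```

Training then minimizes this loss over the ranked UltraFeedback pairs, pushing the policy to prefer the chosen completion over the rejected one relative to the reference.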
## Citation

If you find Tulu 2 useful in your work, please cite it with:

```
@misc{ivison2023camels,
      title={Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2},
      author={Hamish Ivison and Yizhong Wang and Valentina Pyatkin and Nathan Lambert and Matthew Peters and Pradeep Dasigi and Joel Jang and David Wadden and Noah A. Smith and Iz Beltagy and Hannaneh Hajishirzi},
      year={2023},
      eprint={2311.10702},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

*Model card adapted from [Zephyr Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/README.md)*
config.json
ADDED
@@ -0,0 +1,25 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 8192,
  "initializer_range": 0.02,
  "intermediate_size": 28672,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 64,
  "num_hidden_layers": 80,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 32000
}
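As a quick sanity check, the attention geometry implied by these values can be derived directly (the dict below just reproduces fields from the config so the snippet is self-contained):

```python
# Derive per-head and grouped-query-attention geometry from the config values.
config = {
    "hidden_size": 8192,
    "num_attention_heads": 64,
    "num_key_value_heads": 8,
}
head_dim = config["hidden_size"] // config["num_attention_heads"]          # dimension of each query head
kv_groups = config["num_attention_heads"] // config["num_key_value_heads"] # query heads sharing one KV head
print(head_dim, kv_groups)  # 128 8
```

`num_key_value_heads` (8) being smaller than `num_attention_heads` (64) is what marks this as a grouped-query-attention model, the standard Llama 2 70B configuration.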
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.33.2"
}
huggingface-metadata.txt
ADDED
@@ -0,0 +1,20 @@
url: https://huggingface.co/allenai/tulu-2-dpo-70b
branch: main
download date: 2023-11-23 09:56:36
sha256sum:
    ef61332cc45d00e0ee90dc55cf4313ac990cc0f35ca9b10b4b8c6a3cde1ba881  pytorch_model-00001-of-00015.bin
    be0b80c5a3e3f130564bda815da6f92e241fddcd1b4a244b8a8e3aa2efab4a76  pytorch_model-00002-of-00015.bin
    1b9031ce6e0840ff1c0d4b85bdc21b64288c8fdec1d0a067b61d56c4b8656c83  pytorch_model-00003-of-00015.bin
    4b99c0b586ceed6d51b90f10530ffe50887ca35266f259f079cd207959afc5a5  pytorch_model-00004-of-00015.bin
    b120b9326676b2d9e4f582b28ab7d1e64ed9baa77792999b9e587c7928cd05b7  pytorch_model-00005-of-00015.bin
    1ae9cb0b19ab4b24cba1f27210d7da74fab6b515b2c682c0ae7915e8b9068a50  pytorch_model-00006-of-00015.bin
    9f41d3b6914ef195710aa00e26fd51d78814389e69b8d2e81cae38a4f75450af  pytorch_model-00007-of-00015.bin
    8c7c9627b1c8cf54182c1cb4ad091ce7475d5576d2817e93d383d00459b535d2  pytorch_model-00008-of-00015.bin
    5ad98b6251d0e2197466af1190320a22b7f13d553df4b35a1435d6663496b77f  pytorch_model-00009-of-00015.bin
    0cd1c518d0c216e79b0fbac1d34440e1e9178d0f0c41cd149366413ddc570dd1  pytorch_model-00010-of-00015.bin
    8b38b400e5dea28d1ccc557edc580156bd2248fd11e2115d5181920f2020ad34  pytorch_model-00011-of-00015.bin
    9f5e94bc69c5a574f935e5ca837cec9356f2764261cf830261c8e5a52de18bc8  pytorch_model-00012-of-00015.bin
    bd58d7211c44baf2eef3d436c1450ce938a3d33e33fbab1169cb655e09f7fe90  pytorch_model-00013-of-00015.bin
    b82a7e14cd5f0fa871ddae6c3b14c08587f905c61e0e23d181b51752d7f76f55  pytorch_model-00014-of-00015.bin
    f3ba33e8a67917d568c1e43f6f92038c3dd147c6e4b0a3da2069c4144efd18c1  pytorch_model-00015-of-00015.bin
    9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347  tokenizer.model
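The checksums above can be verified after download using only the standard library; a small sketch (the helper name is ours, not part of the repo):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks so large shards never
    have to fit in memory; the hex digest should match the sha256sum entry."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `sha256_of("tokenizer.model")` on a complete download should return the last digest listed above.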
llama2_license.txt
ADDED
@@ -0,0 +1,125 @@
LLAMA 2 COMMUNITY LICENSE AGREEMENT
Llama 2 Version Release Date: July 18, 2023

"Agreement" means the terms and conditions for use, reproduction, distribution and
modification of the Llama Materials set forth herein.

"Documentation" means the specifications, manuals and documentation
accompanying Llama 2 distributed by Meta at ai.meta.com/resources/models-and-
libraries/llama-downloads/.

"Licensee" or "you" means you, or your employer or any other person or entity (if
you are entering into this Agreement on such person or entity's behalf), of the age
required under applicable laws, rules or regulations to provide legal consent and that
has legal authority to bind your employer or such other person or entity if you are
entering in this Agreement on their behalf.

"Llama 2" means the foundational large language models and software and
algorithms, including machine-learning model code, trained model weights,
inference-enabling code, training-enabling code, fine-tuning enabling code and other
elements of the foregoing distributed by Meta at ai.meta.com/resources/models-and-
libraries/llama-downloads/.

"Llama Materials" means, collectively, Meta's proprietary Llama 2 and
Documentation (and any portion thereof) made available under this Agreement.

"Meta" or "we" means Meta Platforms Ireland Limited (if you are located in or, if you
are an entity, your principal place of business is in the EEA or Switzerland) and Meta
Platforms, Inc. (if you are located outside of the EEA or Switzerland).

By clicking "I Accept" below or by using or distributing any portion or element of the
Llama Materials, you agree to be bound by this Agreement.

1. License Rights and Redistribution.

   a. Grant of Rights. You are granted a non-exclusive, worldwide, non-
transferable and royalty-free limited license under Meta's intellectual property or
other rights owned by Meta embodied in the Llama Materials to use, reproduce,
distribute, copy, create derivative works of, and make modifications to the Llama
Materials.

   b. Redistribution and Use.

      i. If you distribute or make the Llama Materials, or any derivative works
thereof, available to a third party, you shall provide a copy of this Agreement to such
third party.

      ii. If you receive Llama Materials, or any derivative works thereof, from
a Licensee as part of an integrated end user product, then Section 2 of this
Agreement will not apply to you.

      iii. You must retain in all copies of the Llama Materials that you
distribute the following attribution notice within a "Notice" text file distributed as a
part of such copies: "Llama 2 is licensed under the LLAMA 2 Community License,
Copyright (c) Meta Platforms, Inc. All Rights Reserved."

      iv. Your use of the Llama Materials must comply with applicable laws
and regulations (including trade compliance laws and regulations) and adhere to the
Acceptable Use Policy for the Llama Materials (available at
https://ai.meta.com/llama/use-policy), which is hereby incorporated by reference into
this Agreement.

      v. You will not use the Llama Materials or any output or results of the
Llama Materials to improve any other large language model (excluding Llama 2 or
derivative works thereof).

2. Additional Commercial Terms. If, on the Llama 2 version release date, the
monthly active users of the products or services made available by or for Licensee,
or Licensee's affiliates, is greater than 700 million monthly active users in the
preceding calendar month, you must request a license from Meta, which Meta may
grant to you in its sole discretion, and you are not authorized to exercise any of the
rights under this Agreement unless or until Meta otherwise expressly grants you
such rights.

3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE
LLAMA MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE
PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY
WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR
FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE
FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING
THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR
USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
|
82 |
+
|
83 |
+
4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE
|
84 |
+
LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT,
|
85 |
+
NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS
|
86 |
+
AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL,
|
87 |
+
CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN
|
88 |
+
IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF
|
89 |
+
ANY OF THE FOREGOING.
|
90 |
+
|
91 |
+
5. Intellectual Property.
|
92 |
+
|
93 |
+
a. No trademark licenses are granted under this Agreement, and in
|
94 |
+
connection with the Llama Materials, neither Meta nor Licensee may use any name
|
95 |
+
or mark owned by or associated with the other or any of its affiliates, except as
|
96 |
+
required for reasonable and customary use in describing and redistributing the
|
97 |
+
Llama Materials.
|
98 |
+
|
99 |
+
b. Subject to Meta's ownership of Llama Materials and derivatives made by or
|
100 |
+
for Meta, with respect to any derivative works and modifications of the Llama
|
101 |
+
Materials that are made by you, as between you and Meta, you are and will be the
|
102 |
+
owner of such derivative works and modifications.
|
103 |
+
|
104 |
+
c. If you institute litigation or other proceedings against Meta or any entity
|
105 |
+
(including a cross-claim or counterclaim in a lawsuit) alleging that the Llama
|
106 |
+
Materials or Llama 2 outputs or results, or any portion of any of the foregoing,
|
107 |
+
constitutes infringement of intellectual property or other rights owned or licensable
|
108 |
+
by you, then any licenses granted to you under this Agreement shall terminate as of
|
109 |
+
the date such litigation or claim is filed or instituted. You will indemnify and hold
|
110 |
+
harmless Meta from and against any claim by any third party arising out of or related
|
111 |
+
to your use or distribution of the Llama Materials.
|
112 |
+
|
113 |
+
6. Term and Termination. The term of this Agreement will commence upon your
|
114 |
+
acceptance of this Agreement or access to the Llama Materials and will continue in
|
115 |
+
full force and effect until terminated in accordance with the terms and conditions
|
116 |
+
herein. Meta may terminate this Agreement if you are in breach of any term or
|
117 |
+
condition of this Agreement. Upon termination of this Agreement, you shall delete
|
118 |
+
and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the
|
119 |
+
termination of this Agreement.
|
120 |
+
|
121 |
+
7. Governing Law and Jurisdiction. This Agreement will be governed and
|
122 |
+
construed under the laws of the State of California without regard to choice of law
|
123 |
+
principles, and the UN Convention on Contracts for the International Sale of Goods
|
124 |
+
does not apply to this Agreement. The courts of California shall have exclusive
|
125 |
+
jurisdiction of any dispute arising out of this Agreement.
|
notice.txt
ADDED
@@ -0,0 +1,2 @@
Llama 2 is licensed under the LLAMA 2 Community License,
Copyright (c) Meta Platforms, Inc. All Rights Reserved.
output-00001-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c983872289ae2b008c00ae925d9a758e7f75abf6349860080380607656bf01b5
size 8530438992
output-00002-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b9296c042d7612937ceb839f07b9556a1db0072e3161640bbc1748fdc4822243
size 8553067696
output-00003-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c5c079ffb2f1c2e31e564cccaa028dd4a1cd6e267febd332f6c13b0fc956589b
size 8535904104
output-00004-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a331af358061acaa041ea4d191f4b54a23733160c137c56961563fee4817b94b
size 812797304
pytorch_model.bin.index.json
ADDED
@@ -0,0 +1,730 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"metadata": {
|
3 |
+
"total_size": 137953296384
|
4 |
+
},
|
5 |
+
"weight_map": {
|
6 |
+
"lm_head.weight": "pytorch_model-00015-of-00015.bin",
|
7 |
+
"model.embed_tokens.weight": "pytorch_model-00001-of-00015.bin",
|
8 |
+
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
|
9 |
+
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
|
10 |
+
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
|
11 |
+
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
|
12 |
+
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
|
13 |
+
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
|
14 |
+
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
|
15 |
+
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
|
16 |
+
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
|
17 |
+
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
|
18 |
+
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
|
19 |
+
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
|
20 |
+
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
|
21 |
+
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
|
22 |
+
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
|
23 |
+
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
|
24 |
+
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
|
25 |
+
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
|
26 |
+
"model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
|
27 |
+
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
|
28 |
+
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
|
29 |
+
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
|
30 |
+
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
|
31 |
+
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
|
32 |
+
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
|
33 |
+
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
|
34 |
+
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
|
35 |
+
"model.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
36 |
+
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
|
37 |
+
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
|
38 |
+
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
|
39 |
+
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
40 |
+
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
|
41 |
+
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
|
42 |
+
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
|
43 |
+
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
|
44 |
+
"model.layers.12.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
45 |
+
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
|
46 |
+
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
|
47 |
+
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
|
48 |
+
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
49 |
+
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
|
50 |
+
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
|
51 |
+
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
|
52 |
+
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
|
53 |
+
"model.layers.13.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
54 |
+
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
|
55 |
+
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
|
56 |
+
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
|
57 |
+
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
58 |
+
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
|
59 |
+
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
|
60 |
+
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
|
61 |
+
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
|
62 |
+
"model.layers.14.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
63 |
+
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
|
64 |
+
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
|
65 |
+
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
|
66 |
+
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
67 |
+
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
|
68 |
+
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
|
69 |
+
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
|
70 |
+
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
|
71 |
+
"model.layers.15.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
72 |
+
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
|
73 |
+
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
|
74 |
+
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
|
75 |
+
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
76 |
+
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
|
77 |
+
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
|
78 |
+
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
|
79 |
+
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
|
80 |
+
"model.layers.16.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
81 |
+
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
|
82 |
+
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
|
83 |
+
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
|
84 |
+
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
|
85 |
+
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
|
86 |
+
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
|
87 |
+
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
|
88 |
+
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
|
89 |
+
"model.layers.17.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
90 |
+
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
|
91 |
+
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
|
92 |
+
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
|
93 |
+
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
94 |
+
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
|
95 |
+
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
|
96 |
+
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
|
97 |
+
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
|
98 |
+
"model.layers.18.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
99 |
+
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
|
100 |
+
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
|
101 |
+
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
|
102 |
+
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
103 |
+
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
|
104 |
+
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
|
105 |
+
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
|
106 |
+
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
|
107 |
+
"model.layers.19.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
108 |
+
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
|
109 |
+
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
|
110 |
+
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
|
111 |
+
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
112 |
+
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
|
113 |
+
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
|
114 |
+
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
|
115 |
+
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
|
116 |
+
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
|
117 |
+
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
|
118 |
+
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
|
119 |
+
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
|
120 |
+
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
|
121 |
+
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
|
122 |
+
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
|
123 |
+
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
|
124 |
+
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
|
125 |
+
"model.layers.20.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
126 |
+
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
|
127 |
+
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
|
128 |
+
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
|
129 |
+
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
130 |
+
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
|
131 |
+
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
|
132 |
+
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
|
133 |
+
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
|
134 |
+
"model.layers.21.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
135 |
+
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
|
136 |
+
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
|
137 |
+
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
|
138 |
+
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
|
139 |
+
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
|
140 |
+
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
|
141 |
+
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
|
142 |
+
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
|
143 |
+
"model.layers.22.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
144 |
+
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
|
145 |
+
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
|
146 |
+
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
|
147 |
+
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
148 |
+
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
|
149 |
+
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
|
150 |
+
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
|
151 |
+
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
|
152 |
+
"model.layers.23.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
153 |
+
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
|
154 |
+
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
|
155 |
+
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
|
156 |
+
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
157 |
+
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
|
158 |
+
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
|
159 |
+
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
|
160 |
+
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
|
161 |
+
"model.layers.24.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
162 |
+
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
|
163 |
+
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
|
164 |
+
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
|
165 |
+
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
166 |
+
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
|
167 |
+
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
|
168 |
+
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
|
169 |
+
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
|
170 |
+
"model.layers.25.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
171 |
+
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
|
172 |
+
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
|
173 |
+
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
|
174 |
+
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
175 |
+
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
|
176 |
+
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
|
177 |
+
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
|
178 |
+
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
|
179 |
+
"model.layers.26.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
180 |
+
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
|
181 |
+
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
|
182 |
+
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
|
183 |
+
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
184 |
+
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
|
185 |
+
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
|
186 |
+
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
|
187 |
+
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
|
188 |
+
"model.layers.27.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
189 |
+
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
|
190 |
+
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
|
191 |
+
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
|
192 |
+
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
|
193 |
+
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
|
194 |
+
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
|
195 |
+
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
|
196 |
+
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
|
197 |
+
"model.layers.28.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
198 |
+
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
|
199 |
+
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
|
200 |
+
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
|
201 |
+
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
202 |
+
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
|
203 |
+
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
|
204 |
+
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
|
205 |
+
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
|
206 |
+
"model.layers.29.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
207 |
+
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
|
208 |
+
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
|
209 |
+
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
|
210 |
+
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
211 |
+
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
|
212 |
+
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
|
213 |
+
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
|
214 |
+
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
|
215 |
+
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
|
216 |
+
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
|
217 |
+
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
|
218 |
+
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
|
219 |
+
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
|
220 |
+
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
|
221 |
+
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
|
222 |
+
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
|
223 |
+
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
|
224 |
+
"model.layers.30.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
225 |
+
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
|
226 |
+
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
|
227 |
+
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
|
228 |
+
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
229 |
+
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
|
230 |
+
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
|
231 |
+
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
|
232 |
+
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
|
233 |
+
"model.layers.31.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
234 |
+
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
|
235 |
+
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
|
236 |
+
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
|
237 |
+
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
238 |
+
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
|
239 |
+
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
|
240 |
+
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
|
241 |
+
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
|
242 |
+
"model.layers.32.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
|
243 |
+
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.34.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
"model.layers.35.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.40.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.40.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.40.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.40.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.40.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.40.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.40.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.40.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.40.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.41.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.42.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.43.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.44.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.45.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.45.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.45.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.45.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.45.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.45.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.45.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.45.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.45.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
"model.layers.46.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.46.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.46.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.46.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.46.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.46.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.46.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.46.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.46.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.47.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.48.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.49.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
"model.layers.50.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.50.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.50.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.50.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.50.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.50.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.50.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.50.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.50.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.51.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.51.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.51.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.51.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.51.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.51.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.51.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.51.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.51.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
"model.layers.52.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.52.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.52.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.52.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.52.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.52.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.52.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.52.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.52.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.53.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.54.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.55.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.56.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.57.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.57.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.57.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.57.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.57.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.57.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.57.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.57.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.57.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
"model.layers.58.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.58.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.58.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.58.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.58.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.58.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.58.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.58.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.58.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.59.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.60.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.60.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.60.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.60.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.60.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.60.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.60.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.60.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.60.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.61.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.62.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
"model.layers.63.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.63.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.63.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.63.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.63.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.63.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.63.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.63.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.63.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.64.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.65.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.66.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.67.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.68.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.68.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.68.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.68.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.68.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.68.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.68.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.68.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.68.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
"model.layers.69.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.69.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.69.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.69.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.69.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.69.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.69.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.69.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.69.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
"model.layers.70.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.70.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.70.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.70.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.70.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.70.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.70.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.70.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.70.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.71.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.72.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.73.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.73.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.73.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
"model.layers.73.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
|
651 |
+
"model.layers.73.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
|
652 |
+
"model.layers.73.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
|
653 |
+
"model.layers.73.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
|
654 |
+
"model.layers.73.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
|
655 |
+
"model.layers.73.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
|
656 |
+
"model.layers.74.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
657 |
+
"model.layers.74.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
|
658 |
+
"model.layers.74.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
|
659 |
+
"model.layers.74.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
|
660 |
+
"model.layers.74.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
661 |
+
"model.layers.74.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
|
662 |
+
"model.layers.74.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
|
663 |
+
"model.layers.74.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
|
664 |
+
"model.layers.74.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
|
665 |
+
"model.layers.75.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
666 |
+
"model.layers.75.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
|
667 |
+
"model.layers.75.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
|
668 |
+
"model.layers.75.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
|
669 |
+
"model.layers.75.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
670 |
+
"model.layers.75.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
|
671 |
+
"model.layers.75.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
|
672 |
+
"model.layers.75.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
|
673 |
+
"model.layers.75.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
|
674 |
+
"model.layers.76.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
675 |
+
"model.layers.76.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
|
676 |
+
"model.layers.76.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
|
677 |
+
"model.layers.76.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
|
678 |
+
"model.layers.76.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
679 |
+
"model.layers.76.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
|
680 |
+
"model.layers.76.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
|
681 |
+
"model.layers.76.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
|
682 |
+
"model.layers.76.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
|
683 |
+
"model.layers.77.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
684 |
+
"model.layers.77.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
|
685 |
+
"model.layers.77.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
|
686 |
+
"model.layers.77.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
|
687 |
+
"model.layers.77.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
688 |
+
"model.layers.77.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
|
689 |
+
"model.layers.77.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
|
690 |
+
"model.layers.77.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
|
691 |
+
"model.layers.77.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
|
692 |
+
"model.layers.78.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
693 |
+
"model.layers.78.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
|
694 |
+
"model.layers.78.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
|
695 |
+
"model.layers.78.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
|
696 |
+
"model.layers.78.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
697 |
+
"model.layers.78.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
|
698 |
+
"model.layers.78.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
|
699 |
+
"model.layers.78.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
|
700 |
+
"model.layers.78.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
|
701 |
+
"model.layers.79.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
702 |
+
"model.layers.79.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
|
703 |
+
"model.layers.79.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
|
704 |
+
"model.layers.79.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
|
705 |
+
"model.layers.79.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
|
706 |
+
"model.layers.79.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
|
707 |
+
"model.layers.79.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
|
708 |
+
"model.layers.79.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
|
709 |
+
"model.layers.79.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
|
710 |
+
"model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
|
711 |
+
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
|
712 |
+
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
|
713 |
+
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
|
714 |
+
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
|
715 |
+
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
|
716 |
+
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
|
717 |
+
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
|
718 |
+
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
|
719 |
+
"model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
|
720 |
+
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
|
721 |
+
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
|
722 |
+
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
|
723 |
+
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
|
724 |
+
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
|
725 |
+
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
|
726 |
+
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
|
727 |
+
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
|
728 |
+
"model.norm.weight": "pytorch_model-00014-of-00015.bin"
|
729 |
+
}
|
730 |
+
}
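The `weight_map` in `pytorch_model.bin.index.json` associates each tensor name with the shard file that stores it; loaders consult this index to open only the shards they need. A minimal sketch of that lookup, using a small excerpt of the map above (the helper names `shard_for` and `tensors_by_shard` are illustrative, not part of any library API):

```python
import json

# Excerpt of the weight_map from pytorch_model.bin.index.json above.
index = {
    "weight_map": {
        "model.layers.74.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
        "model.layers.74.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
        "model.norm.weight": "pytorch_model-00014-of-00015.bin",
    }
}


def shard_for(tensor_name: str, index: dict) -> str:
    """Return the shard file that stores the given tensor."""
    return index["weight_map"][tensor_name]


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by shard so each file is opened only once."""
    groups: dict = {}
    for name, shard in index["weight_map"].items():
        groups.setdefault(shard, []).append(name)
    return groups


print(shard_for("model.norm.weight", index))
# -> pytorch_model-00014-of-00015.bin
```

In the real file the same lookup runs over all 730 entries after `json.load`; note that a single layer's tensors can straddle two shards, as layer 74 does above.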
|
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
{"bos_token": {"content": "<s>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false}, "eos_token": {"content": "</s>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false}, "unk_token": {"content": "<unk>", "lstrip": false, "normalized": true, "rstrip": false, "single_word": false}}
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
tokenizer_config.json
ADDED
@@ -0,0 +1,35 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "model_max_length": 2048,
  "pad_token": null,
  "sp_model_kwargs": {
  },
  "tokenizer_class": "LlamaTokenizer",
  "clean_up_tokenization_spaces": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "__type": "AddedToken",
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "__type": "AddedToken",
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
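The `add_bos_token`/`add_eos_token` flags in `tokenizer_config.json` control whether the tokenizer prepends `<s>` and appends `</s>` to each encoded sequence. A minimal sketch of that behavior (the `build_inputs` helper and the token ids are illustrative, not the real `LlamaTokenizer`, though `<s>`=1 and `</s>`=2 follow the usual Llama convention):

```python
def build_inputs(token_ids, add_bos_token=True, add_eos_token=False,
                 bos_id=1, eos_id=2):
    """Wrap a tokenized sequence according to the config flags.

    Defaults mirror tokenizer_config.json above: BOS added, EOS not.
    """
    out = list(token_ids)
    if add_bos_token:
        out = [bos_id] + out
    if add_eos_token:
        out = out + [eos_id]
    return out


print(build_inputs([100, 200]))                      # -> [1, 100, 200]
print(build_inputs([100, 200], add_eos_token=True))  # -> [1, 100, 200, 2]
```

With these defaults, generation prompts start with `<s>` but are left open-ended, so the model decides when to emit `</s>`.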