pom committed on
Commit
33e5a3a
1 Parent(s): 4f64cdc

update XVERSE-13B-Chat model

MODEL_LICENSE.pdf CHANGED
Binary files a/MODEL_LICENSE.pdf and b/MODEL_LICENSE.pdf differ
 
README.md CHANGED
@@ -14,8 +14,8 @@ inference: false
14
  **XVERSE-13B** 是由深圳元象科技自主研发的支持多语言的大语言模型(Large Language Model),主要特点如下:
15
 
16
  - **模型结构**:XVERSE-13B 使用主流 Decoder-only 的标准 Transformer 网络结构,支持 8K 的上下文长度(Context Length),为同尺寸模型中最长,能满足更长的多轮对话、知识问答与摘要等需求,模型应用场景更广泛。
17
- - **训练数据**:构建了 1.4 万亿 token 的高质量、多样化的数据对模型进行充分训练,包含中、英、俄、西等 40 多种语言,通过精细化设置不同类型数据的采样比例,使得中英两种语言表现优异,也能兼顾其他语言效果。
18
- - **分词**:基于 BPE(Byte-Pair Encoding)算法,使用上百 GB 语料训练了一个词表大小为 100,278 的分词器,能够同时支持多语言,而无需额外扩展词表。
19
  - **训练框架**:自主研发多项关键技术,包括高效算子、显存优化、并行调度策略、数据-计算-通信重叠、平台和框架协同等,让训练效率更高,模型稳定性强,在千卡集群上的峰值算力利用率可达到 58.5%,位居业界前列。
20
 
21
  ## Model Introduction
@@ -25,113 +25,55 @@ inference: false
25
  **XVERSE-13B** is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. Its key features are as follows:
26
 
27
  - **Model Structure**: XVERSE-13B uses the mainstream Decoder-only Transformer network structure and supports an 8K context length, the longest among models of the same size, which meets the need for longer multi-round dialogues, knowledge question-answering, and summarization. This makes the model more versatile in application scenarios.
28
- - **Training Data**: The model has been thoroughly trained on a diversified and high-quality dataset consisting of 1.4 trillion tokens, covering more than 40 languages such as Chinese, English, Russian, and Spanish. The sampling ratios of different data types are finely tuned, which makes the performance of Chinese and English excellent while also taking other languages into account.
29
- - **Tokenization**: Based on the BPE (Byte-Pair Encoding) algorithm, a tokenizer with a vocabulary size of 100,278 has been trained using hundreds of gigabytes of language data. This tokenizer supports multiple languages without the need for additional vocabulary expansion.
30
  - **Training Framework**: Several key technologies have also been independently developed, including efficient operators, memory optimization, parallel scheduling strategies, overlap of data-computation-communication, and synergy between platforms and frameworks. These advancements enhance training efficiency and model stability. With these technologies, the peak computational power utilization rate on a thousand-card cluster can reach 58.5%, ranking at the forefront of the industry.
31
 
32
  ## 评测结果
33
 
34
- 为验证模型的各项能力,我们选取了多个学科综合能力评测集,包括 [MMLU](https://arxiv.org/abs/2009.03300)(英文)、[C-Eval](https://cevalbenchmark.com/)(中文)、[AGIEval](https://arxiv.org/abs/2304.06364)(中英)、[GAOKAO-Bench](https://github.com/OpenLMLab/GAOKAO-Bench)(中英)、[GAOKAO-English](https://github.com/ExpressAI/AI-Gaokao)(英文),评测结果如下:
35
-
36
- | 模型 | 类型 | MMLU | C-Eval | AGIEval<sup>1</sup> | GAOKAO-Bench<sup>1</sup> | GAOKAO-English<sup>1</sup> |
37
- | :------------------------: | :--------------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
38
- | Baichuan-13B | 底座 | 51.6<sup>2</sup> | 53.6<sup>3</sup> | 40.5 | 45.9 | 56.9 |
39
- | Baichuan-13B-Chat | 对话 | 52.1<sup>2</sup> | 51.5<sup>2</sup> | 34.6 | 46.7 | 63.8 |
40
- | Chinese-Alpaca-2-13B | 对话 | 53.2 | 41.3 | 36.6 | 38.4 | 65.1 |
41
- | Llama-1-13B | 底座 | 46.9<sup>4</sup> | 28.8 | 27.3 | 26.4 | 38.1 |
42
- | Llama-2-13B | 底座 | 54.8<sup>4</sup> | 35.6 | 33.4 | 35.4 | 60.6 |
43
- | moss-moon-003-base (16B) | 底座 | 24.7 | 33.1<sup>3</sup> | 26.8 | 28.5 | 34.7 |
44
- | moss-moon-003-sft (16B) | 对话 | 25.5 | 33.6 | 27.6 | 28.8 | 29.2 |
45
- | OpenLLaMA-13B | 底座 | 42.4 | 24.7 | 24.0 | 25.6 | 33.3 |
46
- | OPT-13B | 底座 | 25.2 | 25.0 | 24.2 | 24.4 | 31.1 |
47
- | Pythia-12B | 底座 | 25.1 | 26.2 | 25.3 | 25.3 | 26.8 |
48
- | Vicuna-13B-v1.5 | 对话 | 53.5 | 27.9 | 29.7 | 31.6 | 52.9 |
49
- | Ziya-LLaMA-13B-Pretrain-v1| 底座 | 43.9 | 30.2 | 27.2 | 26.4 | 37.6 |
50
- | Ziya-LLaMA-13B-v1.1 | 对话 | 50.6 | 29.3 | 23.6 | 26.7 | 27.3 |
51
- | **XVERSE-13B** | 底座 | **55.1** | **54.7** | **41.4** | **53.9** | **66.5** |
52
- | **XVERSE-13B-Chat** | 对话 | **60.2** | **53.1** | **48.3** | **50.7** | **80.6** |
53
 
54
  > <sup>1:只针对其中的单项选择题进行测试,即排除了填空题、开放性问题和多项选择题</sup>
55
- > <sup>2:来源于 [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) 的汇报结果</sup>
56
- > <sup>3:来源于 [C-Eval](https://cevalbenchmark.com/) 的汇报结果</sup>
57
- > <sup>4:来源于[Llama 2 论文](https://arxiv.org/abs/2307.09288)的汇报结果</sup>
58
- >
59
- > 对于 MMLU ,我们采用作者提供的[评测工具](https://github.com/hendrycks/test),C-Eval、AGIEval、GAOKAO-Bench、GAOKAO-English 与 MMLU 的评测方式相同,且统一采用 **5-shot** 构造测试样本。
60
-
61
- ## Model Evaluation
62
-
63
- In order to validate the various abilities of the model, we have chosen several comprehensive capability benchmarks across multiple disciplines, including [MMLU](https://arxiv.org/abs/2009.03300) (English), [C-Eval](https://cevalbenchmark.com/) (Chinese), [AGIEval](https://arxiv.org/abs/2304.06364) (Chinese and English), [GAOKAO-Bench](https://github.com/OpenLMLab/GAOKAO-Bench) (Chinese and English), and [GAOKAO-English](https://github.com/ExpressAI/AI-Gaokao) (English); the evaluation results are as follows:
64
 
 
 
65
 
66
- | Models | Type | MMLU | C-Eval | AGIEval<sup>1</sup> | GAOKAO-Bench<sup>1</sup> | GAOKAO-English<sup>1</sup> |
67
- | :------------------------: | :--------------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
68
- | Baichuan-13B | pretrained | 51.6<sup>2</sup> | 53.6<sup>3</sup> | 40.5 | 45.9 | 56.9 |
69
- | Baichuan-13B-Chat | fine-tuned | 52.1<sup>2</sup> | 51.5<sup>2</sup> | 34.6 | 46.7 | 63.8 |
70
- | Chinese-Alpaca-2-13B | fine-tuned | 53.2 | 41.3 | 36.6 | 38.4 | 65.1 |
71
- | Llama-1-13B | pretrained | 46.9<sup>4</sup> | 28.8 | 27.3 | 26.4 | 38.1 |
72
- | Llama-2-13B | pretrained | 54.8<sup>4</sup> | 35.6 | 33.4 | 35.4 | 60.6 |
73
- | moss-moon-003-base (16B) | pretrained | 24.7 | 33.1<sup>3</sup> | 26.8 | 28.5 | 34.7 |
74
- | moss-moon-003-sft (16B) | fine-tuned | 25.5 | 33.6 | 27.6 | 28.8 | 29.2 |
75
- | OpenLLaMA-13B | pretrained | 42.4 | 24.7 | 24.0 | 25.6 | 33.3 |
76
- | OPT-13B | pretrained | 25.2 | 25.0 | 24.2 | 24.4 | 31.1 |
77
- | Pythia-12B | pretrained | 25.1 | 26.2 | 25.3 | 25.3 | 26.8 |
78
- | Vicuna-13B-v1.5 | fine-tuned | 53.5 | 27.9 | 29.7 | 31.6 | 52.9 |
79
- | Ziya-LLaMA-13B-Pretrain-v1| pretrained | 43.9 | 30.2 | 27.2 | 26.4 | 37.6 |
80
- | Ziya-LLaMA-13B-v1.1 | fine-tuned | 50.6 | 29.3 | 23.6 | 26.7 | 27.3 |
81
- | **XVERSE-13B** | pretrained | **55.1** | **54.7** | **41.4** | **53.9** | **66.5** |
82
- | **XVERSE-13B-Chat** | fine-tuned | **60.2** | **53.1** | **48.3** | **50.7** | **80.6** |
83
 
84
 
85
  > <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
86
- > <sup>2: Reporting results from [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B).</sup>
87
- > <sup>3: Reporting results from [C-Eval](https://cevalbenchmark.com/).</sup>
88
- > <sup>4: Reporting results from [Llama 2](https://arxiv.org/abs/2307.09288).</sup>
89
- >
90
- > For MMLU, we adopt the [evaluation tools](https://github.com/hendrycks/test) provided by the authors; C-Eval, AGIEval, GAOKAO-Bench, and GAOKAO-English are evaluated in the same way as MMLU, uniformly using **5-shot** to construct the test samples.
91
-
92
- ### MMLU 各类别指标
93
-
94
- MMLU Category Results
95
-
96
- | Models | Type | Average | STEM | Social Science | Humanities | Others |
97
- | :------------------------: | :------------------------: | :------: | :------: | :------------: | :--------: | :------: |
98
- | Baichuan-13B | pretrained | 51.6 | 41.6 | 60.9 | 47.4 | 58.5 |
99
- | Baichuan-13B-Chat | fine-tuned | 52.1 | 40.9 | 60.9 | 48.8 | 59.0 |
100
- | Chinese-Alpaca-2-13B | fine-tuned | 53.2 | 41.8 | 61.2 | 51.3 | 59.2 |
101
- | Llama-1-13B | pretrained | 46.9 | 35.8 | 53.8 | 45.0 | 53.3 |
102
- | Llama-2-13B | pretrained | 54.8 | 44.1 | 62.6 | 52.8 | 61.1 |
103
- | moss-moon-003-base (16B) | pretrained | 24.7 | 23.0 | 24.0 | 25.2 | 26.3 |
104
- | moss-moon-003-sft (16B) | fine-tuned | 25.5 | 25.9 | 23.8 | 27.1 | 24.4 |
105
- | OpenLLaMA-13B | pretrained | 42.4 | 34.7 | 48.6 | 40.0 | 47.1 |
106
- | OPT-13B | pretrained | 25.2 | 23.9 | 24.1 | 25.9 | 26.3 |
107
- | Pythia-12B | pretrained | 25.1 | 24.8 | 23.0 | 26.1 | 26.0 |
108
- | Vicuna-13B-v1.5 | fine-tuned | 53.5 | 42.3 | 61.3 | 50.3 | 60.9 |
109
- | Ziya-LLaMA-13B-Pretrain-v1 | pretrained | 43.9 | 36.3 | 48.8 | 41.1 | 50.3 |
110
- | Ziya-LLaMA-13B-v1.1 | fine-tuned | 50.6 | 40.7 | 57.8 | 48.1 | 56.7 |
111
- | **XVERSE-13B** | pretrained | **55.1** | **44.5** | **64.4** | **50.5** | **62.9** |
112
- | **XVERSE-13B-Chat** | fine-tuned | **60.2** | **48.1** | **67.7** | **56.4** | **68.0** |
113
-
114
- ### C-Eval 各类别指标
115
-
116
- C-Eval Category Results
117
-
118
- | Models | Type | Average | STEM | Social Science | Humanities | Others |
119
- | :------------------------: | :------------------------: | :------: | :------: | :------------: | :--------: | :------: |
120
- | Baichuan-13B | pretrained | 53.6 | 47.0 | 66.8 | 57.3 | 49.8 |
121
- | Baichuan-13B-Chat | fine-tuned | 51.5 | 43.7 | 64.6 | 56.2 | 49.2 |
122
- | Chinese-Alpaca-2-13B | fine-tuned | 41.3 | 37.8 | 51.1 | 42.4 | 37.8 |
123
- | Llama-1-13B | pretrained | 28.8 | 27.5 | 33.9 | 27.7 | 27.7 |
124
- | Llama-2-13B | pretrained | 35.6 | 34.5 | 39.8 | 36.2 | 33.2 |
125
- | moss-moon-003-base (16B) | pretrained | 33.1 | 31.6 | 37.0 | 33.4 | 32.1 |
126
- | moss-moon-003-sft (16B) | fine-tuned | 33.6 | 31.4 | 38.6 | 33.8 | 32.9 |
127
- | OpenLLaMA-13B | pretrained | 24.7 | 25.5 | 23.5 | 24.2 | 24.7 |
128
- | OPT-13B | pretrained | 25.0 | 24.4 | 24.6 | 25.9 | 25.4 |
129
- | Pythia-12B | pretrained | 26.2 | 26.8 | 25.1 | 26.7 | 25.4 |
130
- | Vicuna-13B-v1.5 | fine-tuned | 27.9 | 25.4 | 33.2 | 29.3 | 26.2 |
131
- | Ziya-LLaMA-13B-Pretrain-v1 | pretrained | 30.2 | 27.8 | 34.3 | 32.0 | 29.0 |
132
- | Ziya-LLaMA-13B-v1.1 | fine-tuned | 29.3 | 27.5 | 32.8 | 29.7 | 29.0 |
133
- | **XVERSE-13B** | pretrained | **54.7** | **45.6** | **66.2** | **58.3** | **56.9** |
134
- | **XVERSE-13B-Chat** | fine-tuned | **53.1** | **44.5** | **65.3** | **56.5** | **54.3** |
135
 
136
  ### Loading with Transformers
137
 
 
14
  **XVERSE-13B** 是由深圳元象科技自主研发的支持多语言的大语言模型(Large Language Model),主要特点如下:
15
 
16
  - **模型结构**:XVERSE-13B 使用主流 Decoder-only 的标准 Transformer 网络结构,支持 8K 的上下文长度(Context Length),为同尺寸模型中最长,能满足更长的多轮对话、知识问答与摘要等需求,模型应用场景更广泛。
17
+ - **训练数据**:构建了 3.2 万亿 token 的高质量、多样化的数据对模型进行充分训练,包含中、英、俄、西等 40 多种语言,通过精细化设置不同类型数据的采样比例,使得中英两种语言表现优异,也能兼顾其他语言效果。
18
+ - **分词**:基于 BPE(Byte-Pair Encoding)算法,使用上百 GB 语料训练了一个词表大小为 100,534 的分词器,能够同时支持多语言,而无需额外扩展词表。
19
  - **训练框架**:自主研发多项关键技术,包括高效算子、显存优化、并行调度策略、数据-计算-通信重叠、平台和框架协同等,让训练效率更高,模型稳定性强,在千卡集群上的峰值算力利用率可达到 58.5%,位居业界前列。
20
 
21
  ## Model Introduction
 
25
  **XVERSE-13B** is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. Its key features are as follows:
26
 
27
  - **Model Structure**: XVERSE-13B uses the mainstream Decoder-only Transformer network structure and supports an 8K context length, the longest among models of the same size, which meets the need for longer multi-round dialogues, knowledge question-answering, and summarization. This makes the model more versatile in application scenarios.
28
+ - **Training Data**: The model has been thoroughly trained on a diversified and high-quality dataset consisting of 3.2 trillion tokens, covering more than 40 languages such as Chinese, English, Russian, and Spanish. The sampling ratios of different data types are finely tuned, which makes the performance of Chinese and English excellent while also taking other languages into account.
29
+ - **Tokenization**: Based on the BPE (Byte-Pair Encoding) algorithm, a tokenizer with a vocabulary size of 100,534 has been trained using hundreds of gigabytes of language data. This tokenizer supports multiple languages without the need for additional vocabulary expansion (see the sketch after this list).
30
  - **Training Framework**: Several key technologies have also been independently developed, including efficient operators, memory optimization, parallel scheduling strategies, overlap of data-computation-communication, and synergy between platforms and frameworks. These advancements enhance training efficiency and model stability. With these technologies, the peak computational power utilization rate on a thousand-card cluster can reach 58.5%, ranking at the forefront of the industry.
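
As a quick illustration of the tokenization bullet above, here is a minimal sketch. The repo id `xverse/XVERSE-13B-Chat` is an assumption based on this commit, and the expected vocabulary size comes from the updated `config.json`:

```python
# Minimal sketch: inspect the updated BPE tokenizer described above.
# The repo id is an assumption; point it at a local checkout if needed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xverse/XVERSE-13B-Chat")
print(tokenizer.vocab_size)  # expected: 100534 after this update

# One shared vocabulary handles multiple languages without extension:
for text in ["你好,世界", "Hello, world", "Привет, мир", "Hola, mundo"]:
    ids = tokenizer.encode(text)
    print(len(ids), tokenizer.decode(ids))
```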
31
 
32
  ## 评测结果
33
 
34
+ 为了综合评估模型的性能,我们在一系列标准数据集上进行了全面测试,包括 C-Eval、CMMLU、Gaokao-Bench、MMLU、GAOKAO-English、AGIEval、RACE-M、CommonSenseQA、PIQA、GSM8K 和 HumanEval。这些评估覆盖了模型在多个领域的能力,具体包括中文问答、英文问答、语言理解、常识问答、逻辑推理、数学问题解答以及编程能力。评估结果如下:
35
+
36
+ | 能力维度 | 数据集 | 评测方式 | XVERSE-13B-2 | XVERSE-13B | Baichuan2-13B | Llama1-13B | Llama2-13B |
37
+ | :--------: | :------------------------: | :----: | :----------: | :--------: | :-----------: | :--------: | :--------: |
38
+ | 中文问答 | C-Eval | 5-shot | 63.5 | 54.7 | 58.1 | 28.8 | 35.6 |
39
+ | | CMMLU | 5-shot | 66.2 | 59.1 | 62.0 | 31.5 | 38.4 |
40
+ | | Gaokao-Bench<sup>1</sup> | 5-shot | 67.5 | 53.9 | 54.3 | 26.4 | 35.4 |
41
+ | 英文问答 | MMLU | 5-shot | 61.2 | 55.1 | 59.2 | 46.9 | 54.8 |
42
+ | | GAOKAO-English<sup>1</sup> | 5-shot | 73.7 | 66.5 | 67.7 | 38.1 | 60.6 |
43
+ | 中英文问答 | AGIEval<sup>1</sup> | 5-shot | 54.5 | 41.4 | 48.2 | 27.3 | 33.4 |
44
+ | 语言理解 | RACE-M | 0-shot | 84.6 | 74.2 | 68.9 | 61.6 | 63.0 |
45
+ | 常识问答 | CommonSenseQA | 7-shot | 74.0 | 69.5 | 65.6 | 62.0 | 67.3 |
46
+ | 推理 | PIQA | 0-shot | 80.8 | 79.0 | 78.5 | 80.1 | 80.5 |
47
+ | 数学 | GSM8K | 4-shot | 54.9 | 18.4 | 52.7 | 17.8 | 28.7 |
48
+ | 代码 | HumanEval | 0-shot | 39.6 | 15.9 | 17.1 | 15.8 | 18.3 |
 
 
 
 
49
 
50
  > <sup>1:只针对其中的单项选择题进行测试,即排除了填空题、开放性问题和多项选择题</sup>
 
51
 
52
+ 对于上述所有比较模型,我们优先汇报其官方公布的结果。在缺少官方结果的情况下,我们采用了 [OpenCompass 榜单](https://opencompass.org.cn/leaderboard-llm)的报告结果。其他结果则来自于我们自行执行的评估流程所获得的数据。
53
+ 对于 MMLU ,我们采用作者提供的[评测工具](https://github.com/hendrycks/test),C-Eval、AGIEval、GAOKAO-Bench、GAOKAO-English 与 MMLU 的评测方式相同,其余评测数据集使用 [OpenCompass 评估框架](https://github.com/open-compass/OpenCompass/)进行评估。
54
 
55
+ ## Model Evaluation
 
56
 
57
+ To comprehensively assess the performance of the model, we conducted extensive testing across a range of standard datasets, including C-Eval, CMMLU, Gaokao-Bench, MMLU, GAOKAO-English, AGIEval, RACE-M, CommonSenseQA, PIQA, GSM8K, and HumanEval. These evaluations span multiple capabilities of the model, specifically Chinese question answering, English question answering, language comprehension, commonsense question answering, logical reasoning, mathematical problem solving, and coding ability. The evaluation results are as follows:
58
+
59
+ | Capability Dimension | Dataset | Shots | XVERSE-13B-2 | XVERSE-13B | Baichuan2-13B | Llama1-13B | Llama2-13B |
60
+ | :--------------------: | :------------------------: | :----: | :----------: | :--------: | :-----------: | :--------: | :--------: |
61
+ | Chinese QA | C-Eval | 5-shot | 63.5 | 54.7 | 58.1 | 28.8 | 35.6 |
62
+ | | CMMLU | 5-shot | 66.2 | 59.1 | 62.0 | 31.5 | 38.4 |
63
+ | | Gaokao-Bench<sup>1</sup> | 5-shot | 67.5 | 53.9 | 54.3 | 26.4 | 35.4 |
64
+ | English QA | MMLU | 5-shot | 61.2 | 55.1 | 59.2 | 46.9 | 54.8 |
65
+ | | GAOKAO-English<sup>1</sup> | 5-shot | 73.7 | 66.5 | 67.7 | 38.1 | 60.6 |
66
+ | Chinese & English QA | AGIEval<sup>1</sup> | 5-shot | 54.5 | 41.4 | 48.2 | 27.3 | 33.4 |
67
+ | Language Understanding | RACE-M | 0-shot | 84.6 | 74.2 | 68.9 | 61.6 | 63.0 |
68
+ | Common Sense QA | CommonSenseQA | 7-shot | 74.0 | 69.5 | 65.6 | 62.0 | 67.3 |
69
+ | Reasoning | PIQA | 0-shot | 80.8 | 79.0 | 78.5 | 80.1 | 80.5 |
70
+ | Math | GSM8K | 4-shot | 54.9 | 18.4 | 52.7 | 17.8 | 28.7 |
71
+ | Coding | HumanEval | 0-shot | 39.6 | 15.9 | 17.1 | 15.8 | 18.3 |
72
 
73
  > <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
74
+
75
+ For all the comparison models above, we prioritize reporting their officially published results. In the absence of official data, we refer to the results reported on the [OpenCompass Leaderboard](https://opencompass.org.cn/leaderboard-llm). The remaining results come from our own evaluation pipeline.
76
+ For MMLU, we adopt the [evaluation tools](https://github.com/hendrycks/test) provided by the authors; C-Eval, AGIEval, GAOKAO-Bench, and GAOKAO-English are evaluated in the same way as MMLU. The remaining datasets are evaluated with the [OpenCompass evaluation framework](https://github.com/open-compass/OpenCompass/).
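
To make the protocol concrete, here is an illustrative sketch of how a 5-shot, single-answer multiple-choice test sample can be assembled; the exact template used by the tools above may differ:

```python
# Illustrative 5-shot prompt construction for single-answer multiple-choice
# evaluation (a sketch of the general scheme, not the exact harness template).

def format_question(question, choices, answer=None):
    s = question + "\n"
    for label, choice in zip("ABCD", choices):
        s += f"{label}. {choice}\n"
    s += "Answer:"
    if answer is not None:
        s += f" {answer}\n\n"  # completed shot, used as an in-context example
    return s

def build_prompt(dev_examples, test_example, k=5):
    # dev_examples: list of (question, choices, answer) tuples for the shots
    prompt = "".join(format_question(q, c, a) for q, c, a in dev_examples[:k])
    question, choices, _ = test_example
    return prompt + format_question(question, choices)  # model predicts A/B/C/D
```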
 
77
 
78
  ### Loading with Transformers
79
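
The snippet under this heading is not shown in the diff, so here is a minimal loading sketch; the repo id and the custom `chat()` entry point are assumptions based on the code in this commit:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "xverse/XVERSE-13B-Chat"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,      # needed for the custom XverseForCausalLM
    torch_dtype=torch.bfloat16,  # matches "torch_dtype": "bfloat16" in config.json
    device_map="auto",
).eval()

history = [{"role": "user", "content": "你好,请介绍一下你自己。"}]
response = model.chat(tokenizer, history)  # chat() lives in modeling_xverse.py
print(response)
```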
 
config.json CHANGED
@@ -14,6 +14,7 @@
14
  "initializer_range": 0.02,
15
  "intermediate_size": 13824,
16
  "max_position_embeddings": 8192,
 
17
  "model_type": "xverse",
18
  "num_attention_heads": 40,
19
  "num_hidden_layers": 40,
@@ -22,6 +23,5 @@
22
  "torch_dtype": "bfloat16",
23
  "transformers_version": "4.28.1",
24
  "use_cache": true,
25
- "vocab_size": 100278
26
  }
27
-
 
14
  "initializer_range": 0.02,
15
  "intermediate_size": 13824,
16
  "max_position_embeddings": 8192,
17
+ "max_tokenizer_truncation": 6144,
18
  "model_type": "xverse",
19
  "num_attention_heads": 40,
20
  "num_hidden_layers": 40,
 
23
  "torch_dtype": "bfloat16",
24
  "transformers_version": "4.28.1",
25
  "use_cache": true,
26
+ "vocab_size": 100534
27
  }
 
configuration_xverse.py CHANGED
@@ -91,6 +91,7 @@ class XverseConfig(PretrainedConfig):
91
  num_attention_heads=40,
92
  hidden_act="silu",
93
  max_position_embeddings=8192,
 
94
  initializer_range=0.02,
95
  rms_norm_eps=1e-6,
96
  use_cache=True,
@@ -111,6 +112,7 @@ class XverseConfig(PretrainedConfig):
111
  self.initializer_range = initializer_range
112
  self.rms_norm_eps = rms_norm_eps
113
  self.use_cache = use_cache
 
114
 
115
  super().__init__(
116
  pad_token_id=pad_token_id,
 
91
  num_attention_heads=40,
92
  hidden_act="silu",
93
  max_position_embeddings=8192,
94
+ max_tokenizer_truncation=8192,
95
  initializer_range=0.02,
96
  rms_norm_eps=1e-6,
97
  use_cache=True,
 
112
  self.initializer_range = initializer_range
113
  self.rms_norm_eps = rms_norm_eps
114
  self.use_cache = use_cache
115
+ self.max_tokenizer_truncation = max_tokenizer_truncation
116
 
117
  super().__init__(
118
  pad_token_id=pad_token_id,
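
For context, the input budget implied by the new field can be computed as follows; `max_new_tokens` is an illustrative value, while the other numbers come from this commit:

```python
# How max_tokenizer_truncation caps chat-input length (mirrors the
# arithmetic added to _build_chat_input in modeling_xverse.py below).
max_position_embeddings = 8192   # from config.json
max_tokenizer_truncation = 6144  # new field set in config.json by this commit
max_new_tokens = 1024            # illustrative generation budget

max_input_tokens = max_position_embeddings - max_new_tokens              # 7168
max_input_tokens = max(max_position_embeddings // 2, max_input_tokens)   # >= 4096
max_input_tokens = min(max_tokenizer_truncation, max_input_tokens)       # capped: 6144
print(max_input_tokens)
```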
modeling_xverse.py CHANGED
@@ -611,8 +611,6 @@ class XverseModel(XversePreTrainedModel):
611
 
612
 
613
  class XverseForCausalLM(XversePreTrainedModel):
614
- _tied_weights_keys = ["lm_head.weight"]
615
-
616
  def __init__(self, config):
617
  super().__init__(config)
618
  self.model = XverseModel(config)
@@ -732,15 +730,22 @@ class XverseForCausalLM(XversePreTrainedModel):
732
  max_new_tokens = max_new_tokens or self.generation_config.max_new_tokens
733
  max_input_tokens = self.config.max_position_embeddings - max_new_tokens
734
  max_input_tokens = max(self.config.max_position_embeddings // 2, max_input_tokens)
 
735
 
736
  total_input, round_input = [], []
737
- user_prompt, assist_prompt = "Human: ", "Assistant: "
 
 
 
 
738
  for i, message in enumerate(messages[::-1]):
739
- if message['role'] == 'user':
740
- user_content = f"{user_prompt}{message['content']}\n\n"
 
 
741
  if i == 0:
742
- user_content += assist_prompt
743
- content_tokens = tokenizer.encode(user_content, return_token_type_ids=False)
744
  round_input = content_tokens + round_input
745
 
746
  if i != 0:
@@ -754,12 +759,20 @@ class XverseForCausalLM(XversePreTrainedModel):
754
  break
755
  round_input = []
756
  elif message['role'] == 'assistant':
757
- assist_content = f"{assist_prompt}{message['content']}"
758
- content_tokens = tokenizer.encode(assist_content, return_token_type_ids=False)
759
  round_input = content_tokens + [self.generation_config.eos_token_id] + round_input
 
760
  else:
761
  raise ValueError(f"message role not supported yet: {message['role']}")
762
- total_input = total_input[-max_input_tokens:] # truncate left
763
  total_input = torch.LongTensor([total_input]).to(self.device)
764
  return total_input
765
 
@@ -779,7 +792,7 @@ class XverseForCausalLM(XversePreTrainedModel):
779
  thread = Thread(target=self.generate, kwargs=generation_kwargs)
780
  thread.start()
781
  for next_text in streamer:
782
- yield next_text.rstrip(tokenizer.eos_token)
783
 
784
  return stream_generator()
785
  else:
@@ -822,9 +835,7 @@ class XverseForCausalLM(XversePreTrainedModel):
822
  def _reorder_cache(past_key_values, beam_idx):
823
  reordered_past = ()
824
  for layer_past in past_key_values:
825
- reordered_past += (
826
- tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
827
- )
828
  return reordered_past
829
 
830
  def quantize(self, bit_length: int):
 
611
 
612
 
613
  class XverseForCausalLM(XversePreTrainedModel):
 
 
614
  def __init__(self, config):
615
  super().__init__(config)
616
  self.model = XverseModel(config)
 
730
  max_new_tokens = max_new_tokens or self.generation_config.max_new_tokens
731
  max_input_tokens = self.config.max_position_embeddings - max_new_tokens
732
  max_input_tokens = max(self.config.max_position_embeddings // 2, max_input_tokens)
733
+ max_input_tokens = min(self.config.max_tokenizer_truncation, max_input_tokens)
734
 
735
  total_input, round_input = [], []
736
+ user_prompt_tokens = tokenizer.encode("Human: ", return_token_type_ids=False)
737
+ exec_prompt_tokens = tokenizer.encode("Exec: ", return_token_type_ids=False)
738
+ assist_prompt_tokens = tokenizer.encode("Assistant: ", return_token_type_ids=False)
739
+ assist_prompt_len = len(assist_prompt_tokens)
740
+
741
  for i, message in enumerate(messages[::-1]):
742
+ if message['role'] == 'user' or message['role'] == 'exec':
743
+ user_content = f"{message['content']}\n\n"
744
+ content_tokens = user_prompt_tokens + tokenizer.encode(user_content, return_token_type_ids=False) if message['role'] == 'user' else \
745
+ exec_prompt_tokens + tokenizer.encode(user_content, return_token_type_ids=False)
746
  if i == 0:
747
+ content_tokens = content_tokens[:max_input_tokens-assist_prompt_len]
748
+ content_tokens += assist_prompt_tokens
749
  round_input = content_tokens + round_input
750
 
751
  if i != 0:
 
759
  break
760
  round_input = []
761
  elif message['role'] == 'assistant':
762
+ assist_content = f"{message['content']}"
763
+ content_tokens = assist_prompt_tokens + tokenizer.encode(assist_content, return_token_type_ids=False)
764
  round_input = content_tokens + [self.generation_config.eos_token_id] + round_input
765
+ elif message['role'] == 'system':
766
+ assert i == len(messages) - 1
767
+ user_content = f"{message['content']}\n"
768
+ content_tokens = tokenizer.encode(user_content, return_token_type_ids=False)
769
+ round_input = user_prompt_tokens + content_tokens + round_input
770
+ if len(total_input) + len(round_input) > max_input_tokens:
771
+ break
772
+ else:
773
+ total_input = round_input + total_input
774
  else:
775
  raise ValueError(f"message role not supported yet: {message['role']}")
 
776
  total_input = torch.LongTensor([total_input]).to(self.device)
777
  return total_input
778
 
 
792
  thread = Thread(target=self.generate, kwargs=generation_kwargs)
793
  thread.start()
794
  for next_text in streamer:
795
+ yield next_text.replace(tokenizer.eos_token, "")
796
 
797
  return stream_generator()
798
  else:
 
835
  def _reorder_cache(past_key_values, beam_idx):
836
  reordered_past = ()
837
  for layer_past in past_key_values:
838
+ reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
 
 
839
  return reordered_past
840
 
841
  def quantize(self, bit_length: int):
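
For readability, here is a condensed, string-level sketch of the conversation layout that the token-level code above assembles. It assumes the conversation ends with a user or exec turn; the real code additionally truncates from the left and inserts EOS token ids rather than text, and the EOS string below is illustrative:

```python
# String-level sketch of the prompt layout built by _build_chat_input above:
# "Human: "/"Exec: " prefix user and exec turns, assistant turns end with EOS,
# a system message is rendered as a leading "Human: " line, and the prompt
# closes with "Assistant: " so the model continues as the assistant.
def render_chat(messages, eos="</s>"):  # eos string is an assumption
    parts = []
    for m in messages:
        if m["role"] == "system":
            parts.append(f"Human: {m['content']}\n")
        elif m["role"] == "user":
            parts.append(f"Human: {m['content']}\n\n")
        elif m["role"] == "exec":
            parts.append(f"Exec: {m['content']}\n\n")
        elif m["role"] == "assistant":
            parts.append(f"Assistant: {m['content']}{eos}")
        else:
            raise ValueError(f"message role not supported yet: {m['role']}")
    return "".join(parts) + "Assistant: "

print(render_chat([{"role": "user", "content": "你好"}]))
```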
pytorch_model-00001-of-00015.bin → pytorch_model-00001-of-00010.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e9eefa73431588ba182a9fa06fba459bfbbbc538ed15ea6aaa67ca525b1da446
3
- size 1871016318
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7ac6f98cae6a0b3768822474284d619beda358b68304a8bde5f1e493a694ef4e
3
+ size 2508131049
pytorch_model-00002-of-00015.bin → pytorch_model-00002-of-00010.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:88d32bfaf3d7b5368e7c00b6479ea1ef798aca901c3602ddc31e2bfdd2418c9b
3
- size 1903234214
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7715c66734a8871bbd764528ac509caa22b9c7a44b3e2b50ceb5bde1b237f6d5
3
+ size 3172057468
pytorch_model-00003-of-00015.bin → pytorch_model-00003-of-00010.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a943f6aa46a856797a804a15c5cb19035395d41bf239874f244192f38afaba72
3
- size 1903234214
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:33b1866910aeadb014e0462c828e5d03d8a52044a3405253ab2f786c8c17279e
3
+ size 3172057468
pytorch_model-00004-of-00015.bin → pytorch_model-00004-of-00010.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b8926fe65e93fbfd9befa4528ca51f52723d781c571948ad64284465a0bd42a0
3
- size 1903234214
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b1ad2384bf041f4b3eb0dfa9cd2fd36ca9dea9761504c3b945cfb8302c7449a9
3
+ size 3172057532
pytorch_model-00005-of-00010.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ebb74708adecad7f7ddf3a5d7ab327a4fcb61c8f0dfb6d66e31e82475a914af7
3
+ size 3172057532
pytorch_model-00005-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:4bbd7ff1159cb112d72a0bd0256783478d4cbccb684c87ec26992b1dfa952996
3
- size 1903234214
 
 
 
 
pytorch_model-00006-of-00010.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2df8350a00f3c5e7e1cf65ea7731c69343df05d5e52205b3284bb1dc43d0edfb
3
+ size 3172057532
pytorch_model-00006-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:d018135a9b98f873303c8fb6fd1e110fcab63aa63617a03bb17c62e535d5bfa4
3
- size 1903234214
 
 
 
 
pytorch_model-00007-of-00010.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:74cb38f652c76808a77e695afa4157924b1d0ce21db6bc18d1faa6fd7a842aff
3
+ size 3172057532
pytorch_model-00007-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:1c843829a5757d6fdc506471bae3516cf18aa512ba3ffd7ade30c353606b9427
3
- size 1903234214
 
 
 
 
pytorch_model-00008-of-00010.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e75e08a58078d4aa1302feaf449241b4c298344b9451b0bcbcb460677e0a7718
3
+ size 3172057532
pytorch_model-00008-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:e774213e567df646459b9db90ad70bab09f39a9eca6a894487e7c4aebbaf32a1
3
- size 1903234214
 
 
 
 
pytorch_model-00014-of-00015.bin → pytorch_model-00009-of-00010.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d5b9bffbd923ca0be3badb0bb35e630465a95b21526b1a9ae2cccbd3fc03cb2c
3
  size 1693507250
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fa3de49840e4b259ff458a38e5d5a1720a6d1daca77b071a3774600effc16ca2
3
  size 1693507250
pytorch_model-00009-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:8e2ee37b9e7e30f6bd914c219550a26eb2e364ff55a1089de1114242f4dc742f
3
- size 1903234214
 
 
 
 
pytorch_model-00010-of-00010.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:17d1b0fa1afc4439ac6d633fe77a4eebd93e0a23f2433e7c39058b9e5ea31a7b
3
+ size 1029571307
pytorch_model-00010-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:8b519216f13ae9d3db29e3bf3baa61051038ab686fc5bcc5a056d54215dce66c
3
- size 1903234214
 
 
 
 
pytorch_model-00011-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:3f9af8f0d46f93b2dd86f40ec115ddcedf42dff498d8481c96bcfc47d655b7f7
3
- size 1903234214
 
 
 
 
pytorch_model-00012-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:dc8045ea179095d6961c8bb579b00cb58a7c65eb17fca343397f55acb30d8fe4
3
- size 1903234214
 
 
 
 
pytorch_model-00013-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:92a9f68f76d1b0542c441154a5f05049cf5ca3aaa82166e101a7a4018b44ea37
3
- size 1903234214
 
 
 
 
pytorch_model-00015-of-00015.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:c395ce9729e0cc6a3824c1645838ebc96a4f232ac848e69ecf092fdfcf4380bc
3
- size 1026867947
 
 
 
 
pytorch_model.bin.index.json CHANGED
@@ -1,410 +1,410 @@
1
  {
2
  "metadata": {
3
- "total_size": 27430067200
4
  },
5
  "weight_map": {
6
- "lm_head.weight": "pytorch_model-00015-of-00015.bin",
7
- "model.embed_tokens.weight": "pytorch_model-00001-of-00015.bin",
8
- "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
9
- "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00015.bin",
10
- "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00015.bin",
11
- "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00015.bin",
12
- "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
13
- "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
14
- "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
15
- "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
16
- "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00015.bin",
17
- "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
18
- "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00015.bin",
19
- "model.layers.1.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
20
- "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
21
- "model.layers.1.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
22
- "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00015.bin",
23
- "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00015.bin",
24
- "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00015.bin",
25
- "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00015.bin",
26
- "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00015.bin",
27
- "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00015.bin",
28
- "model.layers.10.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
29
- "model.layers.10.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
30
- "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
31
- "model.layers.10.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
32
- "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
33
- "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
34
- "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
35
- "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
36
- "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00015.bin",
37
- "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
38
- "model.layers.11.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
39
- "model.layers.11.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
40
- "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
41
- "model.layers.11.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
42
- "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
43
- "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
44
- "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
45
- "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
46
- "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00015.bin",
47
- "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
48
- "model.layers.12.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
49
- "model.layers.12.mlp.down_proj.weight": "pytorch_model-00005-of-00015.bin",
50
- "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00005-of-00015.bin",
51
- "model.layers.12.mlp.up_proj.weight": "pytorch_model-00005-of-00015.bin",
52
- "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
53
- "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
54
- "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
55
- "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
56
- "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00015.bin",
57
- "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
58
- "model.layers.13.input_layernorm.weight": "pytorch_model-00005-of-00015.bin",
59
- "model.layers.13.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
60
- "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
61
- "model.layers.13.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
62
- "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00005-of-00015.bin",
63
- "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00005-of-00015.bin",
64
- "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00005-of-00015.bin",
65
- "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00005-of-00015.bin",
66
- "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00015.bin",
67
- "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00005-of-00015.bin",
68
- "model.layers.14.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
69
- "model.layers.14.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
70
- "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
71
- "model.layers.14.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
72
- "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
73
- "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
74
- "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
75
- "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
76
- "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00015.bin",
77
- "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
78
- "model.layers.15.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
79
- "model.layers.15.mlp.down_proj.weight": "pytorch_model-00006-of-00015.bin",
80
- "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00006-of-00015.bin",
81
- "model.layers.15.mlp.up_proj.weight": "pytorch_model-00006-of-00015.bin",
82
- "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
83
- "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
84
- "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
85
- "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
86
- "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00015.bin",
87
- "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
88
- "model.layers.16.input_layernorm.weight": "pytorch_model-00006-of-00015.bin",
89
- "model.layers.16.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
90
- "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
91
- "model.layers.16.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
92
- "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00006-of-00015.bin",
93
- "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00006-of-00015.bin",
94
- "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00006-of-00015.bin",
95
- "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00006-of-00015.bin",
96
- "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00015.bin",
97
- "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00006-of-00015.bin",
98
- "model.layers.17.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
99
- "model.layers.17.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
100
- "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
101
- "model.layers.17.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
102
- "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
103
- "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
104
- "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
105
- "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
106
- "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00015.bin",
107
- "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
108
- "model.layers.18.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
109
- "model.layers.18.mlp.down_proj.weight": "pytorch_model-00007-of-00015.bin",
110
- "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00007-of-00015.bin",
111
- "model.layers.18.mlp.up_proj.weight": "pytorch_model-00007-of-00015.bin",
112
- "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
113
- "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
114
- "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
115
- "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
116
- "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00015.bin",
117
- "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
118
- "model.layers.19.input_layernorm.weight": "pytorch_model-00007-of-00015.bin",
119
- "model.layers.19.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
120
- "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
121
- "model.layers.19.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
122
- "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00007-of-00015.bin",
123
- "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00007-of-00015.bin",
124
- "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00007-of-00015.bin",
125
- "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00007-of-00015.bin",
126
- "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00015.bin",
127
- "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00007-of-00015.bin",
128
- "model.layers.2.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
129
- "model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
130
- "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
131
- "model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
132
- "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
133
- "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
134
- "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
135
- "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
136
- "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00015.bin",
137
- "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
138
- "model.layers.20.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
139
- "model.layers.20.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
140
- "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
141
- "model.layers.20.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
142
- "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
143
- "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
144
- "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
145
- "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
146
- "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00015.bin",
147
- "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
148
- "model.layers.21.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
149
- "model.layers.21.mlp.down_proj.weight": "pytorch_model-00008-of-00015.bin",
150
- "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00008-of-00015.bin",
151
- "model.layers.21.mlp.up_proj.weight": "pytorch_model-00008-of-00015.bin",
152
- "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
153
- "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
154
- "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
155
- "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
156
- "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00015.bin",
157
- "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
158
- "model.layers.22.input_layernorm.weight": "pytorch_model-00008-of-00015.bin",
159
- "model.layers.22.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
160
- "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
161
- "model.layers.22.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
162
- "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00008-of-00015.bin",
163
- "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00008-of-00015.bin",
164
- "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00008-of-00015.bin",
165
- "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00008-of-00015.bin",
166
- "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00015.bin",
167
- "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00008-of-00015.bin",
168
- "model.layers.23.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
169
- "model.layers.23.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
170
- "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
171
- "model.layers.23.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
172
- "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
173
- "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
174
- "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
175
- "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
176
- "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00015.bin",
177
- "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
178
- "model.layers.24.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
179
- "model.layers.24.mlp.down_proj.weight": "pytorch_model-00009-of-00015.bin",
180
- "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00009-of-00015.bin",
181
- "model.layers.24.mlp.up_proj.weight": "pytorch_model-00009-of-00015.bin",
182
- "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
183
- "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
184
- "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
185
- "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
186
- "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00015.bin",
187
- "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
188
- "model.layers.25.input_layernorm.weight": "pytorch_model-00009-of-00015.bin",
189
- "model.layers.25.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
190
- "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
191
- "model.layers.25.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
192
- "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00009-of-00015.bin",
193
- "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00009-of-00015.bin",
194
- "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00009-of-00015.bin",
195
- "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00009-of-00015.bin",
196
- "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00015.bin",
197
- "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00009-of-00015.bin",
198
- "model.layers.26.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
199
- "model.layers.26.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
200
- "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
201
- "model.layers.26.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
202
- "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
203
- "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
204
- "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
205
- "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
206
- "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00010-of-00015.bin",
207
- "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
208
- "model.layers.27.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
209
- "model.layers.27.mlp.down_proj.weight": "pytorch_model-00010-of-00015.bin",
210
- "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00010-of-00015.bin",
211
- "model.layers.27.mlp.up_proj.weight": "pytorch_model-00010-of-00015.bin",
212
- "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
213
- "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
214
- "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
215
- "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
216
- "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00010-of-00015.bin",
217
- "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
218
- "model.layers.28.input_layernorm.weight": "pytorch_model-00010-of-00015.bin",
219
- "model.layers.28.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
220
- "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
221
- "model.layers.28.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
222
- "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00010-of-00015.bin",
223
- "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00010-of-00015.bin",
224
- "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00010-of-00015.bin",
225
- "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00010-of-00015.bin",
226
- "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00010-of-00015.bin",
227
- "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00010-of-00015.bin",
228
- "model.layers.29.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
229
- "model.layers.29.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
230
- "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
231
- "model.layers.29.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
232
- "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
233
- "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
234
- "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
235
- "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
236
- "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00011-of-00015.bin",
237
- "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
238
- "model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
239
- "model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-00015.bin",
240
- "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-00015.bin",
241
- "model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-00015.bin",
242
- "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
243
- "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
244
- "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
245
- "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
246
- "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00015.bin",
247
- "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
248
- "model.layers.30.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
249
- "model.layers.30.mlp.down_proj.weight": "pytorch_model-00011-of-00015.bin",
250
- "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00011-of-00015.bin",
251
- "model.layers.30.mlp.up_proj.weight": "pytorch_model-00011-of-00015.bin",
252
- "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
253
- "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
254
- "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
255
- "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
256
- "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00011-of-00015.bin",
257
- "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
258
- "model.layers.31.input_layernorm.weight": "pytorch_model-00011-of-00015.bin",
259
- "model.layers.31.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
260
- "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
261
- "model.layers.31.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
262
- "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00011-of-00015.bin",
263
- "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00011-of-00015.bin",
264
- "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00011-of-00015.bin",
265
- "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00011-of-00015.bin",
266
- "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00011-of-00015.bin",
267
- "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00011-of-00015.bin",
268
- "model.layers.32.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
269
- "model.layers.32.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
270
- "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
271
- "model.layers.32.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
272
- "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
273
- "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
274
- "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
275
- "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
276
- "model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00012-of-00015.bin",
277
- "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
278
- "model.layers.33.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
279
- "model.layers.33.mlp.down_proj.weight": "pytorch_model-00012-of-00015.bin",
280
- "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00012-of-00015.bin",
281
- "model.layers.33.mlp.up_proj.weight": "pytorch_model-00012-of-00015.bin",
282
- "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
283
- "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
284
- "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
285
- "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
286
- "model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00012-of-00015.bin",
287
- "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
288
- "model.layers.34.input_layernorm.weight": "pytorch_model-00012-of-00015.bin",
289
- "model.layers.34.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
290
- "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
291
- "model.layers.34.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
292
- "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00012-of-00015.bin",
293
- "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00012-of-00015.bin",
294
- "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00012-of-00015.bin",
295
- "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00012-of-00015.bin",
296
- "model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00012-of-00015.bin",
297
- "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00012-of-00015.bin",
298
- "model.layers.35.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
299
- "model.layers.35.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
300
- "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
301
- "model.layers.35.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
302
- "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
303
- "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
304
- "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
305
- "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
306
- "model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00013-of-00015.bin",
307
- "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
308
- "model.layers.36.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
309
- "model.layers.36.mlp.down_proj.weight": "pytorch_model-00013-of-00015.bin",
310
- "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00013-of-00015.bin",
311
- "model.layers.36.mlp.up_proj.weight": "pytorch_model-00013-of-00015.bin",
312
- "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
313
- "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
314
- "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
315
- "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
316
- "model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00013-of-00015.bin",
317
- "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
318
- "model.layers.37.input_layernorm.weight": "pytorch_model-00013-of-00015.bin",
319
- "model.layers.37.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
320
- "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
321
- "model.layers.37.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
322
- "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00013-of-00015.bin",
323
- "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00013-of-00015.bin",
324
- "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00013-of-00015.bin",
325
- "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00013-of-00015.bin",
326
- "model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00013-of-00015.bin",
327
- "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00013-of-00015.bin",
328
- "model.layers.38.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
329
- "model.layers.38.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
330
- "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
331
- "model.layers.38.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
332
- "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
333
- "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
334
- "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
335
- "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
336
- "model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00014-of-00015.bin",
337
- "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
338
- "model.layers.39.input_layernorm.weight": "pytorch_model-00014-of-00015.bin",
339
- "model.layers.39.mlp.down_proj.weight": "pytorch_model-00014-of-00015.bin",
340
- "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00014-of-00015.bin",
341
- "model.layers.39.mlp.up_proj.weight": "pytorch_model-00014-of-00015.bin",
342
- "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00014-of-00015.bin",
343
- "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00014-of-00015.bin",
344
- "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00014-of-00015.bin",
345
- "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00014-of-00015.bin",
346
- "model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00014-of-00015.bin",
347
- "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00014-of-00015.bin",
348
- "model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00015.bin",
349
- "model.layers.4.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
350
- "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
351
- "model.layers.4.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
352
- "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00015.bin",
353
- "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00015.bin",
354
- "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00015.bin",
355
- "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-00015.bin",
356
- "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00015.bin",
357
- "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00015.bin",
358
- "model.layers.5.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
359
- "model.layers.5.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
360
- "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
361
- "model.layers.5.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
362
- "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
363
- "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
364
- "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
365
- "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
366
- "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00015.bin",
367
- "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
368
- "model.layers.6.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
369
- "model.layers.6.mlp.down_proj.weight": "pytorch_model-00003-of-00015.bin",
370
- "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00003-of-00015.bin",
371
- "model.layers.6.mlp.up_proj.weight": "pytorch_model-00003-of-00015.bin",
372
- "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
373
- "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
374
- "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
375
- "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
376
- "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00015.bin",
377
- "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
378
- "model.layers.7.input_layernorm.weight": "pytorch_model-00003-of-00015.bin",
379
- "model.layers.7.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
380
- "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
381
- "model.layers.7.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
382
- "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00003-of-00015.bin",
383
- "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00003-of-00015.bin",
384
- "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00003-of-00015.bin",
385
- "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00003-of-00015.bin",
386
- "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00015.bin",
387
- "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00003-of-00015.bin",
388
- "model.layers.8.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
389
- "model.layers.8.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
390
- "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
391
- "model.layers.8.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
392
- "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
393
- "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
394
- "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
395
- "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
396
- "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00015.bin",
397
- "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
398
- "model.layers.9.input_layernorm.weight": "pytorch_model-00004-of-00015.bin",
399
- "model.layers.9.mlp.down_proj.weight": "pytorch_model-00004-of-00015.bin",
400
- "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00004-of-00015.bin",
401
- "model.layers.9.mlp.up_proj.weight": "pytorch_model-00004-of-00015.bin",
402
- "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00004-of-00015.bin",
403
- "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00004-of-00015.bin",
404
- "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00004-of-00015.bin",
405
- "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00004-of-00015.bin",
406
- "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00015.bin",
407
- "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00004-of-00015.bin",
408
- "model.norm.weight": "pytorch_model-00014-of-00015.bin"
409
  }
410
  }
 
  {
  "metadata": {
+ "total_size": 17578695680
  },
  "weight_map": {
+ "lm_head.weight": "pytorch_model-00010-of-00010.bin",
+ "model.embed_tokens.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+ "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.12.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.17.self_attn.rotary_emb.inv_freq": "pytorch_model-00004-of-00010.bin",
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.2.self_attn.rotary_emb.inv_freq": "pytorch_model-00001-of-00010.bin",
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.22.self_attn.rotary_emb.inv_freq": "pytorch_model-00005-of-00010.bin",
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+ "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+ "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+ "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.27.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.27.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.27.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.27.self_attn.rotary_emb.inv_freq": "pytorch_model-00006-of-00010.bin",
+ "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
+ "model.layers.28.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+ "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+ "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+ "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.30.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+ "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+ "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.32.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.32.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.32.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.32.self_attn.rotary_emb.inv_freq": "pytorch_model-00007-of-00010.bin",
+ "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
+ "model.layers.33.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+ "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+ "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+ "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+ "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.37.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.37.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.37.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.37.self_attn.rotary_emb.inv_freq": "pytorch_model-00008-of-00010.bin",
+ "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
+ "model.layers.38.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00010.bin",
+ "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.self_attn.rotary_emb.inv_freq": "pytorch_model-00009-of-00010.bin",
+ "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
+ "model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+ "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+ "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+ "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.7.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.7.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.7.self_attn.rotary_emb.inv_freq": "pytorch_model-00002-of-00010.bin",
+ "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
+ "model.layers.8.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+ "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.self_attn.rotary_emb.inv_freq": "pytorch_model-00003-of-00010.bin",
+ "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
+ "model.norm.weight": "pytorch_model-00009-of-00010.bin"
  }
  }
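
The new index maps the same 403 tensors into 10 shards instead of the previous 15. A loader resolves such an index by reading `weight_map`, opening each referenced shard file once, and pulling the named tensors out of it; `transformers`' `from_pretrained` does this automatically. The sketch below shows the mechanics only; `load_sharded_state_dict` is a hypothetical helper, not part of this repo.

```python
# Minimal sketch of how a sharded index like the one above is consumed.
# Assumes the shard files sit next to pytorch_model.bin.index.json in ckpt_dir.
import json
import os
from collections import defaultdict

import torch

def load_sharded_state_dict(ckpt_dir: str) -> dict:
    with open(os.path.join(ckpt_dir, "pytorch_model.bin.index.json")) as f:
        index = json.load(f)
    # weight_map: tensor name -> shard file, exactly as in the diff above.
    shard_to_names = defaultdict(list)
    for name, shard_file in index["weight_map"].items():
        shard_to_names[shard_file].append(name)
    state_dict = {}
    for shard_file, names in shard_to_names.items():
        shard = torch.load(os.path.join(ckpt_dir, shard_file), map_location="cpu")
        for name in names:
            state_dict[name] = shard[name]  # each shard is opened exactly once
    return state_dict
```
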
tokenizer.json CHANGED
@@ -58,14 +58,6 @@
  "special": true
  }
  ],
- "normalizer": {
- "type": "Sequence",
- "normalizers": [
- {
- "type": "NFKC"
- }
- ]
- },
  "pre_tokenizer": {
  "type": "Sequence",
  "pretokenizers": [
@@ -86,9 +78,17 @@
  },
  "post_processor": null,
  "decoder": {
- "type": "Metaspace",
- "replacement": "▁",
- "add_prefix_space": false
+ "type": "Sequence",
+ "decoders": [
+ {
+ "type": "Metaspace",
+ "replacement": "▁",
+ "add_prefix_space": false
+ },
+ {
+ "type": "ByteFallback"
+ }
+ ]
  },
  "model": {
  "type": "BPE",
@@ -100376,7 +100376,263 @@
  "nj": 100274,
  "iful": 100275,
  "▁solution": 100276,
- "\n": 100277
+ "\n": 100277,
+ "<0x00>": 100278,
+ "<0x01>": 100279,
+ "<0x02>": 100280,
+ "<0x03>": 100281,
+ "<0x04>": 100282,
+ "<0x05>": 100283,
+ "<0x06>": 100284,
+ "<0x07>": 100285,
+ "<0x08>": 100286,
+ "<0x09>": 100287,
+ "<0x0A>": 100288,
+ "<0x0B>": 100289,
+ "<0x0C>": 100290,
+ "<0x0D>": 100291,
+ "<0x0E>": 100292,
+ "<0x0F>": 100293,
+ "<0x10>": 100294,
+ "<0x11>": 100295,
+ "<0x12>": 100296,
+ "<0x13>": 100297,
+ "<0x14>": 100298,
+ "<0x15>": 100299,
+ "<0x16>": 100300,
+ "<0x17>": 100301,
+ "<0x18>": 100302,
+ "<0x19>": 100303,
+ "<0x1A>": 100304,
+ "<0x1B>": 100305,
+ "<0x1C>": 100306,
+ "<0x1D>": 100307,
+ "<0x1E>": 100308,
+ "<0x1F>": 100309,
+ "<0x20>": 100310,
+ "<0x21>": 100311,
+ "<0x22>": 100312,
+ "<0x23>": 100313,
+ "<0x24>": 100314,
+ "<0x25>": 100315,
+ "<0x26>": 100316,
+ "<0x27>": 100317,
+ "<0x28>": 100318,
+ "<0x29>": 100319,
+ "<0x2A>": 100320,
+ "<0x2B>": 100321,
+ "<0x2C>": 100322,
+ "<0x2D>": 100323,
+ "<0x2E>": 100324,
+ "<0x2F>": 100325,
+ "<0x30>": 100326,
+ "<0x31>": 100327,
+ "<0x32>": 100328,
+ "<0x33>": 100329,
+ "<0x34>": 100330,
+ "<0x35>": 100331,
+ "<0x36>": 100332,
+ "<0x37>": 100333,
+ "<0x38>": 100334,
+ "<0x39>": 100335,
+ "<0x3A>": 100336,
+ "<0x3B>": 100337,
+ "<0x3C>": 100338,
+ "<0x3D>": 100339,
+ "<0x3E>": 100340,
+ "<0x3F>": 100341,
+ "<0x40>": 100342,
+ "<0x41>": 100343,
+ "<0x42>": 100344,
+ "<0x43>": 100345,
+ "<0x44>": 100346,
+ "<0x45>": 100347,
+ "<0x46>": 100348,
+ "<0x47>": 100349,
+ "<0x48>": 100350,
+ "<0x49>": 100351,
+ "<0x4A>": 100352,
+ "<0x4B>": 100353,
+ "<0x4C>": 100354,
+ "<0x4D>": 100355,
+ "<0x4E>": 100356,
+ "<0x4F>": 100357,
+ "<0x50>": 100358,
+ "<0x51>": 100359,
+ "<0x52>": 100360,
+ "<0x53>": 100361,
+ "<0x54>": 100362,
+ "<0x55>": 100363,
+ "<0x56>": 100364,
+ "<0x57>": 100365,
+ "<0x58>": 100366,
+ "<0x59>": 100367,
+ "<0x5A>": 100368,
+ "<0x5B>": 100369,
+ "<0x5C>": 100370,
+ "<0x5D>": 100371,
+ "<0x5E>": 100372,
+ "<0x5F>": 100373,
+ "<0x60>": 100374,
+ "<0x61>": 100375,
+ "<0x62>": 100376,
+ "<0x63>": 100377,
+ "<0x64>": 100378,
+ "<0x65>": 100379,
+ "<0x66>": 100380,
+ "<0x67>": 100381,
+ "<0x68>": 100382,
+ "<0x69>": 100383,
+ "<0x6A>": 100384,
+ "<0x6B>": 100385,
+ "<0x6C>": 100386,
+ "<0x6D>": 100387,
+ "<0x6E>": 100388,
+ "<0x6F>": 100389,
+ "<0x70>": 100390,
+ "<0x71>": 100391,
+ "<0x72>": 100392,
+ "<0x73>": 100393,
+ "<0x74>": 100394,
+ "<0x75>": 100395,
+ "<0x76>": 100396,
+ "<0x77>": 100397,
+ "<0x78>": 100398,
+ "<0x79>": 100399,
+ "<0x7A>": 100400,
+ "<0x7B>": 100401,
+ "<0x7C>": 100402,
+ "<0x7D>": 100403,
+ "<0x7E>": 100404,
+ "<0x7F>": 100405,
+ "<0x80>": 100406,
+ "<0x81>": 100407,
+ "<0x82>": 100408,
+ "<0x83>": 100409,
+ "<0x84>": 100410,
+ "<0x85>": 100411,
+ "<0x86>": 100412,
+ "<0x87>": 100413,
+ "<0x88>": 100414,
+ "<0x89>": 100415,
+ "<0x8A>": 100416,
+ "<0x8B>": 100417,
+ "<0x8C>": 100418,
+ "<0x8D>": 100419,
+ "<0x8E>": 100420,
+ "<0x8F>": 100421,
+ "<0x90>": 100422,
+ "<0x91>": 100423,
+ "<0x92>": 100424,
+ "<0x93>": 100425,
+ "<0x94>": 100426,
+ "<0x95>": 100427,
+ "<0x96>": 100428,
+ "<0x97>": 100429,
+ "<0x98>": 100430,
+ "<0x99>": 100431,
+ "<0x9A>": 100432,
+ "<0x9B>": 100433,
+ "<0x9C>": 100434,
+ "<0x9D>": 100435,
+ "<0x9E>": 100436,
+ "<0x9F>": 100437,
+ "<0xA0>": 100438,
+ "<0xA1>": 100439,
+ "<0xA2>": 100440,
+ "<0xA3>": 100441,
+ "<0xA4>": 100442,
+ "<0xA5>": 100443,
+ "<0xA6>": 100444,
+ "<0xA7>": 100445,
+ "<0xA8>": 100446,
+ "<0xA9>": 100447,
+ "<0xAA>": 100448,
+ "<0xAB>": 100449,
+ "<0xAC>": 100450,
+ "<0xAD>": 100451,
+ "<0xAE>": 100452,
+ "<0xAF>": 100453,
+ "<0xB0>": 100454,
+ "<0xB1>": 100455,
+ "<0xB2>": 100456,
+ "<0xB3>": 100457,
+ "<0xB4>": 100458,
+ "<0xB5>": 100459,
+ "<0xB6>": 100460,
+ "<0xB7>": 100461,
+ "<0xB8>": 100462,
+ "<0xB9>": 100463,
+ "<0xBA>": 100464,
+ "<0xBB>": 100465,
+ "<0xBC>": 100466,
+ "<0xBD>": 100467,
+ "<0xBE>": 100468,
+ "<0xBF>": 100469,
+ "<0xC0>": 100470,
+ "<0xC1>": 100471,
+ "<0xC2>": 100472,
+ "<0xC3>": 100473,
+ "<0xC4>": 100474,
+ "<0xC5>": 100475,
+ "<0xC6>": 100476,
+ "<0xC7>": 100477,
+ "<0xC8>": 100478,
+ "<0xC9>": 100479,
+ "<0xCA>": 100480,
+ "<0xCB>": 100481,
+ "<0xCC>": 100482,
+ "<0xCD>": 100483,
+ "<0xCE>": 100484,
+ "<0xCF>": 100485,
+ "<0xD0>": 100486,
+ "<0xD1>": 100487,
+ "<0xD2>": 100488,
+ "<0xD3>": 100489,
+ "<0xD4>": 100490,
+ "<0xD5>": 100491,
+ "<0xD6>": 100492,
+ "<0xD7>": 100493,
+ "<0xD8>": 100494,
+ "<0xD9>": 100495,
+ "<0xDA>": 100496,
+ "<0xDB>": 100497,
+ "<0xDC>": 100498,
+ "<0xDD>": 100499,
+ "<0xDE>": 100500,
+ "<0xDF>": 100501,
+ "<0xE0>": 100502,
+ "<0xE1>": 100503,
+ "<0xE2>": 100504,
+ "<0xE3>": 100505,
+ "<0xE4>": 100506,
+ "<0xE5>": 100507,
+ "<0xE6>": 100508,
+ "<0xE7>": 100509,
+ "<0xE8>": 100510,
+ "<0xE9>": 100511,
+ "<0xEA>": 100512,
+ "<0xEB>": 100513,
+ "<0xEC>": 100514,
+ "<0xED>": 100515,
+ "<0xEE>": 100516,
+ "<0xEF>": 100517,
+ "<0xF0>": 100518,
+ "<0xF1>": 100519,
+ "<0xF2>": 100520,
+ "<0xF3>": 100521,
+ "<0xF4>": 100522,
+ "<0xF5>": 100523,
+ "<0xF6>": 100524,
+ "<0xF7>": 100525,
+ "<0xF8>": 100526,
+ "<0xF9>": 100527,
+ "<0xFA>": 100528,
+ "<0xFB>": 100529,
+ "<0xFC>": 100530,
+ "<0xFD>": 100531,
+ "<0xFE>": 100532,
+ "<0xFF>": 100533
  },
  "merges": [
  "▁ t",
@@ -104090,4 +104346,4 @@
  "▁sol ution"
  ]
  }
- }
+ }
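
The 256 vocabulary entries added in the third hunk cover every possible byte value, and their ids follow directly from the listing: `<0x00>` is 100278 and the ids run consecutively, so the id of any byte is 100278 plus its value. A quick consistency check, assuming only what the diff shows:

```python
# Sketch: byte-token ids as added in this commit (assumption from the diff:
# "<0x00>" = 100278 with consecutive ids up to "<0xFF>" = 100533).
def byte_token_id(b: int) -> int:
    assert 0 <= b <= 0xFF
    return 100278 + b

assert byte_token_id(0x0A) == 100288  # matches "<0x0A>": 100288 above
assert byte_token_id(0xFF) == 100533  # matches the final entry "<0xFF>": 100533
```

Together with the ByteFallback decoder, these entries let the tokenizer encode and decode any byte sequence losslessly, not just strings covered by the learned BPE merges.
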