sharpenb committed on
Commit
fc69c5e
·
verified ·
1 Parent(s): af9d040

736b09e13b4dfc4d273161eb219da547da1bdc85536039b5bd1c3f460552e0ac

README.md ADDED
@@ -0,0 +1,85 @@
1
+ ---
2
+ thumbnail: "https://assets-global.website-files.com/646b351987a8d8ce158d1940/64ec9e96b4334c0e1ac41504_Logo%20with%20white%20text.svg"
3
+ base_model: BAAI/AquilaChat2-7B
4
+ metrics:
5
+ - memory_disk
6
+ - memory_inference
7
+ - inference_latency
8
+ - inference_throughput
9
+ - inference_CO2_emissions
10
+ - inference_energy_consumption
11
+ tags:
12
+ - pruna-ai
13
+ ---
14
+ <!-- header start -->
15
+ <!-- 200823 -->
16
+ <div style="width: auto; margin-left: auto; margin-right: auto">
17
+ <a href="https://www.pruna.ai/" target="_blank" rel="noopener noreferrer">
18
+ <img src="https://i.imgur.com/eDAlcgk.png" alt="PrunaAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
19
+ </a>
20
+ </div>
21
+ <!-- header end -->
22
+
23
+ [![Twitter](https://img.shields.io/twitter/follow/PrunaAI?style=social)](https://twitter.com/PrunaAI)
24
+ [![GitHub](https://img.shields.io/github/followers/PrunaAI?label=Follow%20%40PrunaAI&style=social)](https://github.com/PrunaAI)
25
+ [![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue)](https://www.linkedin.com/company/93832878/admin/feed/posts/?feedType=following)
26
+ [![Discord](https://img.shields.io/badge/Discord-Join%20Us-blue?style=social&logo=discord)](https://discord.gg/rskEr4BZJx)
27
+
28
+ # Simply make AI models cheaper, smaller, faster, and greener!
29
+
30
+ - Give a thumbs up if you like this model!
31
+ - Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
32
+ - Request access to easily compress your *own* AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
33
+ - Read the documentation to learn more [here](https://pruna-ai-pruna.readthedocs-hosted.com/en/latest/).
34
+ - Join the Pruna AI community on Discord [here](https://discord.gg/CP4VSgck) to share feedback and suggestions or to get help.
35
+
36
+ ## Results
37
+
38
+ ![image info](./plots.png)
39
+
40
+ **Frequently Asked Questions**
41
+ - ***How does the compression work?*** The model is compressed with llm-int8.
42
+ - ***How does the model quality change?*** The quality of the model output might vary compared to the base model.
43
+ - ***How is the model efficiency evaluated?*** These results were obtained on HARDWARE_NAME with the configuration described in `model/smash_config.json`, after a hardware warmup. The smashed model is compared directly to the original base model. Efficiency results may vary in other settings (e.g. other hardware, image size, batch size, ...). We recommend running the measurements directly under your use-case conditions to see whether the smashed model benefits you.
44
+ - ***What is the model format?*** We use safetensors.
45
+ - ***What calibration data has been used?*** If needed by the compression method, we used WikiText as the calibration data.
46
+ - ***What is the naming convention for Pruna Hugging Face models?*** We take the original model name and append "turbo", "tiny", or "green" if the smashed model has a measured inference speed, inference memory, or inference energy consumption that is less than 90% of that of the original base model.
47
+ - ***How to compress my own models?*** You can request premium access to more compression methods and tech support for your specific use-cases [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
48
+ - ***What are "first" metrics?*** Results mentioning "first" are obtained after the first run of the model. The first run might take more memory or be slower than subsequent runs due to CUDA overheads.
49
+ - ***What are "Sync" and "Async" metrics?*** "Sync" metrics are obtained by syncing all GPU processes and stopping the measurement once all of them have finished. "Async" metrics are obtained without syncing all GPU processes; the measurement stops as soon as the model output can be used by the CPU. We provide both metrics since either could be relevant depending on the use case. We recommend testing the efficiency gains directly in your use case.
50
+
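The naming convention in the FAQ above can be sketched as a small helper. This is only an illustration, not Pruna's actual implementation: the function name `pruna_suffixes` is hypothetical, the metric keys are borrowed from the `metrics` list in the README front matter, and it assumes that "inference speed" is measured as latency (lower is better).

```python
def pruna_suffixes(base: dict, smashed: dict, threshold: float = 0.9) -> list:
    """Return the name suffixes a smashed model would earn.

    A suffix is earned when the smashed model's metric is below
    `threshold` (90% by default) of the base model's metric.
    Assumption: every metric here is "lower is better".
    """
    rules = [
        ("turbo", "inference_latency"),             # speed, measured as latency
        ("tiny", "memory_inference"),               # inference memory
        ("green", "inference_energy_consumption"),  # energy use
    ]
    return [
        suffix
        for suffix, key in rules
        if smashed[key] < threshold * base[key]
    ]


# Hypothetical measurements, for illustration only.
base = {
    "inference_latency": 100.0,
    "memory_inference": 16000.0,
    "inference_energy_consumption": 1.0,
}
smashed = {
    "inference_latency": 95.0,
    "memory_inference": 9000.0,
    "inference_energy_consumption": 0.85,
}
print(pruna_suffixes(base, smashed))
```

With these made-up numbers the latency gain is too small to earn "turbo", while the memory and energy reductions both cross the 90% threshold.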
51
+ ## Setup
52
+
53
+ You can run the smashed model with these steps:
54
+
55
+ 0. Check that the requirements from the original repo BAAI/AquilaChat2-7B are installed. In particular, check the Python, CUDA, and transformers versions.
56
+ 1. Make sure that you have installed the quantization-related packages.
57
+ ```bash
58
+ pip install transformers accelerate "bitsandbytes>0.37.0"
59
+ ```
60
+ 2. Load & run the model.
61
+ ```python
62
+ from transformers import AutoModelForCausalLM, AutoTokenizer
63
+
64
+
65
+ model = AutoModelForCausalLM.from_pretrained("PrunaAI/BAAI-AquilaChat2-7B-bnb-8bit-smashed", trust_remote_code=True, device_map='auto')
66
+ tokenizer = AutoTokenizer.from_pretrained("BAAI/AquilaChat2-7B")
67
+
68
+ input_ids = tokenizer("What is the color of prunes?", return_tensors='pt').to(model.device)["input_ids"]
69
+
70
+ outputs = model.generate(input_ids, max_new_tokens=216)
71
+ print(tokenizer.decode(outputs[0]))
72
+ ```
73
+
74
+ ## Configurations
75
+
76
+ The configuration info is in `smash_config.json`.
77
+
78
+ ## Credits & License
79
+
80
+ The license of the smashed model follows the license of the original model. Please check the license of the original model, BAAI/AquilaChat2-7B, which provided the base model, before using this model. The license of the `pruna-engine` is [here](https://pypi.org/project/pruna-engine/) on PyPI.
81
+
82
+ ## Want to compress other models?
83
+
84
+ - Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
85
+ - Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
added_tokens.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "</s>": 100007,
3
+ "<|LDWANG|>": 100002,
4
+ "<|endofpiece|>": 100001,
5
+ "<|startofpiece|>": 100000,
6
+ "[CLS]": 100006,
7
+ "[MASK]": 100003,
8
+ "[gMASK]": 100004,
9
+ "[sMASK]": 100005
10
+ }
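As a quick sanity check, the added tokens can be compared against the `vocab_size`, `bos_token_id`, and `eos_token_id` values that appear in the `config.json` of this same commit. The sketch below is an illustration only; the variable names are hypothetical and the values are copied from the two files.

```python
import json

# Contents of added_tokens.json, copied from the diff above.
ADDED_TOKENS = json.loads(r"""
{
  "</s>": 100007,
  "<|LDWANG|>": 100002,
  "<|endofpiece|>": 100001,
  "<|startofpiece|>": 100000,
  "[CLS]": 100006,
  "[MASK]": 100003,
  "[gMASK]": 100004,
  "[sMASK]": 100005
}
""")

# Values copied from config.json in the same commit.
VOCAB_SIZE = 100008
BOS_TOKEN_ID = 100006
EOS_TOKEN_ID = 100007

ids = sorted(ADDED_TOKENS.values())

# The added tokens should occupy one contiguous block of IDs...
is_contiguous = ids == list(range(ids[0], ids[0] + len(ids)))
# ...that ends exactly at the last slot of the vocabulary.
fits_vocab = ids[-1] == VOCAB_SIZE - 1

# Reverse lookup: which token does each special ID map to?
id_to_token = {token_id: token for token, token_id in ADDED_TOKENS.items()}

print(is_contiguous, fits_vocab)
print("bos:", id_to_token[BOS_TOKEN_ID], "eos:", id_to_token[EOS_TOKEN_ID])
```

The lookup shows that the configured BOS ID corresponds to `[CLS]` and the EOS ID to `</s>`.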
config.json ADDED
@@ -0,0 +1,48 @@
1
+ {
2
+ "_name_or_path": "/ceph/hdd/staff/charpent/.cache/modelsx0wsbj_qk1ggo4pn",
3
+ "architectures": [
4
+ "AquilaForCausalLM"
5
+ ],
6
+ "auto_map": {
7
+ "AutoConfig": "configuration_aquila.AquilaConfig",
8
+ "AutoModelForCausalLM": "modeling_aquila.AquilaForCausalLM"
9
+ },
10
+ "bos_token_id": 100006,
11
+ "eos_token_id": 100007,
12
+ "hidden_act": "silu",
13
+ "hidden_size": 4096,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 11008,
16
+ "max_position_embeddings": 2048,
17
+ "model_type": "aquila",
18
+ "num_attention_heads": 32,
19
+ "num_hidden_layers": 32,
20
+ "num_key_value_heads": 32,
21
+ "pad_token_id": 0,
22
+ "pretraining_tp": 1,
23
+ "quantization_config": {
24
+ "_load_in_4bit": false,
25
+ "_load_in_8bit": true,
26
+ "bnb_4bit_compute_dtype": "bfloat16",
27
+ "bnb_4bit_quant_storage": "uint8",
28
+ "bnb_4bit_quant_type": "fp4",
29
+ "bnb_4bit_use_double_quant": false,
30
+ "llm_int8_enable_fp32_cpu_offload": false,
31
+ "llm_int8_has_fp16_weight": false,
32
+ "llm_int8_skip_modules": [
33
+ "lm_head"
34
+ ],
35
+ "llm_int8_threshold": 6.0,
36
+ "load_in_4bit": false,
37
+ "load_in_8bit": true,
38
+ "quant_method": "bitsandbytes"
39
+ },
40
+ "rms_norm_eps": 1e-05,
41
+ "rope_scaling": null,
42
+ "rope_theta": 10000.0,
43
+ "tie_word_embeddings": false,
44
+ "torch_dtype": "float16",
45
+ "transformers_version": "4.42.4",
46
+ "use_cache": true,
47
+ "vocab_size": 100008
48
+ }
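The `quantization_config` block mixes 4-bit and 8-bit fields, which can be confusing at first glance. The sketch below is a hypothetical helper, not part of transformers or bitsandbytes, that summarizes which mode is active. It assumes that the `bnb_4bit_*` fields only matter when `load_in_4bit` is true, so they are inert in this checkpoint.

```python
import json

# Relevant subset of quantization_config, copied from config.json above.
QUANT_CONFIG = json.loads("""
{
  "bnb_4bit_quant_type": "fp4",
  "llm_int8_skip_modules": ["lm_head"],
  "llm_int8_threshold": 6.0,
  "load_in_4bit": false,
  "load_in_8bit": true,
  "quant_method": "bitsandbytes"
}
""")


def describe_quantization(cfg: dict) -> str:
    """Summarize the active quantization mode in one line."""
    if cfg.get("load_in_8bit"):
        mode = "8-bit (LLM.int8)"
    elif cfg.get("load_in_4bit"):
        mode = f"4-bit ({cfg.get('bnb_4bit_quant_type', 'unknown')})"
    else:
        mode = "none"
    # Modules listed here are left unquantized.
    skipped = ", ".join(cfg.get("llm_int8_skip_modules") or []) or "none"
    method = cfg.get("quant_method", "unknown")
    return f"{method}: {mode}; unquantized modules: {skipped}"


print(describe_quantization(QUANT_CONFIG))
```

The skipped `lm_head` module is consistent with `model.safetensors.index.json` in this commit, where `lm_head.weight` has no companion `.SCB` entry while the quantized projection layers do.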
configuration_aquila.py ADDED
@@ -0,0 +1,128 @@
1
+ # coding=utf-8
2
+ # Copyright 2023 EleutherAI and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
5
+ # and OPT implementations in this library. It has been modified from its
6
+ # original forms to accommodate minor architectural differences compared
7
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ """ Aquila model configuration"""
21
+
22
+ from transformers import PretrainedConfig
23
+
24
+
25
+
26
+ class AquilaConfig(PretrainedConfig):
27
+ r"""
28
+ This is the configuration class to store the configuration of a [`AquilaModel`]. It is used to instantiate an Aquila
29
+ model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
30
+ defaults will yield a similar configuration to that of the Aquila-7B.
31
+
32
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
33
+ documentation from [`PretrainedConfig`] for more information.
34
+
35
+
36
+ Args:
37
+ vocab_size (`int`, *optional*, defaults to 100008):
38
+ Vocabulary size of the Aquila model. Defines the number of different tokens that can be represented by the
39
+ `input_ids` passed when calling [`AquilaModel`]
40
+ hidden_size (`int`, *optional*, defaults to 4096):
41
+ Dimension of the hidden representations.
42
+ intermediate_size (`int`, *optional*, defaults to 11008):
43
+ Dimension of the MLP representations.
44
+ num_hidden_layers (`int`, *optional*, defaults to 32):
45
+ Number of hidden layers in the Transformer decoder.
46
+ num_attention_heads (`int`, *optional*, defaults to 32):
47
+ Number of attention heads for each attention layer in the Transformer decoder.
48
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
49
+ The non-linear activation function (function or string) in the decoder.
50
+ max_position_embeddings (`int`, *optional*, defaults to 2048):
51
+ The maximum sequence length that this model might ever be used with. Typically set this to something large
52
+ just in case (e.g., 512 or 1024 or 2048).
53
+ initializer_range (`float`, *optional*, defaults to 0.02):
54
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
55
+ rms_norm_eps (`float`, *optional*, defaults to 1e-6):
56
+ The epsilon used by the rms normalization layers.
57
+ use_cache (`bool`, *optional*, defaults to `True`):
58
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
59
+ relevant if `config.is_decoder=True`.
60
+ tie_word_embeddings(`bool`, *optional*, defaults to `False`):
61
+ Whether to tie weight embeddings
62
+ Example:
63
+
64
+ ```python
65
+ >>> from transformers import AquilaModel, AquilaConfig
66
+
67
+ >>> # Initializing an Aquila aquila-7b style configuration
68
+ >>> configuration = AquilaConfig()
69
+
70
+ >>> # Initializing a model from the aquila-7b style configuration
71
+ >>> model = AquilaModel(configuration)
72
+
73
+ >>> # Accessing the model configuration
74
+ >>> configuration = model.config
75
+ ```"""
76
+ model_type = "aquila"
77
+ keys_to_ignore_at_inference = ["past_key_values"]
78
+
79
+ def __init__(
80
+ self,
81
+ vocab_size=100008,
82
+ hidden_size=4096,
83
+ intermediate_size=11008,
84
+ num_hidden_layers=32,
85
+ num_attention_heads=32,
86
+ num_key_value_heads=None,
87
+ hidden_act="silu",
88
+ max_position_embeddings=2048,
89
+ initializer_range=0.02,
90
+ rms_norm_eps=1e-6,
91
+ use_cache=True,
92
+ pad_token_id=0,
93
+ bos_token_id=1,
94
+ eos_token_id=2,
95
+ pretraining_tp=1,
96
+ tie_word_embeddings=False,
97
+ rope_theta=10000.0,
98
+ rope_scaling=None,
99
+ **kwargs,
100
+ ):
101
+ self.vocab_size = vocab_size
102
+ self.max_position_embeddings = max_position_embeddings
103
+ self.hidden_size = hidden_size
104
+ self.intermediate_size = intermediate_size
105
+ self.num_hidden_layers = num_hidden_layers
106
+
107
+ # for backward compatibility
108
+ if num_key_value_heads is None:
109
+ num_key_value_heads = num_attention_heads
110
+
111
+ self.num_key_value_heads = num_key_value_heads
112
+
113
+ self.num_attention_heads = num_attention_heads
114
+ self.hidden_act = hidden_act
115
+ self.initializer_range = initializer_range
116
+ self.rms_norm_eps = rms_norm_eps
117
+ self.pretraining_tp = pretraining_tp
118
+ self.use_cache = use_cache
119
+ self.rope_theta = rope_theta
120
+ self.rope_scaling = rope_scaling
121
+
122
+ super().__init__(
123
+ pad_token_id=pad_token_id,
124
+ bos_token_id=bos_token_id,
125
+ eos_token_id=eos_token_id,
126
+ tie_word_embeddings=tie_word_embeddings,
127
+ **kwargs,
128
+ )
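The backward-compatibility branch in `__init__` (falling back to `num_attention_heads` when `num_key_value_heads` is `None`) can be illustrated in isolation. The helpers below are hypothetical and are not part of `configuration_aquila.py`. The derived `head_dim` and `queries_per_kv_head` values assume that the modeling code follows the usual LLaMA-style attention layout, which this diff does not show.

```python
from typing import Optional


def resolve_kv_heads(
    num_attention_heads: int, num_key_value_heads: Optional[int] = None
) -> int:
    """Mirror the fallback in AquilaConfig.__init__: configs saved without
    num_key_value_heads behave as plain multi-head attention."""
    if num_key_value_heads is None:
        return num_attention_heads
    return num_key_value_heads


def attention_layout(
    hidden_size: int,
    num_attention_heads: int,
    num_key_value_heads: Optional[int] = None,
) -> dict:
    """Derive per-head sizes (assumed LLaMA-style layout)."""
    kv_heads = resolve_kv_heads(num_attention_heads, num_key_value_heads)
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    if num_attention_heads % kv_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    return {
        "head_dim": hidden_size // num_attention_heads,
        "num_key_value_heads": kv_heads,
        "queries_per_kv_head": num_attention_heads // kv_heads,
    }


# Defaults from AquilaConfig: hidden_size=4096, num_attention_heads=32.
print(attention_layout(4096, 32))
# A hypothetical grouped-query variant with 8 key/value heads.
print(attention_layout(4096, 32, 8))
```

With the defaults, each query head has its own key/value head, which matches `"num_key_value_heads": 32` in the `config.json` of this commit.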
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 100006,
4
+ "eos_token_id": 100007,
5
+ "pad_token_id": 0,
6
+ "transformers_version": "4.42.4"
7
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors.index.json ADDED
@@ -0,0 +1,522 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 8120508416
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00002-of-00002.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
9
+ "model.layers.0.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
10
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
11
+ "model.layers.0.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
12
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
13
+ "model.layers.0.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
14
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
15
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
16
+ "model.layers.0.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
17
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
18
+ "model.layers.0.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
19
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
20
+ "model.layers.0.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
21
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
22
+ "model.layers.0.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
23
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
24
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
25
+ "model.layers.1.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
26
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
27
+ "model.layers.1.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
28
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
29
+ "model.layers.1.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
30
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
31
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
32
+ "model.layers.1.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
33
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
34
+ "model.layers.1.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
35
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
36
+ "model.layers.1.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
37
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
38
+ "model.layers.1.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
39
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
40
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
41
+ "model.layers.10.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
42
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
43
+ "model.layers.10.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
44
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
45
+ "model.layers.10.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
46
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
47
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
48
+ "model.layers.10.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
49
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
50
+ "model.layers.10.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
51
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
52
+ "model.layers.10.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
53
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
54
+ "model.layers.10.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
55
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
56
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
57
+ "model.layers.11.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
58
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
59
+ "model.layers.11.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
60
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
61
+ "model.layers.11.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
62
+ "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
63
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
64
+ "model.layers.11.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
65
+ "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
66
+ "model.layers.11.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
67
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
68
+ "model.layers.11.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
69
+ "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
70
+ "model.layers.11.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
71
+ "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
72
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
73
+ "model.layers.12.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
74
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
75
+ "model.layers.12.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
76
+ "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
77
+ "model.layers.12.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
78
+ "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
79
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
80
+ "model.layers.12.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
81
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
82
+ "model.layers.12.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
83
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
84
+ "model.layers.12.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
85
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
86
+ "model.layers.12.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
87
+ "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
88
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
89
+ "model.layers.13.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
90
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
91
+ "model.layers.13.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
92
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
93
+ "model.layers.13.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
94
+ "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
95
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
96
+ "model.layers.13.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
97
+ "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
98
+ "model.layers.13.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
99
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
100
+ "model.layers.13.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
101
+ "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
102
+ "model.layers.13.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
103
+ "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
104
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
105
+ "model.layers.14.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
106
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
107
+ "model.layers.14.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
108
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
109
+ "model.layers.14.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
110
+ "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
111
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
112
+ "model.layers.14.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
113
+ "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
114
+ "model.layers.14.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
115
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
116
+ "model.layers.14.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
117
+ "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
118
+ "model.layers.14.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
119
+ "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
120
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
121
+ "model.layers.15.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
122
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
123
+ "model.layers.15.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
124
+ "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
125
+ "model.layers.15.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
126
+ "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
127
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
128
+ "model.layers.15.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
129
+ "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
130
+ "model.layers.15.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
131
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
132
+ "model.layers.15.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
133
+ "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
134
+ "model.layers.15.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
135
+ "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
136
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
137
+ "model.layers.16.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
138
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
139
+ "model.layers.16.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
140
+ "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
141
+ "model.layers.16.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
142
+ "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
143
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
144
+ "model.layers.16.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
145
+ "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
146
+ "model.layers.16.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
147
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
148
+ "model.layers.16.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
149
+ "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
150
+ "model.layers.16.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
151
+ "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
152
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
153
+ "model.layers.17.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
154
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
155
+ "model.layers.17.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
156
+ "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
157
+ "model.layers.17.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
158
+ "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
159
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
160
+ "model.layers.17.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
161
+ "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
162
+ "model.layers.17.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
163
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
164
+ "model.layers.17.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
165
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
166
+ "model.layers.17.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
167
+ "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
168
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
169
+ "model.layers.18.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
170
+ "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
171
+ "model.layers.18.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
172
+ "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
173
+ "model.layers.18.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
174
+ "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
175
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
176
+ "model.layers.18.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
177
+ "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
178
+ "model.layers.18.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
179
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
180
+ "model.layers.18.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
181
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
182
+ "model.layers.18.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
183
+ "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
184
+ "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
185
+ "model.layers.19.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
186
+ "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
187
+ "model.layers.19.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
188
+ "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
189
+ "model.layers.19.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
190
+ "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
191
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
192
+ "model.layers.19.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
193
+ "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
194
+ "model.layers.19.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
195
+ "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
196
+ "model.layers.19.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
197
+ "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
198
+ "model.layers.19.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
199
+ "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
200
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
201
+ "model.layers.2.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
202
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
203
+ "model.layers.2.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
204
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
205
+ "model.layers.2.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
206
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
207
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
208
+ "model.layers.2.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
209
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
210
+ "model.layers.2.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
211
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
212
+ "model.layers.2.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
213
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
214
+ "model.layers.2.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
215
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
216
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
217
+ "model.layers.20.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
218
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
219
+ "model.layers.20.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
220
+ "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
221
+ "model.layers.20.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
222
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
223
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
224
+ "model.layers.20.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
225
+ "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
226
+ "model.layers.20.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
227
+ "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
228
+ "model.layers.20.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
229
+ "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
230
+ "model.layers.20.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
231
+ "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
232
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
233
+ "model.layers.21.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
234
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
235
+ "model.layers.21.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
236
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
237
+ "model.layers.21.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
238
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
239
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
240
+ "model.layers.21.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
241
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
242
+ "model.layers.21.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
243
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
244
+ "model.layers.21.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
245
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
246
+ "model.layers.21.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
247
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
248
+ "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
249
+ "model.layers.22.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
250
+ "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
251
+ "model.layers.22.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
252
+ "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
253
+ "model.layers.22.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
254
+ "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
255
+ "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
256
+ "model.layers.22.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
257
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
258
+ "model.layers.22.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
259
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
260
+ "model.layers.22.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
261
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
262
+ "model.layers.22.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
263
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
264
+ "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
265
+ "model.layers.23.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
266
+ "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
267
+ "model.layers.23.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
268
+ "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
269
+ "model.layers.23.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
270
+ "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
271
+ "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
272
+ "model.layers.23.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
273
+ "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
274
+ "model.layers.23.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
275
+ "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
276
+ "model.layers.23.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
277
+ "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
278
+ "model.layers.23.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
279
+ "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
280
+ "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
281
+ "model.layers.24.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
282
+ "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
283
+ "model.layers.24.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
284
+ "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
285
+ "model.layers.24.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
286
+ "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
287
+ "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
288
+ "model.layers.24.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
289
+ "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
290
+ "model.layers.24.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
291
+ "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
292
+ "model.layers.24.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
293
+ "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
294
+ "model.layers.24.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
295
+ "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
296
+ "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
297
+ "model.layers.25.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
298
+ "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
299
+ "model.layers.25.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
300
+ "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
301
+ "model.layers.25.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
302
+ "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
303
+ "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
304
+ "model.layers.25.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
305
+ "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
306
+ "model.layers.25.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
307
+ "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
308
+ "model.layers.25.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
309
+ "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
310
+ "model.layers.25.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
311
+ "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
312
+ "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
313
+ "model.layers.26.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
314
+ "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
315
+ "model.layers.26.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
316
+ "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
317
+ "model.layers.26.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
318
+ "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
319
+ "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
320
+ "model.layers.26.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
321
+ "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
322
+ "model.layers.26.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
323
+ "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
324
+ "model.layers.26.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
325
+ "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
326
+ "model.layers.26.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
327
+ "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
328
+ "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
329
+ "model.layers.27.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
330
+ "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
331
+ "model.layers.27.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
332
+ "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
333
+ "model.layers.27.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
334
+ "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
335
+ "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
336
+ "model.layers.27.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
337
+ "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
338
+ "model.layers.27.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
339
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
340
+ "model.layers.27.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
341
+ "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
342
+ "model.layers.27.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
343
+ "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
344
+ "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
345
+ "model.layers.28.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
346
+ "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
347
+ "model.layers.28.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
348
+ "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
349
+ "model.layers.28.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
350
+ "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
351
+ "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
352
+ "model.layers.28.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
353
+ "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
354
+ "model.layers.28.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
355
+ "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
356
+ "model.layers.28.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
357
+ "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
358
+ "model.layers.28.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
359
+ "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
360
+ "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
361
+ "model.layers.29.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
362
+ "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
363
+ "model.layers.29.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
364
+ "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
365
+ "model.layers.29.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
366
+ "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
367
+ "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
368
+ "model.layers.29.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
369
+ "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
370
+ "model.layers.29.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
371
+ "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
372
+ "model.layers.29.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
373
+ "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
374
+ "model.layers.29.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
375
+ "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
376
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
377
+ "model.layers.3.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
378
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
379
+ "model.layers.3.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
380
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
381
+ "model.layers.3.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
382
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
383
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
384
+ "model.layers.3.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
385
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
386
+ "model.layers.3.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
387
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
388
+ "model.layers.3.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
389
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
390
+ "model.layers.3.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
391
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
392
+ "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
393
+ "model.layers.30.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
394
+ "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
395
+ "model.layers.30.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
396
+ "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
397
+ "model.layers.30.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
398
+ "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
399
+ "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
400
+ "model.layers.30.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
401
+ "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
402
+ "model.layers.30.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
403
+ "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
404
+ "model.layers.30.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
405
+ "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
406
+ "model.layers.30.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
407
+ "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
408
+ "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
409
+ "model.layers.31.mlp.down_proj.SCB": "model-00002-of-00002.safetensors",
410
+ "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
411
+ "model.layers.31.mlp.gate_proj.SCB": "model-00002-of-00002.safetensors",
412
+ "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
413
+ "model.layers.31.mlp.up_proj.SCB": "model-00002-of-00002.safetensors",
414
+ "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
415
+ "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
416
+ "model.layers.31.self_attn.k_proj.SCB": "model-00002-of-00002.safetensors",
417
+ "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
418
+ "model.layers.31.self_attn.o_proj.SCB": "model-00002-of-00002.safetensors",
419
+ "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
420
+ "model.layers.31.self_attn.q_proj.SCB": "model-00002-of-00002.safetensors",
421
+ "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
422
+ "model.layers.31.self_attn.v_proj.SCB": "model-00002-of-00002.safetensors",
423
+ "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
424
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
425
+ "model.layers.4.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
426
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
427
+ "model.layers.4.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
428
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
429
+ "model.layers.4.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
430
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
431
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
432
+ "model.layers.4.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
433
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
434
+ "model.layers.4.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
435
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
436
+ "model.layers.4.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
437
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
438
+ "model.layers.4.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
439
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
440
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
441
+ "model.layers.5.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
442
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
443
+ "model.layers.5.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
444
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
445
+ "model.layers.5.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
446
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
447
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
448
+ "model.layers.5.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
449
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
450
+ "model.layers.5.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
451
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
452
+ "model.layers.5.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
453
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
454
+ "model.layers.5.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
455
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
456
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
457
+ "model.layers.6.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
458
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
459
+ "model.layers.6.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
460
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
461
+ "model.layers.6.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
462
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
463
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
464
+ "model.layers.6.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
465
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
466
+ "model.layers.6.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
467
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
468
+ "model.layers.6.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
469
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
470
+ "model.layers.6.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
471
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
472
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
473
+ "model.layers.7.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
474
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
475
+ "model.layers.7.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
476
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
477
+ "model.layers.7.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
478
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
479
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
480
+ "model.layers.7.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
481
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
482
+ "model.layers.7.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
483
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
484
+ "model.layers.7.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
485
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
486
+ "model.layers.7.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
487
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
488
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
489
+ "model.layers.8.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
490
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
491
+ "model.layers.8.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
492
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
493
+ "model.layers.8.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
494
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
495
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
496
+ "model.layers.8.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
497
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
498
+ "model.layers.8.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
499
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
500
+ "model.layers.8.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
501
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
502
+ "model.layers.8.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
503
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
504
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
505
+ "model.layers.9.mlp.down_proj.SCB": "model-00001-of-00002.safetensors",
506
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
507
+ "model.layers.9.mlp.gate_proj.SCB": "model-00001-of-00002.safetensors",
508
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
509
+ "model.layers.9.mlp.up_proj.SCB": "model-00001-of-00002.safetensors",
510
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
511
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
512
+ "model.layers.9.self_attn.k_proj.SCB": "model-00001-of-00002.safetensors",
513
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
514
+ "model.layers.9.self_attn.o_proj.SCB": "model-00001-of-00002.safetensors",
515
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
516
+ "model.layers.9.self_attn.q_proj.SCB": "model-00001-of-00002.safetensors",
517
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
518
+ "model.layers.9.self_attn.v_proj.SCB": "model-00001-of-00002.safetensors",
519
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
520
+ "model.norm.weight": "model-00002-of-00002.safetensors"
521
+ }
522
+ }
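The index above maps every tensor name to the shard file that stores it. As a minimal sketch of how such an index can be inspected, the snippet below groups tensor names by shard. It assumes the mapping sits under a top-level `weight_map` key, as in standard Hugging Face sharded-checkpoint indexes; `group_by_shard` and `sample_index` are hypothetical names, and the sample holds only four entries copied from the listing. The `.SCB` entries look like the per-row scale tensors that bitsandbytes stores for 8-bit linear layers, but that reading is an assumption, since the index itself does not say.

```python
import json
from collections import defaultdict


def group_by_shard(index):
    """Return {shard filename: sorted tensor names} from a parsed index dict."""
    shards = defaultdict(list)
    # Assumption: the name-to-file mapping lives under "weight_map".
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


# Four entries taken from the listing above; layer 20 straddles both shards.
sample_index = json.loads("""
{
  "weight_map": {
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors"
  }
}
""")

grouped = group_by_shard(sample_index)
for shard, names in sorted(grouped.items()):
    print(shard, len(names))
```

Grouping this way makes it easy to see that a single layer (here layer 20) can be split across shard files, so a loader has to open both files to rebuild it.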
modeling_aquila.py ADDED
@@ -0,0 +1,1146 @@
1
+ # coding=utf-8
2
+ # Copyright 2023 EleutherAI and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
5
+ # and OPT implementations in this library. It has been modified from its
6
+ # original forms to accommodate minor architectural differences compared
7
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ """ PyTorch Aquila model."""
21
+ import math
22
+ from typing import List, Optional, Tuple, Union
23
+
24
+ import torch
25
+ import torch.utils.checkpoint
26
+ from torch import nn
27
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
28
+
29
+ from transformers.activations import ACT2FN
30
+ from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast, SequenceClassifierOutputWithPast
31
+ from transformers.modeling_utils import PreTrainedModel
32
+ from transformers.utils import add_start_docstrings, add_start_docstrings_to_model_forward, logging, replace_return_docstrings
33
+ from .configuration_aquila import AquilaConfig
34
+ from transformers import (
35
+ LogitsProcessorList,
36
+ MinLengthLogitsProcessor,
37
+ TopKLogitsWarper,
38
+ TemperatureLogitsWarper,
39
+ TopPLogitsWarper,
40
+ StoppingCriteriaList,
41
+ MaxLengthCriteria,
42
+ BitsAndBytesConfig,
43
+ )
44
+
45
+ logger = logging.get_logger(__name__)
46
+
47
+ _CONFIG_FOR_DOC = "AquilaConfig"
48
+
49
+
50
+ # Copied from transformers.models.bart.modeling_bart._make_causal_mask
51
+ def _make_causal_mask(
52
+ input_ids_shape: torch.Size, dtype: torch.dtype, device: torch.device, past_key_values_length: int = 0
53
+ ):
54
+ """
55
+ Make causal mask used for uni-directional (causal) self-attention.
56
+ """
57
+ bsz, tgt_len = input_ids_shape
58
+ mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
59
+ mask_cond = torch.arange(mask.size(-1), device=device)
60
+ mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
61
+ mask = mask.to(dtype)
62
+
63
+ if past_key_values_length > 0:
64
+ mask = torch.cat([torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device), mask], dim=-1)
65
+ return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)
66
+
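The layout produced by `_make_causal_mask` can be illustrated without PyTorch. The following is a dependency-free sketch, not the function above: `causal_mask` is a hypothetical helper, it returns nested lists for a single batch element, and it uses `float("-inf")` where the original uses `torch.finfo(dtype).min`.

```python
def causal_mask(tgt_len, past_len=0, neg=float("-inf")):
    """Build a [tgt_len, past_len + tgt_len] additive attention mask.

    Rows are query positions; columns are key positions, cached keys first.
    """
    rows = []
    for q in range(tgt_len):
        # Cached (past) keys are always visible to every query.
        row = [0.0] * past_len
        # Inside the current block, key k is visible to query q only if k <= q.
        row += [0.0 if k <= q else neg for k in range(tgt_len)]
        rows.append(row)
    return rows


for row in causal_mask(3, past_len=1):
    print(row)
```

A value of 0 leaves the attention score unchanged and the large negative value suppresses it after the softmax, which is why the mask is added to the scores rather than multiplied.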
67
+
68
+ # Copied from transformers.models.bart.modeling_bart._expand_mask
69
+ def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
70
+ """
71
+ Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
72
+ """
73
+ bsz, src_len = mask.size()
74
+ tgt_len = tgt_len if tgt_len is not None else src_len
75
+
76
+ expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
77
+
78
+ inverted_mask = 1.0 - expanded_mask
79
+
80
+ return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min)
81
+
82
+
83
+ # Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->Aquila
84
+ class AquilaRMSNorm(nn.Module):
85
+ def __init__(self, hidden_size, eps=1e-6):
86
+ """
87
+ AquilaRMSNorm is equivalent to T5LayerNorm
88
+ """
89
+ super().__init__()
90
+ self.weight = nn.Parameter(torch.ones(hidden_size))
91
+ self.variance_epsilon = eps
92
+
93
+ def forward(self, hidden_states):
94
+ input_dtype = hidden_states.dtype
95
+ variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
96
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
97
+
98
+ return (self.weight * hidden_states).to(input_dtype)
99
+
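The arithmetic in `AquilaRMSNorm.forward` can be checked by hand on a single vector. This is a minimal pure-Python sketch under the assumption that one hidden vector is normalized at a time; `rms_norm` is a hypothetical name, and the float32 upcast and dtype round trip of the original are omitted.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal root of its mean square, then by weight."""
    mean_sq = sum(v * v for v in x) / len(x)   # "variance" in the code above
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)   # torch.rsqrt(variance + eps)
    return [w * v * inv_rms for w, v in zip(weight, x)]


out = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
print(out)
```

Unlike standard layer normalization, no mean is subtracted and no bias is added; only the scale of the vector is normalized.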
100
+
101
+ # Copied from transformers.models.llama.modeling_llama.LlamaRotaryEmbedding with Llama->Aquila
102
+ class AquilaRotaryEmbedding(torch.nn.Module):
103
+ def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
104
+ super().__init__()
105
+
106
+ self.dim = dim
107
+ self.max_position_embeddings = max_position_embeddings
108
+ self.base = base
109
+ inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
110
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
111
+
112
+ # Build here to make `torch.jit.trace` work.
113
+ self._set_cos_sin_cache(
114
+ seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
115
+ )
116
+
117
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
118
+ self.max_seq_len_cached = seq_len
119
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)
120
+
121
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
122
+ # Differs from the paper: uses a different permutation of the dimensions, which yields the same result
123
+ emb = torch.cat((freqs, freqs), dim=-1)
124
+ self.register_buffer("cos_cached", emb.cos()[None, None, :, :].to(dtype), persistent=False)
125
+ self.register_buffer("sin_cached", emb.sin()[None, None, :, :].to(dtype), persistent=False)
126
+
127
+ def forward(self, x, seq_len=None):
128
+ # x: [bs, num_attention_heads, seq_len, head_size]
129
+ if seq_len > self.max_seq_len_cached:
130
+ self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)
131
+
132
+ return (
133
+ self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
134
+ self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
135
+ )
136
+
137
+ # Copied from transformers.models.llama.modeling_llama.LlamaLinearScalingRotaryEmbedding with Llama->Aquila
138
+ class AquilaLinearScalingRotaryEmbedding(AquilaRotaryEmbedding):
139
+ """AquilaRotaryEmbedding extended with linear scaling. Credits to the Reddit user /u/kaiokendev"""
140
+
141
+ def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0):
142
+ self.scaling_factor = scaling_factor
143
+ super().__init__(dim, max_position_embeddings, base, device)
144
+
145
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
146
+ self.max_seq_len_cached = seq_len
147
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)
148
+ t = t / self.scaling_factor
149
+
150
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
151
+ # Differs from the paper: uses a different permutation of the dimensions, which yields the same result
152
+ emb = torch.cat((freqs, freqs), dim=-1)
153
+ self.register_buffer("cos_cached", emb.cos()[None, None, :, :].to(dtype), persistent=False)
154
+ self.register_buffer("sin_cached", emb.sin()[None, None, :, :].to(dtype), persistent=False)
155
+
156
+ # Copied from transformers.models.llama.modeling_llama.LlamaDynamicNTKScalingRotaryEmbedding with Llama->Aquila
157
+ class AquilaDynamicNTKScalingRotaryEmbedding(AquilaRotaryEmbedding):
158
+ """AquilaRotaryEmbedding extended with Dynamic NTK scaling. Credits to the Reddit users /u/bloc97 and /u/emozilla"""
159
+
160
+ def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0):
161
+ self.scaling_factor = scaling_factor
162
+ super().__init__(dim, max_position_embeddings, base, device)
163
+
164
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
165
+ self.max_seq_len_cached = seq_len
166
+
167
+ if seq_len > self.max_position_embeddings:
168
+ base = self.base * (
169
+ (self.scaling_factor * seq_len / self.max_position_embeddings) - (self.scaling_factor - 1)
170
+ ) ** (self.dim / (self.dim - 2))
171
+ inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
172
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
173
+
174
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)
175
+
176
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
177
+ # Differs from the paper: uses a different permutation of the dimensions, which yields the same result
178
+ emb = torch.cat((freqs, freqs), dim=-1)
179
+ self.register_buffer("cos_cached", emb.cos()[None, None, :, :].to(dtype), persistent=False)
180
+ self.register_buffer("sin_cached", emb.sin()[None, None, :, :].to(dtype), persistent=False)
181
+
182
+
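The base adjustment performed in `AquilaDynamicNTKScalingRotaryEmbedding._set_cos_sin_cache` above can be isolated into a small function. This is a sketch under assumptions: `dynamic_ntk_base` is a hypothetical helper name, and it returns the unchanged base for short sequences, whereas the class simply keeps its existing `inv_freq` buffer in that case.

```python
def dynamic_ntk_base(base, dim, seq_len, max_position_embeddings, scaling_factor):
    """Return the RoPE base to use for a given sequence length."""
    if seq_len <= max_position_embeddings:
        # Within the trained context window: no rescaling.
        return base
    ratio = (scaling_factor * seq_len / max_position_embeddings) - (scaling_factor - 1)
    return base * ratio ** (dim / (dim - 2))


short = dynamic_ntk_base(10000.0, 128, 2048, 2048, 2.0)
long = dynamic_ntk_base(10000.0, 128, 4096, 2048, 2.0)
print(short, long)
```

The base grows only once the sequence exceeds `max_position_embeddings`, which lowers the rotation frequencies for longer inputs.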
183
+ def rotate_half(x):
184
+ """Rotates half the hidden dims of the input."""
185
+ x1 = x[..., : x.shape[-1] // 2]
186
+ x2 = x[..., x.shape[-1] // 2 :]
187
+ return torch.cat((-x2, x1), dim=-1)
188
+
189
+
190
+ def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
191
+ # The first two dimensions of cos and sin are always 1, so we can `squeeze` them.
192
+ cos = cos.squeeze(1).squeeze(0) # [seq_len, dim]
193
+ sin = sin.squeeze(1).squeeze(0) # [seq_len, dim]
194
+ cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
195
+ sin = sin[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
196
+ q_embed = (q * cos) + (rotate_half(q) * sin)
197
+ k_embed = (k * cos) + (rotate_half(k) * sin)
198
+ return q_embed, k_embed
199
+
200
+
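To make the effect of `rotate_half` and `apply_rotary_pos_emb` concrete, the following sketch applies the same rotation to a single vector in plain Python. The helpers `rope` and `dot` are hypothetical names introduced for this example, and the per-vector formulation is an assumption; the code above works on batched tensors indexed by `position_ids`.

```python
import math


def rotate_half(x):
    """Swap the two halves of x and negate the (original) second half."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def rope(x, position, base=10000.0):
    """Apply rotary position embedding to one vector at one position."""
    dim = len(x)
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
    angles = [position * f for f in inv_freq] * 2  # same layout as cat((freqs, freqs))
    rotated = rotate_half(x)
    return [v * math.cos(a) + r * math.sin(a) for v, r, a in zip(x, rotated, angles)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
score_a = dot(rope(q, 5), rope(k, 3))    # positions 5 and 3, offset 2
score_b = dot(rope(q, 12), rope(k, 10))  # positions 12 and 10, offset 2
print(score_a, score_b)
```

The two scores agree up to floating-point error because the rotated dot product depends only on the offset between the query and key positions.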
201
+ # Copied from transformers.models.llama.modeling_llama.LlamaMLP with Llama->Aquila
202
+ class AquilaMLP(nn.Module):
203
+ def __init__(self, config):
204
+ super().__init__()
205
+ self.config = config
206
+ self.hidden_size = config.hidden_size
207
+ self.intermediate_size = config.intermediate_size
208
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
209
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
210
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
211
+ self.act_fn = ACT2FN[config.hidden_act]
212
+
213
+ def forward(self, x):
214
+ if self.config.pretraining_tp > 1:
215
+ slice = self.intermediate_size // self.config.pretraining_tp
216
+ gate_proj_slices = self.gate_proj.weight.split(slice, dim=0)
217
+ up_proj_slices = self.up_proj.weight.split(slice, dim=0)
218
+ down_proj_slices = self.down_proj.weight.split(slice, dim=1)
219
+
220
+ gate_proj = torch.cat(
221
+ [F.linear(x, gate_proj_slices[i]) for i in range(self.config.pretraining_tp)], dim=-1
222
+ )
223
+ up_proj = torch.cat([F.linear(x, up_proj_slices[i]) for i in range(self.config.pretraining_tp)], dim=-1)
224
+
225
+ intermediate_states = (self.act_fn(gate_proj) * up_proj).split(slice, dim=2)
226
+ down_proj = [
227
+ F.linear(intermediate_states[i], down_proj_slices[i]) for i in range(self.config.pretraining_tp)
228
+ ]
229
+ down_proj = sum(down_proj)
230
+ else:
231
+ down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
232
+
233
+ return down_proj
234
+
235
+
236
+ def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
237
+ """
238
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
239
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
240
+ """
241
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
242
+ if n_rep == 1:
243
+ return hidden_states
244
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
245
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
246
+
247
+
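The head ordering produced by `repeat_kv` above is easy to get wrong, so here is a tiny list-based sketch. It uses strings in place of per-head tensors, which is an assumption made purely for readability.

```python
def repeat_kv(heads, n_rep):
    """Repeat each key/value head n_rep times, keeping copies adjacent."""
    if n_rep == 1:
        return heads
    return [head for head in heads for _ in range(n_rep)]


expanded = repeat_kv(["kv0", "kv1"], 3)
print(expanded)
```

Copies of the same head stay next to each other (interleaved repetition) rather than the whole list being tiled, which matches the `torch.repeat_interleave` behaviour named in the docstring.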
248
+ # Copied from transformers.models.llama.modeling_llama.LlamaAttention with Llama->Aquila
249
+ class AquilaAttention(nn.Module):
250
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
251
+ def __init__(self, config: AquilaConfig):
252
+ super().__init__()
253
+ self.config = config
254
+ self.hidden_size = config.hidden_size
255
+ self.num_heads = config.num_attention_heads
256
+ self.head_dim = self.hidden_size // self.num_heads
257
+ self.num_key_value_heads = config.num_key_value_heads
258
+ self.num_key_value_groups = self.num_heads // self.num_key_value_heads
259
+ self.max_position_embeddings = config.max_position_embeddings
260
+ self.rope_theta = config.rope_theta
261
+
262
+ if (self.head_dim * self.num_heads) != self.hidden_size:
263
+ raise ValueError(
264
+ f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}"
265
+ f" and `num_heads`: {self.num_heads})."
266
+ )
267
+ self.q_proj = nn.Linear(self.hidden_size, self.num_heads * self.head_dim, bias=False)
268
+ self.k_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
269
+ self.v_proj = nn.Linear(self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False)
270
+ self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
271
+ self._init_rope()
272
+
273
+ def _init_rope(self):
274
+ if self.config.rope_scaling is None:
275
+ self.rotary_emb = AquilaRotaryEmbedding(
276
+ self.head_dim,
277
+ max_position_embeddings=self.max_position_embeddings,
278
+ base=self.rope_theta,
279
+ )
280
+ else:
281
+ scaling_type = self.config.rope_scaling["type"]
282
+ scaling_factor = self.config.rope_scaling["factor"]
283
+ if scaling_type == "linear":
284
+ self.rotary_emb = AquilaLinearScalingRotaryEmbedding(
285
+ self.head_dim,
286
+ max_position_embeddings=self.max_position_embeddings,
287
+ scaling_factor=scaling_factor,
288
+ base=self.rope_theta,
289
+ )
290
+ elif scaling_type == "dynamic":
291
+ self.rotary_emb = AquilaDynamicNTKScalingRotaryEmbedding(
292
+ self.head_dim,
293
+ max_position_embeddings=self.max_position_embeddings,
294
+ scaling_factor=scaling_factor,
295
+ base=self.rope_theta,
296
+ )
297
+ else:
298
+ raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
299
+
300
+ def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
301
+ return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()
302
+
303
+ def forward(
304
+ self,
305
+ hidden_states: torch.Tensor,
306
+ attention_mask: Optional[torch.Tensor] = None,
307
+ position_ids: Optional[torch.LongTensor] = None,
308
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
309
+ output_attentions: bool = False,
310
+ use_cache: bool = False,
311
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
312
+ bsz, q_len, _ = hidden_states.size()
313
+
314
+ if self.config.pretraining_tp > 1:
315
+ key_value_slicing = (self.num_key_value_heads * self.head_dim) // self.config.pretraining_tp
316
+ query_slices = self.q_proj.weight.split(
317
+ (self.num_heads * self.head_dim) // self.config.pretraining_tp, dim=0
318
+ )
319
+ key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
320
+ value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)
321
+
322
+ query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.config.pretraining_tp)]
323
+ query_states = torch.cat(query_states, dim=-1)
324
+
325
+ key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.config.pretraining_tp)]
326
+ key_states = torch.cat(key_states, dim=-1)
327
+
328
+ value_states = [F.linear(hidden_states, value_slices[i]) for i in range(self.config.pretraining_tp)]
329
+ value_states = torch.cat(value_states, dim=-1)
330
+
331
+ else:
332
+ query_states = self.q_proj(hidden_states)
333
+ key_states = self.k_proj(hidden_states)
334
+ value_states = self.v_proj(hidden_states)
335
+
336
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
337
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
338
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
339
+
340
+ kv_seq_len = key_states.shape[-2]
341
+ if past_key_value is not None:
342
+ kv_seq_len += past_key_value[0].shape[-2]
343
+ cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
344
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
345
+
346
+ if past_key_value is not None:
347
+ # reuse k, v, self_attention
348
+ key_states = torch.cat([past_key_value[0], key_states], dim=2)
349
+ value_states = torch.cat([past_key_value[1], value_states], dim=2)
350
+
351
+ past_key_value = (key_states, value_states) if use_cache else None
352
+
353
+ # repeat k/v heads if n_kv_heads < n_heads
354
+ key_states = repeat_kv(key_states, self.num_key_value_groups)
355
+ value_states = repeat_kv(value_states, self.num_key_value_groups)
356
+
357
+ attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
358
+ attn_weights = torch.clamp(attn_weights, min=-1024., max=1024.)
359
+ if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
360
+ raise ValueError(
361
+ f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is"
362
+ f" {attn_weights.size()}"
363
+ )
364
+
365
+ if attention_mask is not None:
366
+ if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
367
+ raise ValueError(
368
+ f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}"
369
+ )
370
+ attn_weights = attn_weights + attention_mask
371
+
372
+ # upcast attention to fp32
373
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
374
+ attn_output = torch.matmul(attn_weights, value_states)
375
+
376
+ if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
377
+ raise ValueError(
378
+ f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
379
+ f" {attn_output.size()}"
380
+ )
381
+
382
+ attn_output = attn_output.transpose(1, 2).contiguous()
383
+ attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
384
+
385
+ if self.config.pretraining_tp > 1:
386
+ attn_output = attn_output.split(self.hidden_size // self.config.pretraining_tp, dim=2)
387
+ o_proj_slices = self.o_proj.weight.split(self.hidden_size // self.config.pretraining_tp, dim=1)
388
+ attn_output = sum([F.linear(attn_output[i], o_proj_slices[i]) for i in range(self.config.pretraining_tp)])
389
+ else:
390
+ attn_output = self.o_proj(attn_output)
391
+
392
+ if not output_attentions:
393
+ attn_weights = None
394
+
395
+ return attn_output, attn_weights, past_key_value
396
+
397
+
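The core of `AquilaAttention.forward` (scaled scores, clamping, additive mask, softmax, weighted sum) can be sketched for one query and one head in plain Python. The function `attend` and its list-based arguments are hypothetical; the clamp bound of 1024 is taken from the code above, and everything else is a simplified illustration rather than the module's actual code path.

```python
import math


def attend(q, keys, values, mask_row, clamp=1024.0):
    """Single-query, single-head attention with an additive mask."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    scores = [max(-clamp, min(clamp, s)) for s in scores]   # mirrors torch.clamp
    scores = [s + m for s, m in zip(scores, mask_row)]      # additive attention mask
    peak = max(scores)                                       # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return weights, out


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 20.0], [30.0, 40.0]]
weights, out = attend([1.0, 1.0], keys, values, mask_row=[0.0, -1e9])
print(weights, out)
```

Because the second key carries a very large negative mask value, all of the attention weight falls on the first key and the output equals the first value vector.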
398
+ # Copied from transformers.models.llama.modeling_llama.LlamaDecoderLayer with Llama->Aquila
399
+ class AquilaDecoderLayer(nn.Module):
400
+ def __init__(self, config: AquilaConfig):
401
+ super().__init__()
402
+ self.hidden_size = config.hidden_size
403
+ self.self_attn = AquilaAttention(config=config)
404
+ self.mlp = AquilaMLP(config)
405
+ self.input_layernorm = AquilaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
406
+ self.post_attention_layernorm = AquilaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
407
+
408
+ def forward(
409
+ self,
410
+ hidden_states: torch.Tensor,
411
+ attention_mask: Optional[torch.Tensor] = None,
412
+ position_ids: Optional[torch.LongTensor] = None,
413
+ past_key_value: Optional[Tuple[torch.Tensor]] = None,
414
+ output_attentions: Optional[bool] = False,
415
+ use_cache: Optional[bool] = False,
416
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
417
+ """
418
+ Args:
419
+ hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
420
+ attention_mask (`torch.FloatTensor`, *optional*): attention mask of size
421
+ `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
422
+ output_attentions (`bool`, *optional*):
423
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under
424
+ returned tensors for more detail.
425
+ use_cache (`bool`, *optional*):
426
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
427
+ (see `past_key_values`).
428
+ past_key_value (`Tuple(torch.FloatTensor)`, *optional*): cached past key and value projection states
429
+ """
430
+
431
+ residual = hidden_states
432
+
433
+ hidden_states = self.input_layernorm(hidden_states)
434
+
435
+ # Self Attention
436
+ hidden_states, self_attn_weights, present_key_value = self.self_attn(
437
+ hidden_states=hidden_states,
438
+ attention_mask=attention_mask,
439
+ position_ids=position_ids,
440
+ past_key_value=past_key_value,
441
+ output_attentions=output_attentions,
442
+ use_cache=use_cache,
443
+ )
444
+ hidden_states = residual + hidden_states
445
+
446
+ # Fully Connected
447
+ residual = hidden_states
448
+ hidden_states = self.post_attention_layernorm(hidden_states)
449
+ hidden_states = self.mlp(hidden_states)
450
+ hidden_states = residual + hidden_states
451
+
452
+ outputs = (hidden_states,)
453
+
454
+ if output_attentions:
455
+ outputs += (self_attn_weights,)
456
+
457
+ if use_cache:
458
+ outputs += (present_key_value,)
459
+
460
+ return outputs
461
+
462
+ AQUILA_START_DOCSTRING = r"""
463
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
464
+ library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
465
+ etc.)
466
+
467
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
468
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
469
+ and behavior.
470
+
471
+ Parameters:
472
+ config ([`AquilaConfig`]):
473
+ Model configuration class with all the parameters of the model. Initializing with a config file does not
474
+ load the weights associated with the model, only the configuration. Check out the
475
+ [`~PreTrainedModel.from_pretrained`] method to load the model weights.
476
+ """
477
+
478
+
479
+ @add_start_docstrings(
480
+ "The bare Aquila Model outputting raw hidden-states without any specific head on top.",
481
+ AQUILA_START_DOCSTRING,
482
+ )
483
+ # Copied from transformers.models.llama.modeling_llama.LlamaPreTrainedModel with Llama->Aquila
484
+ class AquilaPreTrainedModel(PreTrainedModel):
485
+ config_class = AquilaConfig
486
+ base_model_prefix = "model"
487
+ supports_gradient_checkpointing = True
488
+ _no_split_modules = ["AquilaDecoderLayer"]
489
+ _skip_keys_device_placement = "past_key_values"
490
+
491
+ def _init_weights(self, module):
492
+ std = self.config.initializer_range
493
+ if isinstance(module, nn.Linear):
494
+ module.weight.data.normal_(mean=0.0, std=std)
495
+ if module.bias is not None:
496
+ module.bias.data.zero_()
497
+ elif isinstance(module, nn.Embedding):
498
+ module.weight.data.normal_(mean=0.0, std=std)
499
+ if module.padding_idx is not None:
500
+ module.weight.data[module.padding_idx].zero_()
501
+
502
+ def _set_gradient_checkpointing(self, module, value=False):
503
+ if isinstance(module, AquilaModel):
504
+ module.gradient_checkpointing = value
505
+
506
+
507
+ AQUILA_INPUTS_DOCSTRING = r"""
508
+ Args:
509
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
510
+ Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
511
+ it.
512
+
513
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
514
+ [`PreTrainedTokenizer.__call__`] for details.
515
+
516
+ [What are input IDs?](../glossary#input-ids)
517
+ attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
518
+ Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
519
+
520
+ - 1 for tokens that are **not masked**,
521
+ - 0 for tokens that are **masked**.
522
+
523
+ [What are attention masks?](../glossary#attention-mask)
524
+
525
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
526
+ [`PreTrainedTokenizer.__call__`] for details.
527
+
528
+ If `past_key_values` is used, optionally only the last `decoder_input_ids` have to be input (see
529
+ `past_key_values`).
530
+
531
+ If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
532
+ and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
533
+ information on the default strategy.
534
+
535
+ - 1 indicates the head is **not masked**,
536
+ - 0 indicates the head is **masked**.
537
+ position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
538
+ Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0,
539
+ config.n_positions - 1]`.
540
+
541
+ [What are position IDs?](../glossary#position-ids)
542
+ past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
543
+ Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
544
+ `(batch_size, num_heads, sequence_length, embed_size_per_head)` and 2 additional tensors of shape
545
+ `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`.
546
+
547
+ Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
548
+ blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
549
+
550
+ If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
551
+ don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
552
+ `decoder_input_ids` of shape `(batch_size, sequence_length)`.
553
+ inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
554
+ Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
555
+ is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
556
+ model's internal embedding lookup matrix.
557
+ use_cache (`bool`, *optional*):
558
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
559
+ `past_key_values`).
560
+ output_attentions (`bool`, *optional*):
561
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
562
+ tensors for more detail.
563
+ output_hidden_states (`bool`, *optional*):
564
+ Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
565
+ more detail.
566
+ return_dict (`bool`, *optional*):
567
+ Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
568
+ """
569
+
570
+
571
+ @add_start_docstrings(
572
+ "The bare Aquila Model outputting raw hidden-states without any specific head on top.",
573
+ AQUILA_START_DOCSTRING,
574
+ )
575
+ # Copied from transformers.models.llama.modeling_llama.LlamaModel with LLAMA->AQUILA,Llama->Aquila
576
+ class AquilaModel(AquilaPreTrainedModel):
577
+ """
578
+ Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is an [`AquilaDecoderLayer`]
579
+
580
+ Args:
581
+ config: AquilaConfig
582
+ """
583
+
584
+ def __init__(self, config: AquilaConfig):
585
+ super().__init__(config)
586
+ self.padding_idx = config.pad_token_id
587
+ self.vocab_size = config.vocab_size
588
+
589
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
590
+ self.layers = nn.ModuleList([AquilaDecoderLayer(config) for _ in range(config.num_hidden_layers)])
591
+ self.norm = AquilaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
592
+
593
+ self.gradient_checkpointing = False
594
+ # Initialize weights and apply final processing
595
+ self.post_init()
596
+
597
+ def get_input_embeddings(self):
598
+ return self.embed_tokens
599
+
600
+ def set_input_embeddings(self, value):
601
+ self.embed_tokens = value
602
+
603
+ def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length):
604
+ # create causal mask
605
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
606
+ combined_attention_mask = None
607
+ if input_shape[-1] > 1:
608
+ combined_attention_mask = _make_causal_mask(
609
+ input_shape,
610
+ inputs_embeds.dtype,
611
+ device=inputs_embeds.device,
612
+ past_key_values_length=past_key_values_length,
613
+ )
614
+
615
+ if attention_mask is not None:
616
+ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
617
+ expanded_attn_mask = _expand_mask(attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]).to(
618
+ inputs_embeds.device
619
+ )
620
+ combined_attention_mask = (
621
+ expanded_attn_mask if combined_attention_mask is None else expanded_attn_mask + combined_attention_mask
622
+ )
623
+
624
+ return combined_attention_mask
625
+
626
+ @add_start_docstrings_to_model_forward(AQUILA_INPUTS_DOCSTRING)
627
+ def forward(
628
+ self,
629
+ input_ids: torch.LongTensor = None,
630
+ attention_mask: Optional[torch.Tensor] = None,
631
+ position_ids: Optional[torch.LongTensor] = None,
632
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
633
+ inputs_embeds: Optional[torch.FloatTensor] = None,
634
+ use_cache: Optional[bool] = None,
635
+ output_attentions: Optional[bool] = None,
636
+ output_hidden_states: Optional[bool] = None,
637
+ return_dict: Optional[bool] = None,
638
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
639
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
640
+ output_hidden_states = (
641
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
642
+ )
643
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
644
+
645
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
646
+
647
+ # retrieve input_ids and inputs_embeds
648
+ if input_ids is not None and inputs_embeds is not None:
649
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
650
+ elif input_ids is not None:
651
+ batch_size, seq_length = input_ids.shape
652
+ elif inputs_embeds is not None:
653
+ batch_size, seq_length, _ = inputs_embeds.shape
654
+ else:
655
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
656
+
657
+ seq_length_with_past = seq_length
658
+ past_key_values_length = 0
659
+
660
+ if past_key_values is not None:
661
+ past_key_values_length = past_key_values[0][0].shape[2]
662
+ seq_length_with_past = seq_length_with_past + past_key_values_length
663
+
664
+ if position_ids is None:
665
+ device = input_ids.device if input_ids is not None else inputs_embeds.device
666
+ position_ids = torch.arange(
667
+ past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=device
668
+ )
669
+ position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
670
+ else:
671
+ position_ids = position_ids.view(-1, seq_length).long()
672
+
673
+ if inputs_embeds is None:
674
+ inputs_embeds = self.embed_tokens(input_ids)
675
+ # embed positions
676
+ if attention_mask is None:
677
+ attention_mask = torch.ones(
678
+ (batch_size, seq_length_with_past), dtype=torch.bool, device=inputs_embeds.device
679
+ )
680
+ attention_mask = self._prepare_decoder_attention_mask(
681
+ attention_mask, (batch_size, seq_length), inputs_embeds, past_key_values_length
682
+ )
683
+
684
+ hidden_states = inputs_embeds
685
+
686
+ if self.gradient_checkpointing and self.training:
687
+ if use_cache:
688
+ logger.warning_once(
689
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
690
+ )
691
+ use_cache = False
692
+
693
+ # decoder layers
694
+ all_hidden_states = () if output_hidden_states else None
695
+ all_self_attns = () if output_attentions else None
696
+ next_decoder_cache = () if use_cache else None
697
+
698
+ for idx, decoder_layer in enumerate(self.layers):
699
+ if output_hidden_states:
700
+ all_hidden_states += (hidden_states,)
701
+
702
+ past_key_value = past_key_values[idx] if past_key_values is not None else None
703
+
704
+ if self.gradient_checkpointing and self.training:
705
+
706
+ def create_custom_forward(module):
707
+ def custom_forward(*inputs):
708
+ # None for past_key_value
709
+ return module(*inputs, past_key_value, output_attentions)
710
+
711
+ return custom_forward
712
+
713
+ layer_outputs = torch.utils.checkpoint.checkpoint(
714
+ create_custom_forward(decoder_layer),
715
+ hidden_states,
716
+ attention_mask,
717
+ position_ids,
718
+ )
719
+ else:
720
+ layer_outputs = decoder_layer(
721
+ hidden_states,
722
+ attention_mask=attention_mask,
723
+ position_ids=position_ids,
724
+ past_key_value=past_key_value,
725
+ output_attentions=output_attentions,
726
+ use_cache=use_cache,
727
+ )
728
+
729
+ hidden_states = layer_outputs[0]
730
+
731
+ if use_cache:
732
+ next_decoder_cache += (layer_outputs[2 if output_attentions else 1],)
733
+
734
+ if output_attentions:
735
+ all_self_attns += (layer_outputs[1],)
736
+
737
+ hidden_states = self.norm(hidden_states)
738
+
739
+ # add hidden states from the last decoder layer
740
+ if output_hidden_states:
741
+ all_hidden_states += (hidden_states,)
742
+
743
+ next_cache = next_decoder_cache if use_cache else None
744
+ if not return_dict:
745
+ return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
746
+ return BaseModelOutputWithPast(
747
+ last_hidden_state=hidden_states,
748
+ past_key_values=next_cache,
749
+ hidden_states=all_hidden_states,
750
+ attentions=all_self_attns,
751
+ )
752
+
753
+ # Copied from transformers.models.llama.modeling_llama.LlamaForCausalLM with LLAMA->AQUILA,Llama->Aquila
754
+ class AquilaForCausalLM(AquilaPreTrainedModel):
755
+ _tied_weights_keys = ["lm_head.weight"]
756
+
757
+ def __init__(self, config):
758
+ super().__init__(config)
759
+ self.model = AquilaModel(config)
760
+ self.vocab_size = config.vocab_size
761
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
762
+
763
+ # Initialize weights and apply final processing
764
+ self.post_init()
765
+
766
+ def get_input_embeddings(self):
767
+ return self.model.embed_tokens
768
+
769
+ def set_input_embeddings(self, value):
770
+ self.model.embed_tokens = value
771
+
772
+ def get_output_embeddings(self):
773
+ return self.lm_head
774
+
775
+ def set_output_embeddings(self, new_embeddings):
776
+ self.lm_head = new_embeddings
777
+
778
+ def set_decoder(self, decoder):
779
+ self.model = decoder
780
+
781
+ def get_decoder(self):
782
+ return self.model
783
+
784
+ @add_start_docstrings_to_model_forward(AQUILA_INPUTS_DOCSTRING)
785
+ @replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)
786
+ def forward(
787
+ self,
788
+ input_ids: torch.LongTensor = None,
789
+ attention_mask: Optional[torch.Tensor] = None,
790
+ position_ids: Optional[torch.LongTensor] = None,
791
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
792
+ inputs_embeds: Optional[torch.FloatTensor] = None,
793
+ labels: Optional[torch.LongTensor] = None,
794
+ use_cache: Optional[bool] = None,
795
+ output_attentions: Optional[bool] = None,
796
+ output_hidden_states: Optional[bool] = None,
797
+ return_dict: Optional[bool] = None,
798
+ ) -> Union[Tuple, CausalLMOutputWithPast]:
799
+ r"""
800
+ Args:
801
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
802
+ Labels for computing the causal language modeling loss. Indices should either be in `[0, ...,
803
+ config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
804
+ (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
805
+
806
+ Returns:
807
+
808
+ Example:
809
+
810
+ ```python
811
+ >>> from transformers import AutoTokenizer, AquilaForCausalLM
812
+
813
+ >>> model = AquilaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
814
+ >>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)
815
+
816
+ >>> prompt = "Hey, are you conscious? Can you talk to me?"
817
+ >>> inputs = tokenizer(prompt, return_tensors="pt")
818
+
819
+ >>> # Generate
820
+ >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
821
+ >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
822
+ "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
823
+ ```"""
824
+
825
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
826
+ output_hidden_states = (
827
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
828
+ )
829
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
830
+
831
+ # decoder outputs consist of (dec_features, layer_state, dec_hidden, dec_attn)
832
+ outputs = self.model(
833
+ input_ids=input_ids,
834
+ attention_mask=attention_mask,
835
+ position_ids=position_ids,
836
+ past_key_values=past_key_values,
837
+ inputs_embeds=inputs_embeds,
838
+ use_cache=use_cache,
839
+ output_attentions=output_attentions,
840
+ output_hidden_states=output_hidden_states,
841
+ return_dict=return_dict,
842
+ )
843
+
844
+ hidden_states = outputs[0]
845
+ if self.config.pretraining_tp > 1:
846
+ lm_head_slices = self.lm_head.weight.split(self.vocab_size // self.config.pretraining_tp, dim=0)
847
+ logits = [F.linear(hidden_states, lm_head_slices[i]) for i in range(self.config.pretraining_tp)]
848
+ logits = torch.cat(logits, dim=-1)
849
+ else:
850
+ logits = self.lm_head(hidden_states)
851
+ logits = logits.float()
852
+
853
+ loss = None
854
+ if labels is not None:
855
+ # Shift so that tokens < n predict n
856
+ shift_logits = logits[..., :-1, :].contiguous()
857
+ shift_labels = labels[..., 1:].contiguous()
858
+ # Flatten the tokens
859
+ loss_fct = CrossEntropyLoss()
860
+ shift_logits = shift_logits.view(-1, self.config.vocab_size)
861
+ shift_labels = shift_labels.view(-1)
862
+ # Enable model parallelism
863
+ shift_labels = shift_labels.to(shift_logits.device)
864
+ loss = loss_fct(shift_logits, shift_labels)
865
+
866
+ if not return_dict:
867
+ output = (logits,) + outputs[1:]
868
+ return (loss,) + output if loss is not None else output
869
+
870
+ return CausalLMOutputWithPast(
871
+ loss=loss,
872
+ logits=logits,
873
+ past_key_values=outputs.past_key_values,
874
+ hidden_states=outputs.hidden_states,
875
+ attentions=outputs.attentions,
876
+ )
877
+
878
+ def prepare_inputs_for_generation(
879
+ self, input_ids, past_key_values=None, attention_mask=None, inputs_embeds=None, **kwargs
880
+ ):
881
+ if past_key_values:
882
+ input_ids = input_ids[:, -1:]
883
+
884
+ position_ids = kwargs.get("position_ids", None)
885
+ if attention_mask is not None and position_ids is None:
886
+ # create position_ids on the fly for batch generation
887
+ position_ids = attention_mask.long().cumsum(-1) - 1
888
+ position_ids.masked_fill_(attention_mask == 0, 1)
889
+ if past_key_values:
890
+ position_ids = position_ids[:, -1].unsqueeze(-1)
891
+
892
+ # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
893
+ if inputs_embeds is not None and past_key_values is None:
894
+ model_inputs = {"inputs_embeds": inputs_embeds}
895
+ else:
896
+ model_inputs = {"input_ids": input_ids}
897
+
898
+ model_inputs.update(
899
+ {
900
+ "position_ids": position_ids,
901
+ "past_key_values": past_key_values,
902
+ "use_cache": kwargs.get("use_cache"),
903
+ "attention_mask": attention_mask,
904
+ }
905
+ )
906
+ return model_inputs
907
+
908
+ @staticmethod
909
+ def _reorder_cache(past_key_values, beam_idx):
910
+ reordered_past = ()
911
+ for layer_past in past_key_values:
912
+ reordered_past += (
913
+ tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
914
+ )
915
+ return reordered_past
916
+
917
+ def predict(self, text, tokenizer=None,
918
+ max_gen_len=200, top_p=0.95,
919
+ seed=1234, topk=100,
920
+ temperature=0.9,
921
+ sft=True, convo_template = "aquila-chat",
922
+ device = "cuda"):
923
+
924
+ vocab = tokenizer.get_vocab()
925
+ #device = device
926
+ id2word = {v:k for k, v in vocab.items()}
927
+
928
+
929
+ set_random_seed(seed)
930
+ if temperature == 0:
931
+ topk = 1
932
+ temperature = 1.0
933
+ if sft:
934
+ tokens = covert_prompt_to_input_ids_with_history(text, history=[], tokenizer=tokenizer, max_token=2048, convo_template=convo_template)
935
+ tokens = torch.tensor(tokens)[None,].to(device)
936
+ else :
937
+ tokens = tokenizer.encode_plus(text)["input_ids"]
938
+ print(tokenizer.decode(tokens))
939
+ tokens = torch.tensor(tokens)[None,].to(device)
940
+ input_length = len(tokens[0])
941
+ with torch.no_grad():
942
+
943
+ # instantiate logits processors
944
+ logits_processor = LogitsProcessorList(
945
+ [
946
+ MinLengthLogitsProcessor(1, eos_token_id=100007),
947
+ ]
948
+ )
949
+ # instantiate logits warpers
950
+ logits_warper = LogitsProcessorList(
951
+ [
952
+ TopPLogitsWarper(top_p),
953
+ TopKLogitsWarper(topk),
954
+ TemperatureLogitsWarper(temperature),
955
+
956
+ ]
957
+ )
958
+
959
+ stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=input_length + max_gen_len)])
960
+ out = self.sample(
961
+ tokens,
962
+ logits_processor=logits_processor,
963
+ logits_warper=logits_warper,
964
+ stopping_criteria=stopping_criteria,
965
+ return_dict_in_generate=True,
966
+ output_scores=True,
967
+ )
968
+
969
+
970
+ # print(out)
971
+ out_ids = out["sequences"][0][input_length:].cpu().numpy()
972
+
973
+ out_scores = out["scores"]
974
+
975
+ out_scores = torch.cat(out_scores, dim=0)
976
+ out_scores = torch.nn.functional.softmax(out_scores, dim=-1).cpu().numpy()
977
+
978
+ probs = []
979
+ for i in range(len(out_ids)):
980
+ probs.append(float(out_scores[i][out_ids[i]]))
981
+
982
+ # print(f"probs is {probs}")
983
+
984
+ convert_tokens = []
985
+ for t in out_ids:
986
+ if t == 100006:
987
+ convert_tokens.append("[CLS]")
988
+ else :
989
+ convert_tokens.append(id2word.get(t, "[unknown_token]"))
990
+
991
+ out_text = tokenizer.decode(out_ids.tolist())
992
+
993
+
994
+ out = out_text
995
+
996
+ if "###" in out:
997
+ special_index = out.index("###")
998
+ out = out[: special_index]
999
+ token_length = len(tokenizer.encode_plus(out)["input_ids"])
1000
+ convert_tokens = convert_tokens[:token_length]
1001
+ probs = probs[:token_length]
1002
+
1003
+ if "[UNK]" in out:
1004
+ special_index = out.index("[UNK]")
1005
+ out = out[:special_index]
1006
+ token_length = len(tokenizer.encode_plus(out)["input_ids"])
1007
+ convert_tokens = convert_tokens[:token_length]
1008
+ probs = probs[:token_length]
1009
+
1010
+ if "</s>" in out:
1011
+ special_index = out.index("</s>")
1012
+ out = out[: special_index]
1013
+ token_length = len(tokenizer.encode_plus(out)["input_ids"])
1014
+ convert_tokens = convert_tokens[:token_length]
1015
+ probs = probs[:token_length]
1016
+
1017
+ if len(out) > 0 and out[0] == " ":
1018
+ out = out[1:]
1019
+
1020
+ convert_tokens = convert_tokens[1:]
1021
+ probs = probs[1:]
1022
+ return out
1023
+
1024
+ @add_start_docstrings(
1025
+ """
1026
+ The Aquila Model transformer with a sequence classification head on top (linear layer).
1027
+
1028
+ [`AquilaForSequenceClassification`] uses the last token in order to do the classification, as other causal models
1029
+ (e.g. GPT-2) do.
1030
+
1031
+ Since it does classification on the last token, it needs to know the position of the last token. If a
1032
+ `pad_token_id` is defined in the configuration, it finds the last token that is not a padding token in each row. If
1033
+ no `pad_token_id` is defined, it simply takes the last value in each row of the batch. Since it cannot guess the
1034
+ padding tokens when `inputs_embeds` are passed instead of `input_ids`, it does the same (take the last value in
1035
+ each row of the batch).
1036
+ """,
1037
+ AQUILA_START_DOCSTRING,
1038
+ )
1039
+ # Copied from transformers.models.llama.modeling_llama.LlamaForSequenceClassification with LLAMA->AQUILA,Llama->Aquila
1040
+ class AquilaForSequenceClassification(AquilaPreTrainedModel):
1041
+ _keys_to_ignore_on_load_missing = [r"lm_head.weight"]
1042
+
1043
+ def __init__(self, config):
1044
+ super().__init__(config)
1045
+ self.num_labels = config.num_labels
1046
+ self.model = AquilaModel(config)
1047
+ self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
1048
+
1049
+ # Initialize weights and apply final processing
1050
+ self.post_init()
1051
+
1052
+ def get_input_embeddings(self):
1053
+ return self.model.embed_tokens
1054
+
1055
+ def set_input_embeddings(self, value):
1056
+ self.model.embed_tokens = value
1057
+
1058
+ @add_start_docstrings_to_model_forward(AQUILA_INPUTS_DOCSTRING)
1059
+ def forward(
1060
+ self,
1061
+ input_ids: torch.LongTensor = None,
1062
+ attention_mask: Optional[torch.Tensor] = None,
1063
+ position_ids: Optional[torch.LongTensor] = None,
1064
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
1065
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1066
+ labels: Optional[torch.LongTensor] = None,
1067
+ use_cache: Optional[bool] = None,
1068
+ output_attentions: Optional[bool] = None,
1069
+ output_hidden_states: Optional[bool] = None,
1070
+ return_dict: Optional[bool] = None,
1071
+ ) -> Union[Tuple, SequenceClassifierOutputWithPast]:
1072
+ r"""
1073
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1074
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
1075
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
1076
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1077
+ """
1078
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1079
+
1080
+ transformer_outputs = self.model(
1081
+ input_ids,
1082
+ attention_mask=attention_mask,
1083
+ position_ids=position_ids,
1084
+ past_key_values=past_key_values,
1085
+ inputs_embeds=inputs_embeds,
1086
+ use_cache=use_cache,
1087
+ output_attentions=output_attentions,
1088
+ output_hidden_states=output_hidden_states,
1089
+ return_dict=return_dict,
1090
+ )
1091
+ hidden_states = transformer_outputs[0]
1092
+ logits = self.score(hidden_states)
1093
+
1094
+ if input_ids is not None:
1095
+ batch_size = input_ids.shape[0]
1096
+ else:
1097
+ batch_size = inputs_embeds.shape[0]
1098
+
1099
+ if self.config.pad_token_id is None and batch_size != 1:
1100
+ raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
1101
+ if self.config.pad_token_id is None:
1102
+ sequence_lengths = -1
1103
+ else:
1104
+ if input_ids is not None:
1105
+ sequence_lengths = (torch.eq(input_ids, self.config.pad_token_id).long().argmax(-1) - 1).to(
1106
+ logits.device
1107
+ )
1108
+ else:
1109
+ sequence_lengths = -1
1110
+
1111
+ pooled_logits = logits[torch.arange(batch_size, device=logits.device), sequence_lengths]
1112
+
1113
+ loss = None
1114
+ if labels is not None:
1115
+ labels = labels.to(logits.device)
1116
+ if self.config.problem_type is None:
1117
+ if self.num_labels == 1:
1118
+ self.config.problem_type = "regression"
1119
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
1120
+ self.config.problem_type = "single_label_classification"
1121
+ else:
1122
+ self.config.problem_type = "multi_label_classification"
1123
+
1124
+ if self.config.problem_type == "regression":
1125
+ loss_fct = MSELoss()
1126
+ if self.num_labels == 1:
1127
+ loss = loss_fct(pooled_logits.squeeze(), labels.squeeze())
1128
+ else:
1129
+ loss = loss_fct(pooled_logits, labels)
1130
+ elif self.config.problem_type == "single_label_classification":
1131
+ loss_fct = CrossEntropyLoss()
1132
+ loss = loss_fct(pooled_logits.view(-1, self.num_labels), labels.view(-1))
1133
+ elif self.config.problem_type == "multi_label_classification":
1134
+ loss_fct = BCEWithLogitsLoss()
1135
+ loss = loss_fct(pooled_logits, labels)
1136
+ if not return_dict:
1137
+ output = (pooled_logits,) + transformer_outputs[1:]
1138
+ return ((loss,) + output) if loss is not None else output
1139
+
1140
+ return SequenceClassifierOutputWithPast(
1141
+ loss=loss,
1142
+ logits=pooled_logits,
1143
+ past_key_values=transformer_outputs.past_key_values,
1144
+ hidden_states=transformer_outputs.hidden_states,
1145
+ attentions=transformer_outputs.attentions,
1146
+ )
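The pooling step in `AquilaForSequenceClassification.forward` picks one position per row with `torch.eq(input_ids, pad_token_id).long().argmax(-1) - 1`. The sketch below mirrors that arithmetic on plain Python lists so the edge cases are easy to inspect. It is an illustration only: `last_token_index` is a hypothetical helper name, not part of the model code, and it assumes that `argmax` returns the first maximal position (index 0 when a row contains no padding).

```python
def last_token_index(input_ids, pad_token_id):
    """Return the per-row pooling index, mirroring eq(...).argmax(-1) - 1.

    `last_token_index` is a hypothetical name used for illustration only.
    """
    indices = []
    for row in input_ids:
        # argmax over a 0/1 vector yields the first position holding a 1,
        # or 0 when the vector is all zeros (the row has no padding).
        first_pad = next(
            (i for i, tok in enumerate(row) if tok == pad_token_id), 0
        )
        # Subtracting 1 gives the token just before the first pad;
        # -1 wraps around to the final position of the row.
        indices.append(first_pad - 1)
    return indices


right_padded = [[5, 6, 7, 0, 0]]   # last real token sits at index 2
unpadded = [[5, 6, 7, 8, 9]]       # no pad: index -1, i.e. the final token
left_padded = [[0, 0, 5, 6, 7]]    # pad at index 0: also -1, the final token
```

With right padding the index lands on the last real token. With no padding, or with left padding, the result is `-1`, which also selects the final position. A pad token appearing in the middle of a sequence would shift the index, which may matter here because the pad and unknown tokens share the same string.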
smash_config.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "api_key": null,
3
+ "verify_url": "http://johnrachwan.pythonanywhere.com",
4
+ "smash_config": {
5
+ "pruners": "None",
6
+ "pruning_ratio": 0.0,
7
+ "factorizers": "None",
8
+ "quantizers": "['llm-int8']",
9
+ "weight_quantization_bits": 8,
10
+ "output_deviation": 0.005,
11
+ "compilers": "None",
12
+ "static_batch": true,
13
+ "static_shape": true,
14
+ "controlnet": "None",
15
+ "unet_dim": 4,
16
+ "device": "cuda",
17
+ "cache_dir": "/ceph/hdd/staff/charpent/.cache/modelsx0wsbj_q",
18
+ "batch_size": 1,
19
+ "model_name": "BAAI/AquilaChat2-7B",
20
+ "task": "text_text_generation",
21
+ "max_batch_size": 1,
22
+ "qtype_weight": "torch.qint8",
23
+ "qtype_activation": "torch.quint8",
24
+ "qobserver": "<class 'torch.ao.quantization.observer.MinMaxObserver'>",
25
+ "qscheme": "torch.per_tensor_symmetric",
26
+ "qconfig": "x86",
27
+ "group_size": 128,
28
+ "damp_percent": 0.1,
29
+ "save_load_fn": "bitsandbytes"
30
+ }
31
+ }
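Several fields in `smash_config.json` store Python literals as strings, for example `"pruners": "None"` and `"quantizers": "['llm-int8']"`. The following is a minimal sketch of how a reader might decode such fields with the standard library. The `decode_field` helper is hypothetical and is not claimed to be how the Pruna tooling reads this file; the `RAW` string is an excerpt of the values shown above.

```python
import ast
import json

# Excerpt of the "smash_config" object shown above.
RAW = """
{
  "pruners": "None",
  "factorizers": "None",
  "quantizers": "['llm-int8']",
  "weight_quantization_bits": 8,
  "compilers": "None",
  "device": "cuda"
}
"""


def decode_field(value):
    """Turn stringified Python literals into objects; leave the rest as-is.

    Hypothetical helper for illustration only.
    """
    if not isinstance(value, str):
        return value
    try:
        # literal_eval only accepts literals, so it does not execute code.
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value  # an ordinary string such as "cuda"


smash = {key: decode_field(val) for key, val in json.loads(RAW).items()}
```

After decoding, `smash["quantizers"]` is a real list and `smash["pruners"]` is `None`, while plain strings such as `"cuda"` pass through unchanged.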
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "<|endoftext|>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "<|endoftext|>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,86 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "100000": {
13
+ "content": "<|startofpiece|>",
14
+ "lstrip": false,
15
+ "normalized": true,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": false
19
+ },
20
+ "100001": {
21
+ "content": "<|endofpiece|>",
22
+ "lstrip": false,
23
+ "normalized": true,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": false
27
+ },
28
+ "100002": {
29
+ "content": "<|LDWANG|>",
30
+ "lstrip": false,
31
+ "normalized": true,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": false
35
+ },
36
+ "100003": {
37
+ "content": "[MASK]",
38
+ "lstrip": false,
39
+ "normalized": true,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": false
43
+ },
44
+ "100004": {
45
+ "content": "[gMASK]",
46
+ "lstrip": false,
47
+ "normalized": true,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": false
51
+ },
52
+ "100005": {
53
+ "content": "[sMASK]",
54
+ "lstrip": false,
55
+ "normalized": true,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": false
59
+ },
60
+ "100006": {
61
+ "content": "[CLS]",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "100007": {
69
+ "content": "</s>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ }
76
+ },
77
+ "bos_token": "[CLS]",
78
+ "clean_up_tokenization_spaces": true,
79
+ "eos_token": "</s>",
80
+ "legacy": false,
81
+ "model_max_length": 2048,
82
+ "pad_token": "<|endoftext|>",
83
+ "padding_side": "right",
84
+ "tokenizer_class": "GPT2Tokenizer",
85
+ "unk_token": "<|endoftext|>"
86
+ }
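The `predict` method in the modeling file hard-codes the IDs `100006` and `100007`, which correspond to the `[CLS]` and `</s>` entries in `added_tokens_decoder` above. As a rough sketch, the same IDs can be derived from the config rather than written inline. The `RAW` string is a trimmed excerpt of the file, and the variable names are assumptions made for this example.

```python
import json

# Trimmed excerpt of tokenizer_config.json shown above.
RAW = """
{
  "added_tokens_decoder": {
    "0": {"content": "<|endoftext|>", "special": true},
    "100006": {"content": "[CLS]", "special": true},
    "100007": {"content": "</s>", "special": true}
  },
  "bos_token": "[CLS]",
  "eos_token": "</s>",
  "pad_token": "<|endoftext|>"
}
"""

config = json.loads(RAW)

# Invert the id -> entry mapping into content -> integer id.
token_to_id = {
    entry["content"]: int(token_id)
    for token_id, entry in config["added_tokens_decoder"].items()
}

bos_id = token_to_id[config["bos_token"]]
eos_id = token_to_id[config["eos_token"]]
pad_id = token_to_id[config["pad_token"]]
```

Looking the IDs up this way keeps generation code in step with the tokenizer files if the added-token table ever changes.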
vocab.json ADDED
The diff for this file is too large to render. See raw diff