sshh12 commited on
Commit
973d333
·
1 Parent(s): c9118f7

Upload folder using huggingface_hub

Browse files
README.md ADDED
@@ -0,0 +1,183 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: peft
3
+ ---
4
+
5
+ ---
6
+ license: apache-2.0
7
+ base_model: mistralai/Mistral-7B-Instruct-v0.1
8
+ dataset: sshh12/imagebind-llava-finetune
9
+ tags:
10
+ - finetuned
11
+ - multimodal
12
+ inference: false
13
+ ---
14
+
15
+ These are weights for a version of `mistralai/Mistral-7B-Instruct-v0.1` finetuned for multimodal applications.
16
+
17
+ ### Modalities
18
+
19
+ * ImageBindModality (use `<imagebind>` in text and provide `imagebinds`, encoded as 4 tokens)
20
+
21
+ ### Dataset
22
+
23
+ sshh12/imagebind-llava-finetune (235163 examples)
24
+
25
+ ```
26
+ {'id': '000000334872', 'imagebinds': ['/data/llava_finetune_data/images/coco/train2017/train2017/000000334872.jpg'], 'messages': [{'content': '<imagebind>\nAre the people in the audio skiing downhill or cross-country skiing?', 'role': 'user'}, {'content': 'The people in the audio are cross-country skiing in the woods, as they are skiing on a trail rather than a steep slope.', 'role': 'assistant'}, {'content': 'How many people are in the audio?', 'role': 'user'}, {'content': 'There are two people in the audio, both on skis in the snow.', 'role': 'assistant'}, {'content': 'What kind of environment are they skiing in?', 'role': 'user'}, {'content': 'They are skiing in a wooded environment, following a trail through the trees while surrounded by snow.', 'role': 'assistant'}, {'content': 'Do the skiers have any additional gear with them besides their skis and poles?', 'role': 'user'}, {'content': 'Yes, the two male skiers are carrying backpacks while they ski through the woods. The backpacks might contain essentials for their skiing adventure, such as food, water, extra clothing, or safety equipment.', 'role': 'assistant'}]}
27
+ ```
28
+
29
+ ### Training Device(s)
30
+
31
+ ```
32
+ name, pci.bus_id, vbios_version
33
+ NVIDIA GeForce RTX 4090, 00000000:03:00.0, 95.02.3C.00.8C
34
+ ```
35
+
36
+ ### Usage
37
+
38
+ GitHub: https://github.com/sshh12/multi_token
39
+
40
+
41
+ ### Model
42
+
43
+ ```
44
+ MistralLMMForCausalLM.model =
45
+
46
+ PeftModelForCausalLM(
47
+ (base_model): LoraModel(
48
+ (model): MistralLMMForCausalLM(
49
+ (model): MistralLMMModel(
50
+ (embed_tokens): Embedding(32000, 4096)
51
+ (layers): ModuleList(
52
+ (0-31): 32 x MistralDecoderLayer(
53
+ (self_attn): MistralAttention(
54
+ (q_proj): Linear(
55
+ in_features=4096, out_features=4096, bias=False
56
+ (lora_dropout): ModuleDict(
57
+ (default): Dropout(p=0.05, inplace=False)
58
+ )
59
+ (lora_A): ModuleDict(
60
+ (default): Linear(in_features=4096, out_features=64, bias=False)
61
+ )
62
+ (lora_B): ModuleDict(
63
+ (default): Linear(in_features=64, out_features=4096, bias=False)
64
+ )
65
+ (lora_embedding_A): ParameterDict()
66
+ (lora_embedding_B): ParameterDict()
67
+ )
68
+ (k_proj): Linear(
69
+ in_features=4096, out_features=1024, bias=False
70
+ (lora_dropout): ModuleDict(
71
+ (default): Dropout(p=0.05, inplace=False)
72
+ )
73
+ (lora_A): ModuleDict(
74
+ (default): Linear(in_features=4096, out_features=64, bias=False)
75
+ )
76
+ (lora_B): ModuleDict(
77
+ (default): Linear(in_features=64, out_features=1024, bias=False)
78
+ )
79
+ (lora_embedding_A): ParameterDict()
80
+ (lora_embedding_B): ParameterDict()
81
+ )
82
+ (v_proj): Linear(
83
+ in_features=4096, out_features=1024, bias=False
84
+ (lora_dropout): ModuleDict(
85
+ (default): Dropout(p=0.05, inplace=False)
86
+ )
87
+ (lora_A): ModuleDict(
88
+ (default): Linear(in_features=4096, out_features=64, bias=False)
89
+ )
90
+ (lora_B): ModuleDict(
91
+ (default): Linear(in_features=64, out_features=1024, bias=False)
92
+ )
93
+ (lora_embedding_A): ParameterDict()
94
+ (lora_embedding_B): ParameterDict()
95
+ )
96
+ (o_proj): Linear(
97
+ in_features=4096, out_features=4096, bias=False
98
+ (lora_dropout): ModuleDict(
99
+ (default): Dropout(p=0.05, inplace=False)
100
+ )
101
+ (lora_A): ModuleDict(
102
+ (default): Linear(in_features=4096, out_features=64, bias=False)
103
+ )
104
+ (lora_B): ModuleDict(
105
+ (default): Linear(in_features=64, out_features=4096, bias=False)
106
+ )
107
+ (lora_embedding_A): ParameterDict()
108
+ (lora_embedding_B): ParameterDict()
109
+ )
110
+ (rotary_emb): MistralRotaryEmbedding()
111
+ )
112
+ (mlp): MistralMLP(
113
+ (gate_proj): Linear(
114
+ in_features=4096, out_features=14336, bias=False
115
+ (lora_dropout): ModuleDict(
116
+ (default): Dropout(p=0.05, inplace=False)
117
+ )
118
+ (lora_A): ModuleDict(
119
+ (default): Linear(in_features=4096, out_features=64, bias=False)
120
+ )
121
+ (lora_B): ModuleDict(
122
+ (default): Linear(in_features=64, out_features=14336, bias=False)
123
+ )
124
+ (lora_embedding_A): ParameterDict()
125
+ (lora_embedding_B): ParameterDict()
126
+ )
127
+ (up_proj): Linear(
128
+ in_features=4096, out_features=14336, bias=False
129
+ (lora_dropout): ModuleDict(
130
+ (default): Dropout(p=0.05, inplace=False)
131
+ )
132
+ (lora_A): ModuleDict(
133
+ (default): Linear(in_features=4096, out_features=64, bias=False)
134
+ )
135
+ (lora_B): ModuleDict(
136
+ (default): Linear(in_features=64, out_features=14336, bias=False)
137
+ )
138
+ (lora_embedding_A): ParameterDict()
139
+ (lora_embedding_B): ParameterDict()
140
+ )
141
+ (down_proj): Linear(
142
+ in_features=14336, out_features=4096, bias=False
143
+ (lora_dropout): ModuleDict(
144
+ (default): Dropout(p=0.05, inplace=False)
145
+ )
146
+ (lora_A): ModuleDict(
147
+ (default): Linear(in_features=14336, out_features=64, bias=False)
148
+ )
149
+ (lora_B): ModuleDict(
150
+ (default): Linear(in_features=64, out_features=4096, bias=False)
151
+ )
152
+ (lora_embedding_A): ParameterDict()
153
+ (lora_embedding_B): ParameterDict()
154
+ )
155
+ (act_fn): SiLUActivation()
156
+ )
157
+ (input_layernorm): MistralRMSNorm()
158
+ (post_attention_layernorm): MistralRMSNorm()
159
+ )
160
+ )
161
+ (norm): MistralRMSNorm()
162
+ (imagebind_lmm_projector): _MLPVectorProjector(
163
+ (mlps): ModuleList(
164
+ (0-3): 4 x Sequential(
165
+ (0): Linear(in_features=1024, out_features=4096, bias=True)
166
+ (1): GELU(approximate='none')
167
+ (2): Linear(in_features=4096, out_features=4096, bias=True)
168
+ )
169
+ )
170
+ )
171
+ )
172
+ (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
173
+ )
174
+ )
175
+ )
176
+ ```
177
+
178
+ ## Training procedure
179
+
180
+ ### Framework versions
181
+
182
+
183
+ - PEFT 0.5.0
adapter_config.json ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "auto_mapping": null,
3
+ "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.1",
4
+ "bias": "none",
5
+ "fan_in_fan_out": false,
6
+ "inference_mode": true,
7
+ "init_lora_weights": true,
8
+ "layers_pattern": null,
9
+ "layers_to_transform": null,
10
+ "lora_alpha": 16,
11
+ "lora_dropout": 0.05,
12
+ "modules_to_save": null,
13
+ "peft_type": "LORA",
14
+ "r": 64,
15
+ "revision": null,
16
+ "target_modules": [
17
+ "up_proj",
18
+ "q_proj",
19
+ "o_proj",
20
+ "gate_proj",
21
+ "down_proj",
22
+ "k_proj",
23
+ "v_proj"
24
+ ],
25
+ "task_type": "CAUSAL_LM"
26
+ }
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7d3455f2aa67f168bc470bc324e3ae42493e5a9bf4006cb76b5f95bdc75f84e3
3
+ size 335699597
config.json ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.1",
3
+ "architectures": [
4
+ "MistralForCausalLM"
5
+ ],
6
+ "bos_token_id": 1,
7
+ "eos_token_id": 2,
8
+ "hidden_act": "silu",
9
+ "hidden_size": 4096,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 14336,
12
+ "max_position_embeddings": 32768,
13
+ "modalities": [
14
+ "imagebind"
15
+ ],
16
+ "modality_builder": "imagebind",
17
+ "model_cls": "MistralLMMForCausalLM",
18
+ "model_type": "mistral-lmm",
19
+ "num_attention_heads": 32,
20
+ "num_hidden_layers": 32,
21
+ "num_key_value_heads": 8,
22
+ "rms_norm_eps": 1e-05,
23
+ "rope_theta": 10000.0,
24
+ "sliding_window": 4096,
25
+ "tie_word_embeddings": false,
26
+ "torch_dtype": "bfloat16",
27
+ "transformers_version": "4.34.1",
28
+ "use_cache": true,
29
+ "vocab_size": 32000
30
+ }
model_named_parameters.txt ADDED
@@ -0,0 +1,755 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ base_model.model.model.embed_tokens.weight torch.Size([32000, 4096]) False
2
+ base_model.model.model.layers.0.self_attn.q_proj.weight torch.Size([4096, 4096]) False
3
+ base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
4
+ base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
5
+ base_model.model.model.layers.0.self_attn.k_proj.weight torch.Size([1024, 4096]) False
6
+ base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
7
+ base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
8
+ base_model.model.model.layers.0.self_attn.v_proj.weight torch.Size([1024, 4096]) False
9
+ base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
10
+ base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
11
+ base_model.model.model.layers.0.self_attn.o_proj.weight torch.Size([4096, 4096]) False
12
+ base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
13
+ base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
14
+ base_model.model.model.layers.0.mlp.gate_proj.weight torch.Size([14336, 4096]) False
15
+ base_model.model.model.layers.0.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
16
+ base_model.model.model.layers.0.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
17
+ base_model.model.model.layers.0.mlp.up_proj.weight torch.Size([14336, 4096]) False
18
+ base_model.model.model.layers.0.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
19
+ base_model.model.model.layers.0.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
20
+ base_model.model.model.layers.0.mlp.down_proj.weight torch.Size([4096, 14336]) False
21
+ base_model.model.model.layers.0.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
22
+ base_model.model.model.layers.0.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
23
+ base_model.model.model.layers.0.input_layernorm.weight torch.Size([4096]) False
24
+ base_model.model.model.layers.0.post_attention_layernorm.weight torch.Size([4096]) False
25
+ base_model.model.model.layers.1.self_attn.q_proj.weight torch.Size([4096, 4096]) False
26
+ base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
27
+ base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
28
+ base_model.model.model.layers.1.self_attn.k_proj.weight torch.Size([1024, 4096]) False
29
+ base_model.model.model.layers.1.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
30
+ base_model.model.model.layers.1.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
31
+ base_model.model.model.layers.1.self_attn.v_proj.weight torch.Size([1024, 4096]) False
32
+ base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
33
+ base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
34
+ base_model.model.model.layers.1.self_attn.o_proj.weight torch.Size([4096, 4096]) False
35
+ base_model.model.model.layers.1.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
36
+ base_model.model.model.layers.1.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
37
+ base_model.model.model.layers.1.mlp.gate_proj.weight torch.Size([14336, 4096]) False
38
+ base_model.model.model.layers.1.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
39
+ base_model.model.model.layers.1.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
40
+ base_model.model.model.layers.1.mlp.up_proj.weight torch.Size([14336, 4096]) False
41
+ base_model.model.model.layers.1.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
42
+ base_model.model.model.layers.1.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
43
+ base_model.model.model.layers.1.mlp.down_proj.weight torch.Size([4096, 14336]) False
44
+ base_model.model.model.layers.1.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
45
+ base_model.model.model.layers.1.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
46
+ base_model.model.model.layers.1.input_layernorm.weight torch.Size([4096]) False
47
+ base_model.model.model.layers.1.post_attention_layernorm.weight torch.Size([4096]) False
48
+ base_model.model.model.layers.2.self_attn.q_proj.weight torch.Size([4096, 4096]) False
49
+ base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
50
+ base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
51
+ base_model.model.model.layers.2.self_attn.k_proj.weight torch.Size([1024, 4096]) False
52
+ base_model.model.model.layers.2.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
53
+ base_model.model.model.layers.2.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
54
+ base_model.model.model.layers.2.self_attn.v_proj.weight torch.Size([1024, 4096]) False
55
+ base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
56
+ base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
57
+ base_model.model.model.layers.2.self_attn.o_proj.weight torch.Size([4096, 4096]) False
58
+ base_model.model.model.layers.2.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
59
+ base_model.model.model.layers.2.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
60
+ base_model.model.model.layers.2.mlp.gate_proj.weight torch.Size([14336, 4096]) False
61
+ base_model.model.model.layers.2.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
62
+ base_model.model.model.layers.2.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
63
+ base_model.model.model.layers.2.mlp.up_proj.weight torch.Size([14336, 4096]) False
64
+ base_model.model.model.layers.2.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
65
+ base_model.model.model.layers.2.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
66
+ base_model.model.model.layers.2.mlp.down_proj.weight torch.Size([4096, 14336]) False
67
+ base_model.model.model.layers.2.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
68
+ base_model.model.model.layers.2.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
69
+ base_model.model.model.layers.2.input_layernorm.weight torch.Size([4096]) False
70
+ base_model.model.model.layers.2.post_attention_layernorm.weight torch.Size([4096]) False
71
+ base_model.model.model.layers.3.self_attn.q_proj.weight torch.Size([4096, 4096]) False
72
+ base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
73
+ base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
74
+ base_model.model.model.layers.3.self_attn.k_proj.weight torch.Size([1024, 4096]) False
75
+ base_model.model.model.layers.3.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
76
+ base_model.model.model.layers.3.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
77
+ base_model.model.model.layers.3.self_attn.v_proj.weight torch.Size([1024, 4096]) False
78
+ base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
79
+ base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
80
+ base_model.model.model.layers.3.self_attn.o_proj.weight torch.Size([4096, 4096]) False
81
+ base_model.model.model.layers.3.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
82
+ base_model.model.model.layers.3.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
83
+ base_model.model.model.layers.3.mlp.gate_proj.weight torch.Size([14336, 4096]) False
84
+ base_model.model.model.layers.3.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
85
+ base_model.model.model.layers.3.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
86
+ base_model.model.model.layers.3.mlp.up_proj.weight torch.Size([14336, 4096]) False
87
+ base_model.model.model.layers.3.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
88
+ base_model.model.model.layers.3.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
89
+ base_model.model.model.layers.3.mlp.down_proj.weight torch.Size([4096, 14336]) False
90
+ base_model.model.model.layers.3.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
91
+ base_model.model.model.layers.3.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
92
+ base_model.model.model.layers.3.input_layernorm.weight torch.Size([4096]) False
93
+ base_model.model.model.layers.3.post_attention_layernorm.weight torch.Size([4096]) False
94
+ base_model.model.model.layers.4.self_attn.q_proj.weight torch.Size([4096, 4096]) False
95
+ base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
96
+ base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
97
+ base_model.model.model.layers.4.self_attn.k_proj.weight torch.Size([1024, 4096]) False
98
+ base_model.model.model.layers.4.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
99
+ base_model.model.model.layers.4.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
100
+ base_model.model.model.layers.4.self_attn.v_proj.weight torch.Size([1024, 4096]) False
101
+ base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
102
+ base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
103
+ base_model.model.model.layers.4.self_attn.o_proj.weight torch.Size([4096, 4096]) False
104
+ base_model.model.model.layers.4.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
105
+ base_model.model.model.layers.4.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
106
+ base_model.model.model.layers.4.mlp.gate_proj.weight torch.Size([14336, 4096]) False
107
+ base_model.model.model.layers.4.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
108
+ base_model.model.model.layers.4.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
109
+ base_model.model.model.layers.4.mlp.up_proj.weight torch.Size([14336, 4096]) False
110
+ base_model.model.model.layers.4.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
111
+ base_model.model.model.layers.4.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
112
+ base_model.model.model.layers.4.mlp.down_proj.weight torch.Size([4096, 14336]) False
113
+ base_model.model.model.layers.4.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
114
+ base_model.model.model.layers.4.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
115
+ base_model.model.model.layers.4.input_layernorm.weight torch.Size([4096]) False
116
+ base_model.model.model.layers.4.post_attention_layernorm.weight torch.Size([4096]) False
117
+ base_model.model.model.layers.5.self_attn.q_proj.weight torch.Size([4096, 4096]) False
118
+ base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
119
+ base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
120
+ base_model.model.model.layers.5.self_attn.k_proj.weight torch.Size([1024, 4096]) False
121
+ base_model.model.model.layers.5.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
122
+ base_model.model.model.layers.5.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
123
+ base_model.model.model.layers.5.self_attn.v_proj.weight torch.Size([1024, 4096]) False
124
+ base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
125
+ base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
126
+ base_model.model.model.layers.5.self_attn.o_proj.weight torch.Size([4096, 4096]) False
127
+ base_model.model.model.layers.5.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
128
+ base_model.model.model.layers.5.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
129
+ base_model.model.model.layers.5.mlp.gate_proj.weight torch.Size([14336, 4096]) False
130
+ base_model.model.model.layers.5.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
131
+ base_model.model.model.layers.5.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
132
+ base_model.model.model.layers.5.mlp.up_proj.weight torch.Size([14336, 4096]) False
133
+ base_model.model.model.layers.5.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
134
+ base_model.model.model.layers.5.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
135
+ base_model.model.model.layers.5.mlp.down_proj.weight torch.Size([4096, 14336]) False
136
+ base_model.model.model.layers.5.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
137
+ base_model.model.model.layers.5.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
138
+ base_model.model.model.layers.5.input_layernorm.weight torch.Size([4096]) False
139
+ base_model.model.model.layers.5.post_attention_layernorm.weight torch.Size([4096]) False
140
+ base_model.model.model.layers.6.self_attn.q_proj.weight torch.Size([4096, 4096]) False
141
+ base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
142
+ base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
143
+ base_model.model.model.layers.6.self_attn.k_proj.weight torch.Size([1024, 4096]) False
144
+ base_model.model.model.layers.6.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
145
+ base_model.model.model.layers.6.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
146
+ base_model.model.model.layers.6.self_attn.v_proj.weight torch.Size([1024, 4096]) False
147
+ base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
148
+ base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
149
+ base_model.model.model.layers.6.self_attn.o_proj.weight torch.Size([4096, 4096]) False
150
+ base_model.model.model.layers.6.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
151
+ base_model.model.model.layers.6.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
152
+ base_model.model.model.layers.6.mlp.gate_proj.weight torch.Size([14336, 4096]) False
153
+ base_model.model.model.layers.6.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
154
+ base_model.model.model.layers.6.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
155
+ base_model.model.model.layers.6.mlp.up_proj.weight torch.Size([14336, 4096]) False
156
+ base_model.model.model.layers.6.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
157
+ base_model.model.model.layers.6.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
158
+ base_model.model.model.layers.6.mlp.down_proj.weight torch.Size([4096, 14336]) False
159
+ base_model.model.model.layers.6.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
160
+ base_model.model.model.layers.6.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
161
+ base_model.model.model.layers.6.input_layernorm.weight torch.Size([4096]) False
162
+ base_model.model.model.layers.6.post_attention_layernorm.weight torch.Size([4096]) False
163
+ base_model.model.model.layers.7.self_attn.q_proj.weight torch.Size([4096, 4096]) False
164
+ base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
165
+ base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
166
+ base_model.model.model.layers.7.self_attn.k_proj.weight torch.Size([1024, 4096]) False
167
+ base_model.model.model.layers.7.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
168
+ base_model.model.model.layers.7.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
169
+ base_model.model.model.layers.7.self_attn.v_proj.weight torch.Size([1024, 4096]) False
170
+ base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
171
+ base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
172
+ base_model.model.model.layers.7.self_attn.o_proj.weight torch.Size([4096, 4096]) False
173
+ base_model.model.model.layers.7.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
174
+ base_model.model.model.layers.7.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
175
+ base_model.model.model.layers.7.mlp.gate_proj.weight torch.Size([14336, 4096]) False
176
+ base_model.model.model.layers.7.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
177
+ base_model.model.model.layers.7.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
178
+ base_model.model.model.layers.7.mlp.up_proj.weight torch.Size([14336, 4096]) False
179
+ base_model.model.model.layers.7.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
180
+ base_model.model.model.layers.7.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
181
+ base_model.model.model.layers.7.mlp.down_proj.weight torch.Size([4096, 14336]) False
182
+ base_model.model.model.layers.7.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
183
+ base_model.model.model.layers.7.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
184
+ base_model.model.model.layers.7.input_layernorm.weight torch.Size([4096]) False
185
+ base_model.model.model.layers.7.post_attention_layernorm.weight torch.Size([4096]) False
186
+ base_model.model.model.layers.8.self_attn.q_proj.weight torch.Size([4096, 4096]) False
187
+ base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
188
+ base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
189
+ base_model.model.model.layers.8.self_attn.k_proj.weight torch.Size([1024, 4096]) False
190
+ base_model.model.model.layers.8.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
191
+ base_model.model.model.layers.8.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
192
+ base_model.model.model.layers.8.self_attn.v_proj.weight torch.Size([1024, 4096]) False
193
+ base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
194
+ base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
195
+ base_model.model.model.layers.8.self_attn.o_proj.weight torch.Size([4096, 4096]) False
196
+ base_model.model.model.layers.8.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
197
+ base_model.model.model.layers.8.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
198
+ base_model.model.model.layers.8.mlp.gate_proj.weight torch.Size([14336, 4096]) False
199
+ base_model.model.model.layers.8.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
200
+ base_model.model.model.layers.8.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
201
+ base_model.model.model.layers.8.mlp.up_proj.weight torch.Size([14336, 4096]) False
202
+ base_model.model.model.layers.8.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
203
+ base_model.model.model.layers.8.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
204
+ base_model.model.model.layers.8.mlp.down_proj.weight torch.Size([4096, 14336]) False
205
+ base_model.model.model.layers.8.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
206
+ base_model.model.model.layers.8.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
207
+ base_model.model.model.layers.8.input_layernorm.weight torch.Size([4096]) False
208
+ base_model.model.model.layers.8.post_attention_layernorm.weight torch.Size([4096]) False
209
+ base_model.model.model.layers.9.self_attn.q_proj.weight torch.Size([4096, 4096]) False
210
+ base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
211
+ base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
212
+ base_model.model.model.layers.9.self_attn.k_proj.weight torch.Size([1024, 4096]) False
213
+ base_model.model.model.layers.9.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
214
+ base_model.model.model.layers.9.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
215
+ base_model.model.model.layers.9.self_attn.v_proj.weight torch.Size([1024, 4096]) False
216
+ base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
217
+ base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
218
+ base_model.model.model.layers.9.self_attn.o_proj.weight torch.Size([4096, 4096]) False
219
+ base_model.model.model.layers.9.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
220
+ base_model.model.model.layers.9.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
221
+ base_model.model.model.layers.9.mlp.gate_proj.weight torch.Size([14336, 4096]) False
222
+ base_model.model.model.layers.9.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
223
+ base_model.model.model.layers.9.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
224
+ base_model.model.model.layers.9.mlp.up_proj.weight torch.Size([14336, 4096]) False
225
+ base_model.model.model.layers.9.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
226
+ base_model.model.model.layers.9.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
227
+ base_model.model.model.layers.9.mlp.down_proj.weight torch.Size([4096, 14336]) False
228
+ base_model.model.model.layers.9.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
229
+ base_model.model.model.layers.9.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
230
+ base_model.model.model.layers.9.input_layernorm.weight torch.Size([4096]) False
231
+ base_model.model.model.layers.9.post_attention_layernorm.weight torch.Size([4096]) False
232
+ base_model.model.model.layers.10.self_attn.q_proj.weight torch.Size([4096, 4096]) False
233
+ base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
234
+ base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
235
+ base_model.model.model.layers.10.self_attn.k_proj.weight torch.Size([1024, 4096]) False
236
+ base_model.model.model.layers.10.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
237
+ base_model.model.model.layers.10.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
238
+ base_model.model.model.layers.10.self_attn.v_proj.weight torch.Size([1024, 4096]) False
239
+ base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
240
+ base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
241
+ base_model.model.model.layers.10.self_attn.o_proj.weight torch.Size([4096, 4096]) False
242
+ base_model.model.model.layers.10.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
243
+ base_model.model.model.layers.10.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
244
+ base_model.model.model.layers.10.mlp.gate_proj.weight torch.Size([14336, 4096]) False
245
+ base_model.model.model.layers.10.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
246
+ base_model.model.model.layers.10.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
247
+ base_model.model.model.layers.10.mlp.up_proj.weight torch.Size([14336, 4096]) False
248
+ base_model.model.model.layers.10.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
249
+ base_model.model.model.layers.10.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
250
+ base_model.model.model.layers.10.mlp.down_proj.weight torch.Size([4096, 14336]) False
251
+ base_model.model.model.layers.10.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
252
+ base_model.model.model.layers.10.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
253
+ base_model.model.model.layers.10.input_layernorm.weight torch.Size([4096]) False
254
+ base_model.model.model.layers.10.post_attention_layernorm.weight torch.Size([4096]) False
255
+ base_model.model.model.layers.11.self_attn.q_proj.weight torch.Size([4096, 4096]) False
256
+ base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
257
+ base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
258
+ base_model.model.model.layers.11.self_attn.k_proj.weight torch.Size([1024, 4096]) False
259
+ base_model.model.model.layers.11.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
260
+ base_model.model.model.layers.11.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
261
+ base_model.model.model.layers.11.self_attn.v_proj.weight torch.Size([1024, 4096]) False
262
+ base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
263
+ base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
264
+ base_model.model.model.layers.11.self_attn.o_proj.weight torch.Size([4096, 4096]) False
265
+ base_model.model.model.layers.11.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
266
+ base_model.model.model.layers.11.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
267
+ base_model.model.model.layers.11.mlp.gate_proj.weight torch.Size([14336, 4096]) False
268
+ base_model.model.model.layers.11.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
269
+ base_model.model.model.layers.11.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
270
+ base_model.model.model.layers.11.mlp.up_proj.weight torch.Size([14336, 4096]) False
271
+ base_model.model.model.layers.11.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
272
+ base_model.model.model.layers.11.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
273
+ base_model.model.model.layers.11.mlp.down_proj.weight torch.Size([4096, 14336]) False
274
+ base_model.model.model.layers.11.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
275
+ base_model.model.model.layers.11.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
276
+ base_model.model.model.layers.11.input_layernorm.weight torch.Size([4096]) False
277
+ base_model.model.model.layers.11.post_attention_layernorm.weight torch.Size([4096]) False
278
+ base_model.model.model.layers.12.self_attn.q_proj.weight torch.Size([4096, 4096]) False
279
+ base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
280
+ base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
281
+ base_model.model.model.layers.12.self_attn.k_proj.weight torch.Size([1024, 4096]) False
282
+ base_model.model.model.layers.12.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
283
+ base_model.model.model.layers.12.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
284
+ base_model.model.model.layers.12.self_attn.v_proj.weight torch.Size([1024, 4096]) False
285
+ base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
286
+ base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
287
+ base_model.model.model.layers.12.self_attn.o_proj.weight torch.Size([4096, 4096]) False
288
+ base_model.model.model.layers.12.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
289
+ base_model.model.model.layers.12.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
290
+ base_model.model.model.layers.12.mlp.gate_proj.weight torch.Size([14336, 4096]) False
291
+ base_model.model.model.layers.12.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
292
+ base_model.model.model.layers.12.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
293
+ base_model.model.model.layers.12.mlp.up_proj.weight torch.Size([14336, 4096]) False
294
+ base_model.model.model.layers.12.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
295
+ base_model.model.model.layers.12.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
296
+ base_model.model.model.layers.12.mlp.down_proj.weight torch.Size([4096, 14336]) False
297
+ base_model.model.model.layers.12.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
298
+ base_model.model.model.layers.12.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
299
+ base_model.model.model.layers.12.input_layernorm.weight torch.Size([4096]) False
300
+ base_model.model.model.layers.12.post_attention_layernorm.weight torch.Size([4096]) False
301
+ base_model.model.model.layers.13.self_attn.q_proj.weight torch.Size([4096, 4096]) False
302
+ base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
303
+ base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
304
+ base_model.model.model.layers.13.self_attn.k_proj.weight torch.Size([1024, 4096]) False
305
+ base_model.model.model.layers.13.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
306
+ base_model.model.model.layers.13.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
307
+ base_model.model.model.layers.13.self_attn.v_proj.weight torch.Size([1024, 4096]) False
308
+ base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
309
+ base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
310
+ base_model.model.model.layers.13.self_attn.o_proj.weight torch.Size([4096, 4096]) False
311
+ base_model.model.model.layers.13.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
312
+ base_model.model.model.layers.13.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
313
+ base_model.model.model.layers.13.mlp.gate_proj.weight torch.Size([14336, 4096]) False
314
+ base_model.model.model.layers.13.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
315
+ base_model.model.model.layers.13.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
316
+ base_model.model.model.layers.13.mlp.up_proj.weight torch.Size([14336, 4096]) False
317
+ base_model.model.model.layers.13.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
318
+ base_model.model.model.layers.13.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
319
+ base_model.model.model.layers.13.mlp.down_proj.weight torch.Size([4096, 14336]) False
320
+ base_model.model.model.layers.13.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
321
+ base_model.model.model.layers.13.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
322
+ base_model.model.model.layers.13.input_layernorm.weight torch.Size([4096]) False
323
+ base_model.model.model.layers.13.post_attention_layernorm.weight torch.Size([4096]) False
324
+ base_model.model.model.layers.14.self_attn.q_proj.weight torch.Size([4096, 4096]) False
325
+ base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
326
+ base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
327
+ base_model.model.model.layers.14.self_attn.k_proj.weight torch.Size([1024, 4096]) False
328
+ base_model.model.model.layers.14.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
329
+ base_model.model.model.layers.14.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
330
+ base_model.model.model.layers.14.self_attn.v_proj.weight torch.Size([1024, 4096]) False
331
+ base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
332
+ base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
333
+ base_model.model.model.layers.14.self_attn.o_proj.weight torch.Size([4096, 4096]) False
334
+ base_model.model.model.layers.14.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
335
+ base_model.model.model.layers.14.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
336
+ base_model.model.model.layers.14.mlp.gate_proj.weight torch.Size([14336, 4096]) False
337
+ base_model.model.model.layers.14.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
338
+ base_model.model.model.layers.14.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
339
+ base_model.model.model.layers.14.mlp.up_proj.weight torch.Size([14336, 4096]) False
340
+ base_model.model.model.layers.14.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
341
+ base_model.model.model.layers.14.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
342
+ base_model.model.model.layers.14.mlp.down_proj.weight torch.Size([4096, 14336]) False
343
+ base_model.model.model.layers.14.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
344
+ base_model.model.model.layers.14.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
345
+ base_model.model.model.layers.14.input_layernorm.weight torch.Size([4096]) False
346
+ base_model.model.model.layers.14.post_attention_layernorm.weight torch.Size([4096]) False
347
+ base_model.model.model.layers.15.self_attn.q_proj.weight torch.Size([4096, 4096]) False
348
+ base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
349
+ base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
350
+ base_model.model.model.layers.15.self_attn.k_proj.weight torch.Size([1024, 4096]) False
351
+ base_model.model.model.layers.15.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
352
+ base_model.model.model.layers.15.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
353
+ base_model.model.model.layers.15.self_attn.v_proj.weight torch.Size([1024, 4096]) False
354
+ base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
355
+ base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
356
+ base_model.model.model.layers.15.self_attn.o_proj.weight torch.Size([4096, 4096]) False
357
+ base_model.model.model.layers.15.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
358
+ base_model.model.model.layers.15.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
359
+ base_model.model.model.layers.15.mlp.gate_proj.weight torch.Size([14336, 4096]) False
360
+ base_model.model.model.layers.15.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
361
+ base_model.model.model.layers.15.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
362
+ base_model.model.model.layers.15.mlp.up_proj.weight torch.Size([14336, 4096]) False
363
+ base_model.model.model.layers.15.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
364
+ base_model.model.model.layers.15.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
365
+ base_model.model.model.layers.15.mlp.down_proj.weight torch.Size([4096, 14336]) False
366
+ base_model.model.model.layers.15.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
367
+ base_model.model.model.layers.15.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
368
+ base_model.model.model.layers.15.input_layernorm.weight torch.Size([4096]) False
369
+ base_model.model.model.layers.15.post_attention_layernorm.weight torch.Size([4096]) False
370
+ base_model.model.model.layers.16.self_attn.q_proj.weight torch.Size([4096, 4096]) False
371
+ base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
372
+ base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
373
+ base_model.model.model.layers.16.self_attn.k_proj.weight torch.Size([1024, 4096]) False
374
+ base_model.model.model.layers.16.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
375
+ base_model.model.model.layers.16.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
376
+ base_model.model.model.layers.16.self_attn.v_proj.weight torch.Size([1024, 4096]) False
377
+ base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
378
+ base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
379
+ base_model.model.model.layers.16.self_attn.o_proj.weight torch.Size([4096, 4096]) False
380
+ base_model.model.model.layers.16.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
381
+ base_model.model.model.layers.16.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
382
+ base_model.model.model.layers.16.mlp.gate_proj.weight torch.Size([14336, 4096]) False
383
+ base_model.model.model.layers.16.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
384
+ base_model.model.model.layers.16.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
385
+ base_model.model.model.layers.16.mlp.up_proj.weight torch.Size([14336, 4096]) False
386
+ base_model.model.model.layers.16.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
387
+ base_model.model.model.layers.16.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
388
+ base_model.model.model.layers.16.mlp.down_proj.weight torch.Size([4096, 14336]) False
389
+ base_model.model.model.layers.16.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
390
+ base_model.model.model.layers.16.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
391
+ base_model.model.model.layers.16.input_layernorm.weight torch.Size([4096]) False
392
+ base_model.model.model.layers.16.post_attention_layernorm.weight torch.Size([4096]) False
393
+ base_model.model.model.layers.17.self_attn.q_proj.weight torch.Size([4096, 4096]) False
394
+ base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
395
+ base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
396
+ base_model.model.model.layers.17.self_attn.k_proj.weight torch.Size([1024, 4096]) False
397
+ base_model.model.model.layers.17.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
398
+ base_model.model.model.layers.17.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
399
+ base_model.model.model.layers.17.self_attn.v_proj.weight torch.Size([1024, 4096]) False
400
+ base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
401
+ base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
402
+ base_model.model.model.layers.17.self_attn.o_proj.weight torch.Size([4096, 4096]) False
403
+ base_model.model.model.layers.17.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
404
+ base_model.model.model.layers.17.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
405
+ base_model.model.model.layers.17.mlp.gate_proj.weight torch.Size([14336, 4096]) False
406
+ base_model.model.model.layers.17.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
407
+ base_model.model.model.layers.17.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
408
+ base_model.model.model.layers.17.mlp.up_proj.weight torch.Size([14336, 4096]) False
409
+ base_model.model.model.layers.17.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
410
+ base_model.model.model.layers.17.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
411
+ base_model.model.model.layers.17.mlp.down_proj.weight torch.Size([4096, 14336]) False
412
+ base_model.model.model.layers.17.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
413
+ base_model.model.model.layers.17.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
414
+ base_model.model.model.layers.17.input_layernorm.weight torch.Size([4096]) False
415
+ base_model.model.model.layers.17.post_attention_layernorm.weight torch.Size([4096]) False
416
+ base_model.model.model.layers.18.self_attn.q_proj.weight torch.Size([4096, 4096]) False
417
+ base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
418
+ base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
419
+ base_model.model.model.layers.18.self_attn.k_proj.weight torch.Size([1024, 4096]) False
420
+ base_model.model.model.layers.18.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
421
+ base_model.model.model.layers.18.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
422
+ base_model.model.model.layers.18.self_attn.v_proj.weight torch.Size([1024, 4096]) False
423
+ base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
424
+ base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
425
+ base_model.model.model.layers.18.self_attn.o_proj.weight torch.Size([4096, 4096]) False
426
+ base_model.model.model.layers.18.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
427
+ base_model.model.model.layers.18.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
428
+ base_model.model.model.layers.18.mlp.gate_proj.weight torch.Size([14336, 4096]) False
429
+ base_model.model.model.layers.18.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
430
+ base_model.model.model.layers.18.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
431
+ base_model.model.model.layers.18.mlp.up_proj.weight torch.Size([14336, 4096]) False
432
+ base_model.model.model.layers.18.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
433
+ base_model.model.model.layers.18.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
434
+ base_model.model.model.layers.18.mlp.down_proj.weight torch.Size([4096, 14336]) False
435
+ base_model.model.model.layers.18.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
436
+ base_model.model.model.layers.18.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
437
+ base_model.model.model.layers.18.input_layernorm.weight torch.Size([4096]) False
438
+ base_model.model.model.layers.18.post_attention_layernorm.weight torch.Size([4096]) False
439
+ base_model.model.model.layers.19.self_attn.q_proj.weight torch.Size([4096, 4096]) False
440
+ base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
441
+ base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
442
+ base_model.model.model.layers.19.self_attn.k_proj.weight torch.Size([1024, 4096]) False
443
+ base_model.model.model.layers.19.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
444
+ base_model.model.model.layers.19.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
445
+ base_model.model.model.layers.19.self_attn.v_proj.weight torch.Size([1024, 4096]) False
446
+ base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
447
+ base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
448
+ base_model.model.model.layers.19.self_attn.o_proj.weight torch.Size([4096, 4096]) False
449
+ base_model.model.model.layers.19.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
450
+ base_model.model.model.layers.19.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
451
+ base_model.model.model.layers.19.mlp.gate_proj.weight torch.Size([14336, 4096]) False
452
+ base_model.model.model.layers.19.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
453
+ base_model.model.model.layers.19.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
454
+ base_model.model.model.layers.19.mlp.up_proj.weight torch.Size([14336, 4096]) False
455
+ base_model.model.model.layers.19.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
456
+ base_model.model.model.layers.19.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
457
+ base_model.model.model.layers.19.mlp.down_proj.weight torch.Size([4096, 14336]) False
458
+ base_model.model.model.layers.19.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
459
+ base_model.model.model.layers.19.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
460
+ base_model.model.model.layers.19.input_layernorm.weight torch.Size([4096]) False
461
+ base_model.model.model.layers.19.post_attention_layernorm.weight torch.Size([4096]) False
462
+ base_model.model.model.layers.20.self_attn.q_proj.weight torch.Size([4096, 4096]) False
463
+ base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
464
+ base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
465
+ base_model.model.model.layers.20.self_attn.k_proj.weight torch.Size([1024, 4096]) False
466
+ base_model.model.model.layers.20.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
467
+ base_model.model.model.layers.20.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
468
+ base_model.model.model.layers.20.self_attn.v_proj.weight torch.Size([1024, 4096]) False
469
+ base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
470
+ base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
471
+ base_model.model.model.layers.20.self_attn.o_proj.weight torch.Size([4096, 4096]) False
472
+ base_model.model.model.layers.20.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
473
+ base_model.model.model.layers.20.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
474
+ base_model.model.model.layers.20.mlp.gate_proj.weight torch.Size([14336, 4096]) False
475
+ base_model.model.model.layers.20.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
476
+ base_model.model.model.layers.20.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
477
+ base_model.model.model.layers.20.mlp.up_proj.weight torch.Size([14336, 4096]) False
478
+ base_model.model.model.layers.20.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
479
+ base_model.model.model.layers.20.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
480
+ base_model.model.model.layers.20.mlp.down_proj.weight torch.Size([4096, 14336]) False
481
+ base_model.model.model.layers.20.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
482
+ base_model.model.model.layers.20.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
483
+ base_model.model.model.layers.20.input_layernorm.weight torch.Size([4096]) False
484
+ base_model.model.model.layers.20.post_attention_layernorm.weight torch.Size([4096]) False
485
+ base_model.model.model.layers.21.self_attn.q_proj.weight torch.Size([4096, 4096]) False
486
+ base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
487
+ base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
488
+ base_model.model.model.layers.21.self_attn.k_proj.weight torch.Size([1024, 4096]) False
489
+ base_model.model.model.layers.21.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
490
+ base_model.model.model.layers.21.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
491
+ base_model.model.model.layers.21.self_attn.v_proj.weight torch.Size([1024, 4096]) False
492
+ base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
493
+ base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
494
+ base_model.model.model.layers.21.self_attn.o_proj.weight torch.Size([4096, 4096]) False
495
+ base_model.model.model.layers.21.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
496
+ base_model.model.model.layers.21.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
497
+ base_model.model.model.layers.21.mlp.gate_proj.weight torch.Size([14336, 4096]) False
498
+ base_model.model.model.layers.21.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
499
+ base_model.model.model.layers.21.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
500
+ base_model.model.model.layers.21.mlp.up_proj.weight torch.Size([14336, 4096]) False
501
+ base_model.model.model.layers.21.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
502
+ base_model.model.model.layers.21.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
503
+ base_model.model.model.layers.21.mlp.down_proj.weight torch.Size([4096, 14336]) False
504
+ base_model.model.model.layers.21.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
505
+ base_model.model.model.layers.21.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
506
+ base_model.model.model.layers.21.input_layernorm.weight torch.Size([4096]) False
507
+ base_model.model.model.layers.21.post_attention_layernorm.weight torch.Size([4096]) False
508
+ base_model.model.model.layers.22.self_attn.q_proj.weight torch.Size([4096, 4096]) False
509
+ base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
510
+ base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
511
+ base_model.model.model.layers.22.self_attn.k_proj.weight torch.Size([1024, 4096]) False
512
+ base_model.model.model.layers.22.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
513
+ base_model.model.model.layers.22.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
514
+ base_model.model.model.layers.22.self_attn.v_proj.weight torch.Size([1024, 4096]) False
515
+ base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
516
+ base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
517
+ base_model.model.model.layers.22.self_attn.o_proj.weight torch.Size([4096, 4096]) False
518
+ base_model.model.model.layers.22.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
519
+ base_model.model.model.layers.22.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
520
+ base_model.model.model.layers.22.mlp.gate_proj.weight torch.Size([14336, 4096]) False
521
+ base_model.model.model.layers.22.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
522
+ base_model.model.model.layers.22.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
523
+ base_model.model.model.layers.22.mlp.up_proj.weight torch.Size([14336, 4096]) False
524
+ base_model.model.model.layers.22.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
525
+ base_model.model.model.layers.22.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
526
+ base_model.model.model.layers.22.mlp.down_proj.weight torch.Size([4096, 14336]) False
527
+ base_model.model.model.layers.22.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
528
+ base_model.model.model.layers.22.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
529
+ base_model.model.model.layers.22.input_layernorm.weight torch.Size([4096]) False
530
+ base_model.model.model.layers.22.post_attention_layernorm.weight torch.Size([4096]) False
531
+ base_model.model.model.layers.23.self_attn.q_proj.weight torch.Size([4096, 4096]) False
532
+ base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
533
+ base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
534
+ base_model.model.model.layers.23.self_attn.k_proj.weight torch.Size([1024, 4096]) False
535
+ base_model.model.model.layers.23.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
536
+ base_model.model.model.layers.23.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
537
+ base_model.model.model.layers.23.self_attn.v_proj.weight torch.Size([1024, 4096]) False
538
+ base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
539
+ base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
540
+ base_model.model.model.layers.23.self_attn.o_proj.weight torch.Size([4096, 4096]) False
541
+ base_model.model.model.layers.23.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
542
+ base_model.model.model.layers.23.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
543
+ base_model.model.model.layers.23.mlp.gate_proj.weight torch.Size([14336, 4096]) False
544
+ base_model.model.model.layers.23.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
545
+ base_model.model.model.layers.23.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
546
+ base_model.model.model.layers.23.mlp.up_proj.weight torch.Size([14336, 4096]) False
547
+ base_model.model.model.layers.23.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
548
+ base_model.model.model.layers.23.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
549
+ base_model.model.model.layers.23.mlp.down_proj.weight torch.Size([4096, 14336]) False
550
+ base_model.model.model.layers.23.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
551
+ base_model.model.model.layers.23.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
552
+ base_model.model.model.layers.23.input_layernorm.weight torch.Size([4096]) False
553
+ base_model.model.model.layers.23.post_attention_layernorm.weight torch.Size([4096]) False
554
+ base_model.model.model.layers.24.self_attn.q_proj.weight torch.Size([4096, 4096]) False
555
+ base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
556
+ base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
557
+ base_model.model.model.layers.24.self_attn.k_proj.weight torch.Size([1024, 4096]) False
558
+ base_model.model.model.layers.24.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
559
+ base_model.model.model.layers.24.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
560
+ base_model.model.model.layers.24.self_attn.v_proj.weight torch.Size([1024, 4096]) False
561
+ base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
562
+ base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
563
+ base_model.model.model.layers.24.self_attn.o_proj.weight torch.Size([4096, 4096]) False
564
+ base_model.model.model.layers.24.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
565
+ base_model.model.model.layers.24.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
566
+ base_model.model.model.layers.24.mlp.gate_proj.weight torch.Size([14336, 4096]) False
567
+ base_model.model.model.layers.24.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
568
+ base_model.model.model.layers.24.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
569
+ base_model.model.model.layers.24.mlp.up_proj.weight torch.Size([14336, 4096]) False
570
+ base_model.model.model.layers.24.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
571
+ base_model.model.model.layers.24.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
572
+ base_model.model.model.layers.24.mlp.down_proj.weight torch.Size([4096, 14336]) False
573
+ base_model.model.model.layers.24.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
574
+ base_model.model.model.layers.24.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
575
+ base_model.model.model.layers.24.input_layernorm.weight torch.Size([4096]) False
576
+ base_model.model.model.layers.24.post_attention_layernorm.weight torch.Size([4096]) False
577
+ base_model.model.model.layers.25.self_attn.q_proj.weight torch.Size([4096, 4096]) False
578
+ base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
579
+ base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
580
+ base_model.model.model.layers.25.self_attn.k_proj.weight torch.Size([1024, 4096]) False
581
+ base_model.model.model.layers.25.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
582
+ base_model.model.model.layers.25.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
583
+ base_model.model.model.layers.25.self_attn.v_proj.weight torch.Size([1024, 4096]) False
584
+ base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
585
+ base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
586
+ base_model.model.model.layers.25.self_attn.o_proj.weight torch.Size([4096, 4096]) False
587
+ base_model.model.model.layers.25.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
588
+ base_model.model.model.layers.25.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
589
+ base_model.model.model.layers.25.mlp.gate_proj.weight torch.Size([14336, 4096]) False
590
+ base_model.model.model.layers.25.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
591
+ base_model.model.model.layers.25.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
592
+ base_model.model.model.layers.25.mlp.up_proj.weight torch.Size([14336, 4096]) False
593
+ base_model.model.model.layers.25.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
594
+ base_model.model.model.layers.25.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
595
+ base_model.model.model.layers.25.mlp.down_proj.weight torch.Size([4096, 14336]) False
596
+ base_model.model.model.layers.25.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
597
+ base_model.model.model.layers.25.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
598
+ base_model.model.model.layers.25.input_layernorm.weight torch.Size([4096]) False
599
+ base_model.model.model.layers.25.post_attention_layernorm.weight torch.Size([4096]) False
600
+ base_model.model.model.layers.26.self_attn.q_proj.weight torch.Size([4096, 4096]) False
601
+ base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
602
+ base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
603
+ base_model.model.model.layers.26.self_attn.k_proj.weight torch.Size([1024, 4096]) False
604
+ base_model.model.model.layers.26.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
605
+ base_model.model.model.layers.26.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
606
+ base_model.model.model.layers.26.self_attn.v_proj.weight torch.Size([1024, 4096]) False
607
+ base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
608
+ base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
609
+ base_model.model.model.layers.26.self_attn.o_proj.weight torch.Size([4096, 4096]) False
610
+ base_model.model.model.layers.26.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
611
+ base_model.model.model.layers.26.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
612
+ base_model.model.model.layers.26.mlp.gate_proj.weight torch.Size([14336, 4096]) False
613
+ base_model.model.model.layers.26.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
614
+ base_model.model.model.layers.26.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
615
+ base_model.model.model.layers.26.mlp.up_proj.weight torch.Size([14336, 4096]) False
616
+ base_model.model.model.layers.26.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
617
+ base_model.model.model.layers.26.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
618
+ base_model.model.model.layers.26.mlp.down_proj.weight torch.Size([4096, 14336]) False
619
+ base_model.model.model.layers.26.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
620
+ base_model.model.model.layers.26.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
621
+ base_model.model.model.layers.26.input_layernorm.weight torch.Size([4096]) False
622
+ base_model.model.model.layers.26.post_attention_layernorm.weight torch.Size([4096]) False
623
+ base_model.model.model.layers.27.self_attn.q_proj.weight torch.Size([4096, 4096]) False
624
+ base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
625
+ base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
626
+ base_model.model.model.layers.27.self_attn.k_proj.weight torch.Size([1024, 4096]) False
627
+ base_model.model.model.layers.27.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
628
+ base_model.model.model.layers.27.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
629
+ base_model.model.model.layers.27.self_attn.v_proj.weight torch.Size([1024, 4096]) False
630
+ base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
631
+ base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
632
+ base_model.model.model.layers.27.self_attn.o_proj.weight torch.Size([4096, 4096]) False
633
+ base_model.model.model.layers.27.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
634
+ base_model.model.model.layers.27.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
635
+ base_model.model.model.layers.27.mlp.gate_proj.weight torch.Size([14336, 4096]) False
636
+ base_model.model.model.layers.27.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
637
+ base_model.model.model.layers.27.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
638
+ base_model.model.model.layers.27.mlp.up_proj.weight torch.Size([14336, 4096]) False
639
+ base_model.model.model.layers.27.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
640
+ base_model.model.model.layers.27.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
641
+ base_model.model.model.layers.27.mlp.down_proj.weight torch.Size([4096, 14336]) False
642
+ base_model.model.model.layers.27.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
643
+ base_model.model.model.layers.27.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
644
+ base_model.model.model.layers.27.input_layernorm.weight torch.Size([4096]) False
645
+ base_model.model.model.layers.27.post_attention_layernorm.weight torch.Size([4096]) False
646
+ base_model.model.model.layers.28.self_attn.q_proj.weight torch.Size([4096, 4096]) False
647
+ base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
648
+ base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
649
+ base_model.model.model.layers.28.self_attn.k_proj.weight torch.Size([1024, 4096]) False
650
+ base_model.model.model.layers.28.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
651
+ base_model.model.model.layers.28.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
652
+ base_model.model.model.layers.28.self_attn.v_proj.weight torch.Size([1024, 4096]) False
653
+ base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
654
+ base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
655
+ base_model.model.model.layers.28.self_attn.o_proj.weight torch.Size([4096, 4096]) False
656
+ base_model.model.model.layers.28.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
657
+ base_model.model.model.layers.28.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
658
+ base_model.model.model.layers.28.mlp.gate_proj.weight torch.Size([14336, 4096]) False
659
+ base_model.model.model.layers.28.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
660
+ base_model.model.model.layers.28.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
661
+ base_model.model.model.layers.28.mlp.up_proj.weight torch.Size([14336, 4096]) False
662
+ base_model.model.model.layers.28.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
663
+ base_model.model.model.layers.28.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
664
+ base_model.model.model.layers.28.mlp.down_proj.weight torch.Size([4096, 14336]) False
665
+ base_model.model.model.layers.28.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
666
+ base_model.model.model.layers.28.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
667
+ base_model.model.model.layers.28.input_layernorm.weight torch.Size([4096]) False
668
+ base_model.model.model.layers.28.post_attention_layernorm.weight torch.Size([4096]) False
669
+ base_model.model.model.layers.29.self_attn.q_proj.weight torch.Size([4096, 4096]) False
670
+ base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
671
+ base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
672
+ base_model.model.model.layers.29.self_attn.k_proj.weight torch.Size([1024, 4096]) False
673
+ base_model.model.model.layers.29.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
674
+ base_model.model.model.layers.29.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
675
+ base_model.model.model.layers.29.self_attn.v_proj.weight torch.Size([1024, 4096]) False
676
+ base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
677
+ base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
678
+ base_model.model.model.layers.29.self_attn.o_proj.weight torch.Size([4096, 4096]) False
679
+ base_model.model.model.layers.29.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
680
+ base_model.model.model.layers.29.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
681
+ base_model.model.model.layers.29.mlp.gate_proj.weight torch.Size([14336, 4096]) False
682
+ base_model.model.model.layers.29.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
683
+ base_model.model.model.layers.29.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
684
+ base_model.model.model.layers.29.mlp.up_proj.weight torch.Size([14336, 4096]) False
685
+ base_model.model.model.layers.29.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
686
+ base_model.model.model.layers.29.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
687
+ base_model.model.model.layers.29.mlp.down_proj.weight torch.Size([4096, 14336]) False
688
+ base_model.model.model.layers.29.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
689
+ base_model.model.model.layers.29.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
690
+ base_model.model.model.layers.29.input_layernorm.weight torch.Size([4096]) False
691
+ base_model.model.model.layers.29.post_attention_layernorm.weight torch.Size([4096]) False
692
+ base_model.model.model.layers.30.self_attn.q_proj.weight torch.Size([4096, 4096]) False
693
+ base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
694
+ base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
695
+ base_model.model.model.layers.30.self_attn.k_proj.weight torch.Size([1024, 4096]) False
696
+ base_model.model.model.layers.30.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
697
+ base_model.model.model.layers.30.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
698
+ base_model.model.model.layers.30.self_attn.v_proj.weight torch.Size([1024, 4096]) False
699
+ base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
700
+ base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
701
+ base_model.model.model.layers.30.self_attn.o_proj.weight torch.Size([4096, 4096]) False
702
+ base_model.model.model.layers.30.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
703
+ base_model.model.model.layers.30.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
704
+ base_model.model.model.layers.30.mlp.gate_proj.weight torch.Size([14336, 4096]) False
705
+ base_model.model.model.layers.30.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
706
+ base_model.model.model.layers.30.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
707
+ base_model.model.model.layers.30.mlp.up_proj.weight torch.Size([14336, 4096]) False
708
+ base_model.model.model.layers.30.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
709
+ base_model.model.model.layers.30.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
710
+ base_model.model.model.layers.30.mlp.down_proj.weight torch.Size([4096, 14336]) False
711
+ base_model.model.model.layers.30.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
712
+ base_model.model.model.layers.30.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
713
+ base_model.model.model.layers.30.input_layernorm.weight torch.Size([4096]) False
714
+ base_model.model.model.layers.30.post_attention_layernorm.weight torch.Size([4096]) False
715
+ base_model.model.model.layers.31.self_attn.q_proj.weight torch.Size([4096, 4096]) False
716
+ base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight torch.Size([64, 4096]) True
717
+ base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight torch.Size([4096, 64]) True
718
+ base_model.model.model.layers.31.self_attn.k_proj.weight torch.Size([1024, 4096]) False
719
+ base_model.model.model.layers.31.self_attn.k_proj.lora_A.default.weight torch.Size([64, 4096]) True
720
+ base_model.model.model.layers.31.self_attn.k_proj.lora_B.default.weight torch.Size([1024, 64]) True
721
+ base_model.model.model.layers.31.self_attn.v_proj.weight torch.Size([1024, 4096]) False
722
+ base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight torch.Size([64, 4096]) True
723
+ base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight torch.Size([1024, 64]) True
724
+ base_model.model.model.layers.31.self_attn.o_proj.weight torch.Size([4096, 4096]) False
725
+ base_model.model.model.layers.31.self_attn.o_proj.lora_A.default.weight torch.Size([64, 4096]) True
726
+ base_model.model.model.layers.31.self_attn.o_proj.lora_B.default.weight torch.Size([4096, 64]) True
727
+ base_model.model.model.layers.31.mlp.gate_proj.weight torch.Size([14336, 4096]) False
728
+ base_model.model.model.layers.31.mlp.gate_proj.lora_A.default.weight torch.Size([64, 4096]) True
729
+ base_model.model.model.layers.31.mlp.gate_proj.lora_B.default.weight torch.Size([14336, 64]) True
730
+ base_model.model.model.layers.31.mlp.up_proj.weight torch.Size([14336, 4096]) False
731
+ base_model.model.model.layers.31.mlp.up_proj.lora_A.default.weight torch.Size([64, 4096]) True
732
+ base_model.model.model.layers.31.mlp.up_proj.lora_B.default.weight torch.Size([14336, 64]) True
733
+ base_model.model.model.layers.31.mlp.down_proj.weight torch.Size([4096, 14336]) False
734
+ base_model.model.model.layers.31.mlp.down_proj.lora_A.default.weight torch.Size([64, 14336]) True
735
+ base_model.model.model.layers.31.mlp.down_proj.lora_B.default.weight torch.Size([4096, 64]) True
736
+ base_model.model.model.layers.31.input_layernorm.weight torch.Size([4096]) False
737
+ base_model.model.model.layers.31.post_attention_layernorm.weight torch.Size([4096]) False
738
+ base_model.model.model.norm.weight torch.Size([4096]) False
739
+ base_model.model.model.imagebind_lmm_projector.mlps.0.0.weight torch.Size([4096, 1024]) True
740
+ base_model.model.model.imagebind_lmm_projector.mlps.0.0.bias torch.Size([4096]) True
741
+ base_model.model.model.imagebind_lmm_projector.mlps.0.2.weight torch.Size([4096, 4096]) True
742
+ base_model.model.model.imagebind_lmm_projector.mlps.0.2.bias torch.Size([4096]) True
743
+ base_model.model.model.imagebind_lmm_projector.mlps.1.0.weight torch.Size([4096, 1024]) True
744
+ base_model.model.model.imagebind_lmm_projector.mlps.1.0.bias torch.Size([4096]) True
745
+ base_model.model.model.imagebind_lmm_projector.mlps.1.2.weight torch.Size([4096, 4096]) True
746
+ base_model.model.model.imagebind_lmm_projector.mlps.1.2.bias torch.Size([4096]) True
747
+ base_model.model.model.imagebind_lmm_projector.mlps.2.0.weight torch.Size([4096, 1024]) True
748
+ base_model.model.model.imagebind_lmm_projector.mlps.2.0.bias torch.Size([4096]) True
749
+ base_model.model.model.imagebind_lmm_projector.mlps.2.2.weight torch.Size([4096, 4096]) True
750
+ base_model.model.model.imagebind_lmm_projector.mlps.2.2.bias torch.Size([4096]) True
751
+ base_model.model.model.imagebind_lmm_projector.mlps.3.0.weight torch.Size([4096, 1024]) True
752
+ base_model.model.model.imagebind_lmm_projector.mlps.3.0.bias torch.Size([4096]) True
753
+ base_model.model.model.imagebind_lmm_projector.mlps.3.2.weight torch.Size([4096, 4096]) True
754
+ base_model.model.model.imagebind_lmm_projector.mlps.3.2.bias torch.Size([4096]) True
755
+ base_model.model.lm_head.weight torch.Size([32000, 4096]) False
non_lora_trainables.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c1b5617c8d6fbe52b461133e3de7c3786d7cc90ee54f600ab304bc5a7531c53
3
+ size 167843405
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff