Eculid committed on
Commit d7277e7 · verified · 1 Parent(s): d02f369

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,3 +1,65 @@
- ---
- license: apache-2.0
- ---
+ # LongVA
+ <p align="center">
+ <img src="https://i.postimg.cc/4xFmj8wd/v-niah.png" width="800">
+ </p>
+
+ <p align="center">
+ 🌐 <a href="https://lmms-lab.github.io/posts/longva/" target="_blank">Blog</a> | 📃 <a href="https://arxiv.org/abs/2406.16852" target="_blank">Paper</a> | 🤗 <a href="https://huggingface.co/collections/lmms-lab/longva-667538e09329dbc7ea498057" target="_blank">Hugging Face</a> | 🎥 <a href="https://longva-demo.lmms-lab.com/" target="_blank">Demo</a>
+ </p>
+
+ Long context capability can **zero-shot transfer** from language to vision.
+
+ LongVA can process **2000** frames or over **200K** visual tokens. It achieves **state-of-the-art** performance on Video-MME among 7B models.
+
+ # Usage
+
+ First follow the instructions in [our repo](https://github.com/EvolvingLMMs-Lab/LongVA) to install relevant packages.
+
+ ```python
+ from longva.model.builder import load_pretrained_model
+ from longva.mm_utils import tokenizer_image_token, process_images
+ from longva.constants import IMAGE_TOKEN_INDEX
+ from PIL import Image
+ from decord import VideoReader, cpu
+ import torch
+ import numpy as np
+ # fix seed
+ torch.manual_seed(0)
+
+ model_path = "lmms-lab/LongVA-7B-DPO"
+ image_path = "local_demo/assets/lmms-eval.png"
+ video_path = "local_demo/assets/dc_demo.mp4"
+ max_frames_num = 16 # you can raise this to several thousand as long as your GPU memory can handle it :)
+ gen_kwargs = {"do_sample": True, "temperature": 0.5, "top_p": None, "num_beams": 1, "use_cache": True, "max_new_tokens": 1024}
+ tokenizer, model, image_processor, _ = load_pretrained_model(model_path, None, "llava_qwen", device_map="cuda:0")
+
+ # image input
+ prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\nDescribe the image in details.<|im_end|>\n<|im_start|>assistant\n"
+ input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(model.device)
+ image = Image.open(image_path).convert("RGB")
+ images_tensor = process_images([image], image_processor, model.config).to(model.device, dtype=torch.float16)
+ with torch.inference_mode():
+     output_ids = model.generate(input_ids, images=images_tensor, image_sizes=[image.size], modalities=["image"], **gen_kwargs)
+ outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
+ print(outputs)
+ print("-"*50)
+
+ # video input
+ prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\nGive a detailed caption of the video as if I am blind.<|im_end|>\n<|im_start|>assistant\n"
+ input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(model.device)
+ vr = VideoReader(video_path, ctx=cpu(0))
+ total_frame_num = len(vr)
+ # sample max_frames_num frame indices evenly across the whole video
+ uniform_sampled_frames = np.linspace(0, total_frame_num - 1, max_frames_num, dtype=int)
+ frame_idx = uniform_sampled_frames.tolist()
+ frames = vr.get_batch(frame_idx).asnumpy()
+ video_tensor = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"].to(model.device, dtype=torch.float16)
+ with torch.inference_mode():
+     output_ids = model.generate(input_ids, images=[video_tensor], modalities=["video"], **gen_kwargs)
+ outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
+ print(outputs)
+ ```
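The video branch of the README snippet picks frames with `np.linspace(0, total_frame_num - 1, max_frames_num, dtype=int)`. As a rough illustration of what that sampling does, here is a dependency-free sketch; `uniform_frame_indices` is a hypothetical helper name, not part of the LongVA code, and it uses integer floor division, which should agree with the NumPy call for typical inputs although floating-point rounding could in principle differ.

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Evenly spaced frame indices from 0 to total_frames - 1 (inclusive)."""
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples == 1:
        # A single sample sits at the start of the range.
        return [0]
    last = total_frames - 1
    # Floor division plays the role of truncating evenly spaced floats to int.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


print(uniform_frame_indices(100, 5))  # [0, 24, 49, 74, 99]
```

The first and last frames are always included, so raising `max_frames_num` only makes the sampling denser; it does not change the span of the video that is covered.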
+
+ ## License
+
+ This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses, including but not limited to the OpenAI Terms of Use for the dataset and the specific licenses for base language models (Qwen2 license). This project does not impose any additional constraints beyond those stipulated in the original licenses. Furthermore, users are reminded to ensure that their use of the dataset and checkpoints is in compliance with all applicable laws and regulations.
+
added_tokens.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "<|endoftext|>": 151643,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644
+ }
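The `<|im_start|>` and `<|im_end|>` tokens registered here are the turn delimiters used in the prompt strings of the README snippet. A minimal sketch of assembling such a prompt follows; `build_chatml_prompt` and its parameters are hypothetical names introduced for illustration, and the layout is simply copied from the README strings rather than taken from a tokenizer chat template.

```python
IMAGE_PLACEHOLDER = "<image>"  # placeholder text used in the README prompts


def build_chatml_prompt(
    user_text: str,
    system_text: str = "You are a helpful assistant.",
    with_image: bool = True,
) -> str:
    """Build a single-turn prompt in the layout shown in the README."""
    # The image placeholder sits on its own line before the user text.
    user_content = f"{IMAGE_PLACEHOLDER}\n{user_text}" if with_image else user_text
    return (
        f"<|im_start|>system\n{system_text}<|im_end|>\n"
        f"<|im_start|>user\n{user_content}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(build_chatml_prompt("Describe the image in details."))
```

Ending the string with an open assistant turn is what prompts the model to generate the reply.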
config.json ADDED
@@ -0,0 +1,854 @@
+ {
+   "_name_or_path": "LongVa/Qwen2-7B-Instruct-extend-step_1000",
+   "architectures": [
+     "LlavaQwenForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 3584,
+   "image_aspect_ratio": "anyres",
+   "image_crop_resolution": null,
+   "image_grid_pinpoints": [
+     [
+       336,
+       672
+     ],
+     [
+       336,
+       1008
+     ],
+     [
+       336,
+       1344
+     ],
+     [
+       336,
+       1680
+     ],
+     [
+       336,
+       2016
+     ],
+     [
+       336,
+       2352
+     ],
+     [
+       336,
+       2688
+     ],
+     [
+       336,
+       3024
+     ],
+     [
+       336,
+       3360
+     ],
+     [
+       336,
+       3696
+     ],
+     [
+       336,
+       4032
+     ],
+     [
+       336,
+       4368
+     ],
+     [
+       336,
+       4704
+     ],
+     [
+       336,
+       5040
+     ],
+     [
+       336,
+       5376
+     ],
+     [
+       336,
+       5712
+     ],
+     [
+       336,
+       6048
+     ],
+     [
+       336,
+       6384
+     ],
+     [
+       336,
+       6720
+     ],
+     [
+       336,
+       7056
+     ],
+     [
+       336,
+       7392
+     ],
+     [
+       336,
+       7728
+     ],
+     [
+       336,
+       8064
+     ],
+     [
+       336,
+       8400
+     ],
+     [
+       336,
+       8736
+     ],
+     [
+       336,
+       9072
+     ],
+     [
+       336,
+       9408
+     ],
+     [
+       336,
+       9744
+     ],
+     [
+       336,
+       10080
+     ],
+     [
+       336,
+       10416
+     ],
+     [
+       336,
+       10752
+     ],
+     [
+       336,
+       11088
+     ],
+     [
+       336,
+       11424
+     ],
+     [
+       336,
+       11760
+     ],
+     [
+       336,
+       12096
+     ],
+     [
+       336,
+       12432
+     ],
+     [
+       336,
+       12768
+     ],
+     [
+       336,
+       13104
+     ],
+     [
+       336,
+       13440
+     ],
+     [
+       336,
+       13776
+     ],
+     [
+       336,
+       14112
+     ],
+     [
+       336,
+       14448
+     ],
+     [
+       336,
+       14784
+     ],
+     [
+       336,
+       15120
+     ],
+     [
+       336,
+       15456
+     ],
+     [
+       336,
+       15792
+     ],
+     [
+       336,
+       16128
+     ],
+     [
+       336,
+       16464
+     ],
+     [
+       672,
+       336
+     ],
+     [
+       672,
+       672
+     ],
+     [
+       672,
+       1008
+     ],
+     [
+       672,
+       1344
+     ],
+     [
+       672,
+       1680
+     ],
+     [
+       672,
+       2016
+     ],
+     [
+       672,
+       2352
+     ],
+     [
+       672,
+       2688
+     ],
+     [
+       672,
+       3024
+     ],
+     [
+       672,
+       3360
+     ],
+     [
+       672,
+       3696
+     ],
+     [
+       672,
+       4032
+     ],
+     [
+       672,
+       4368
+     ],
+     [
+       672,
+       4704
+     ],
+     [
+       672,
+       5040
+     ],
+     [
+       672,
+       5376
+     ],
+     [
+       672,
+       5712
+     ],
+     [
+       672,
+       6048
+     ],
+     [
+       672,
+       6384
+     ],
+     [
+       672,
+       6720
+     ],
+     [
+       672,
+       7056
+     ],
+     [
+       672,
+       7392
+     ],
+     [
+       672,
+       7728
+     ],
+     [
+       672,
+       8064
+     ],
+     [
+       1008,
+       336
+     ],
+     [
+       1008,
+       672
+     ],
+     [
+       1008,
+       1008
+     ],
+     [
+       1008,
+       1344
+     ],
+     [
+       1008,
+       1680
+     ],
+     [
+       1008,
+       2016
+     ],
+     [
+       1008,
+       2352
+     ],
+     [
+       1008,
+       2688
+     ],
+     [
+       1008,
+       3024
+     ],
+     [
+       1008,
+       3360
+     ],
+     [
+       1008,
+       3696
+     ],
+     [
+       1008,
+       4032
+     ],
+     [
+       1008,
+       4368
+     ],
+     [
+       1008,
+       4704
+     ],
+     [
+       1008,
+       5040
+     ],
+     [
+       1008,
+       5376
+     ],
+     [
+       1344,
+       336
+     ],
+     [
+       1344,
+       672
+     ],
+     [
+       1344,
+       1008
+     ],
+     [
+       1344,
+       1344
+     ],
+     [
+       1344,
+       1680
+     ],
+     [
+       1344,
+       2016
+     ],
+     [
+       1344,
+       2352
+     ],
+     [
+       1344,
+       2688
+     ],
+     [
+       1344,
+       3024
+     ],
+     [
+       1344,
+       3360
+     ],
+     [
+       1344,
+       3696
+     ],
+     [
+       1344,
+       4032
+     ],
+     [
+       1680,
+       336
+     ],
+     [
+       1680,
+       672
+     ],
+     [
+       1680,
+       1008
+     ],
+     [
+       1680,
+       1344
+     ],
+     [
+       1680,
+       1680
+     ],
+     [
+       1680,
+       2016
+     ],
+     [
+       1680,
+       2352
+     ],
+     [
+       1680,
+       2688
+     ],
+     [
+       1680,
+       3024
+     ],
+     [
+       2016,
+       336
+     ],
+     [
+       2016,
+       672
+     ],
+     [
+       2016,
+       1008
+     ],
+     [
+       2016,
+       1344
+     ],
+     [
+       2016,
+       1680
+     ],
+     [
+       2016,
+       2016
+     ],
+     [
+       2016,
+       2352
+     ],
+     [
+       2016,
+       2688
+     ],
+     [
+       2352,
+       336
+     ],
+     [
+       2352,
+       672
+     ],
+     [
+       2352,
+       1008
+     ],
+     [
+       2352,
+       1344
+     ],
+     [
+       2352,
+       1680
+     ],
+     [
+       2352,
+       2016
+     ],
+     [
+       2352,
+       2352
+     ],
+     [
+       2688,
+       336
+     ],
+     [
+       2688,
+       672
+     ],
+     [
+       2688,
+       1008
+     ],
+     [
+       2688,
+       1344
+     ],
+     [
+       2688,
+       1680
+     ],
+     [
+       2688,
+       2016
+     ],
+     [
+       3024,
+       336
+     ],
+     [
+       3024,
+       672
+     ],
+     [
+       3024,
+       1008
+     ],
+     [
+       3024,
+       1344
+     ],
+     [
+       3024,
+       1680
+     ],
+     [
+       3360,
+       336
+     ],
+     [
+       3360,
+       672
+     ],
+     [
+       3360,
+       1008
+     ],
+     [
+       3360,
+       1344
+     ],
+     [
+       3696,
+       336
+     ],
+     [
+       3696,
+       672
+     ],
+     [
+       3696,
+       1008
+     ],
+     [
+       3696,
+       1344
+     ],
+     [
+       4032,
+       336
+     ],
+     [
+       4032,
+       672
+     ],
+     [
+       4032,
+       1008
+     ],
+     [
+       4032,
+       1344
+     ],
+     [
+       4368,
+       336
+     ],
+     [
+       4368,
+       672
+     ],
+     [
+       4368,
+       1008
+     ],
+     [
+       4704,
+       336
+     ],
+     [
+       4704,
+       672
+     ],
+     [
+       4704,
+       1008
+     ],
+     [
+       5040,
+       336
+     ],
+     [
+       5040,
+       672
+     ],
+     [
+       5040,
+       1008
+     ],
+     [
+       5376,
+       336
+     ],
+     [
+       5376,
+       672
+     ],
+     [
+       5376,
+       1008
+     ],
+     [
+       5712,
+       336
+     ],
+     [
+       5712,
+       672
+     ],
+     [
+       6048,
+       336
+     ],
+     [
+       6048,
+       672
+     ],
+     [
+       6384,
+       336
+     ],
+     [
+       6384,
+       672
+     ],
+     [
+       6720,
+       336
+     ],
+     [
+       6720,
+       672
+     ],
+     [
+       7056,
+       336
+     ],
+     [
+       7056,
+       672
+     ],
+     [
+       7392,
+       336
+     ],
+     [
+       7392,
+       672
+     ],
+     [
+       7728,
+       336
+     ],
+     [
+       7728,
+       672
+     ],
+     [
+       8064,
+       336
+     ],
+     [
+       8064,
+       672
+     ],
+     [
+       8400,
+       336
+     ],
+     [
+       8736,
+       336
+     ],
+     [
+       9072,
+       336
+     ],
+     [
+       9408,
+       336
+     ],
+     [
+       9744,
+       336
+     ],
+     [
+       10080,
+       336
+     ],
+     [
+       10416,
+       336
+     ],
+     [
+       10752,
+       336
+     ],
+     [
+       11088,
+       336
+     ],
+     [
+       11424,
+       336
+     ],
+     [
+       11760,
+       336
+     ],
+     [
+       12096,
+       336
+     ],
+     [
+       12432,
+       336
+     ],
+     [
+       12768,
+       336
+     ],
+     [
+       13104,
+       336
+     ],
+     [
+       13440,
+       336
+     ],
+     [
+       13776,
+       336
+     ],
+     [
+       14112,
+       336
+     ],
+     [
+       14448,
+       336
+     ],
+     [
+       14784,
+       336
+     ],
+     [
+       15120,
+       336
+     ],
+     [
+       15456,
+       336
+     ],
+     [
+       15792,
+       336
+     ],
+     [
+       16128,
+       336
+     ],
+     [
+       16464,
+       336
+     ]
+   ],
+   "image_split_resolution": null,
+   "initializer_range": 0.02,
+   "intermediate_size": 18944,
+   "max_position_embeddings": 224000,
+   "max_window_layers": 28,
+   "mm_hidden_size": 1024,
+   "mm_patch_merge_type": "unires",
+   "mm_spatial_pool_mode": "average",
+   "mm_spatial_pool_stride": 2,
+   "mm_projector_lr": null,
+   "mm_projector_type": "mlp2x_gelu",
+   "mm_resampler_type": null,
+   "mm_tunable_parts": "mm_vision_tower,mm_mlp_adapter,mm_language_model",
+   "mm_use_im_patch_token": false,
+   "mm_use_im_start_end": false,
+   "mm_vision_select_feature": "patch",
+   "mm_vision_select_layer": -2,
+   "mm_vision_tower": "openai/clip-vit-large-patch14-336",
+   "mm_vision_tower_lr": 2e-06,
+   "model_type": "qwen2",
+   "num_attention_heads": 28,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 4,
+   "pos_skipping_range": 4096,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000000.0,
+   "sliding_window": 131072,
+   "tie_word_embeddings": false,
+   "tokenizer_model_max_length": 224000,
+   "tokenizer_padding_side": "right",
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.40.0.dev0",
+   "use_cache": true,
+   "use_mm_proj": true,
+   "use_pos_skipping": false,
+   "use_sliding_window": false,
+   "vision_tower_pretrained": null,
+   "vocab_size": 152064
+ }
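The long `image_grid_pinpoints` list in this config appears to follow a simple rule: every pair of multiples of 336 whose tile count (rows times columns) is at most 49, except the single-tile `[336, 336]` case. The sketch below regenerates the list under that reading. The 49-tile cap is an assumption inferred from the listed values, not something the config states, and `grid_pinpoints` is a hypothetical helper name.

```python
TILE = 336      # input side of openai/clip-vit-large-patch14-336 (mm_vision_tower)
MAX_TILES = 49  # assumption: inferred from the listed values, not stated in the config


def grid_pinpoints(tile: int = TILE, max_tiles: int = MAX_TILES) -> list[list[int]]:
    """Enumerate [a * tile, b * tile] for all a * b <= max_tiles, skipping 1 x 1."""
    points = []
    for a in range(1, max_tiles + 1):
        for b in range(1, max_tiles // a + 1):
            if a == 1 and b == 1:
                continue  # the single-tile entry does not appear in the config
            points.append([a * tile, b * tile])
    return points


pinpoints = grid_pinpoints()
print(len(pinpoints), pinpoints[0], pinpoints[-1])  # 200 [336, 672] [16464, 336]
```

The 200 generated pairs come out in the same order as the config, which accounts for the 800 lines the array takes up in this diff.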
generation_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "_attn_implementation": "flash_attention_2",
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "repetition_penalty": 1.05,
+   "rope_theta": 1000000000.0,
+   "temperature": 0.7,
+   "top_k": 20,
+   "top_p": 0.8,
+   "transformers_version": "4.40.0.dev0"
+ }
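To give a feel for how `temperature`, `top_k`, and `top_p` from this file interact, here is a simplified, dependency-free sketch of the usual sampling filters applied to a toy logit vector. It is an illustration only, not the `transformers` implementation: `filter_and_normalize` is a hypothetical name, `repetition_penalty` is left out, and the exact tie-breaking and ordering details in the library may differ.

```python
import math


def filter_and_normalize(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_index: probability} after temperature, top-k, then top-p."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep the k most probable tokens, most probable first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass_k = sum(probs[i] for i in ranked)

    # Top-p: keep the shortest prefix whose renormalized cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens so they form a distribution.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


dist = filter_and_normalize([2.0, 1.0, 0.5, -1.0])
print(sorted(dist))  # indices of the tokens that survive filtering
```

Note that the README snippet passes its own `gen_kwargs` (for example `temperature` 0.5), so the values in this file act as defaults rather than what that snippet samples with.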
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30d6a4a41c42a7473aa95a533b2b25c07dc84a7b2c30db117252ccbee2774560
+ size 5248280192
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bee317f6776b19aa136b74c9dac54b6e7ad993eb1aa81b1ac6a8312b2d4a9e84
+ size 5321822128
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2e9d5ab081a95b1535c8468e892122e10cf3244cc986e4c74eb83cf774203f7
+ size 5301290704
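The three `.safetensors` entries above are Git LFS pointer files rather than the weights themselves: each holds a spec version, a SHA-256 object id, and the byte size of the real file. A small sketch of reading one pointer and totalling the shard sizes follows; `parse_lfs_pointer` is a hypothetical helper written for this illustration, and the sizes are copied from the three pointers in this commit.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    # Each line is "<key> <value>"; split only on the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:30d6a4a41c42a7473aa95a533b2b25c07dc84a7b2c30db117252ccbee2774560
size 5248280192
"""
info = parse_lfs_pointer(pointer)

# Sizes of the three shards as listed in the pointers in this commit.
shard_sizes = [5248280192, 5321822128, 5301290704]
print(info["algo"], info["size"], sum(shard_sizes))  # sha256 5248280192 15871393024
```

The total is roughly 15.9 GB, which gives an idea of the download size for the full checkpoint.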
model.safetensors.index.json ADDED
@@ -0,0 +1 @@
+ {"metadata": {"mergekit_version": "0.1.0"}, "weight_map": {"lm_head.weight": "model-00001-of-00003.safetensors", "model.embed_tokens.weight": "model-00001-of-00003.safetensors", "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors", "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors", "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors", "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00003.safetensors", "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors", "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors", "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00003.safetensors", "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors", "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00003.safetensors", "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors", "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors", "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors", "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors", "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00003.safetensors", "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors", "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors", "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00003.safetensors", "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors", "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00003.safetensors", "model.layers.1.self_attn.v_proj.weight": 
"model-00001-of-00003.safetensors", "model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.10.mlp.down_proj.weight": "model-00001-of-00003.safetensors", "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors", "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors", "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.10.self_attn.k_proj.bias": "model-00001-of-00003.safetensors", "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors", "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors", "model.layers.10.self_attn.q_proj.bias": "model-00001-of-00003.safetensors", "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors", "model.layers.10.self_attn.v_proj.bias": "model-00001-of-00003.safetensors", "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors", "model.layers.11.input_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.11.mlp.down_proj.weight": "model-00001-of-00003.safetensors", "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00003.safetensors", "model.layers.11.mlp.up_proj.weight": "model-00001-of-00003.safetensors", "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.11.self_attn.k_proj.bias": "model-00001-of-00003.safetensors", "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00003.safetensors", "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00003.safetensors", "model.layers.11.self_attn.q_proj.bias": "model-00001-of-00003.safetensors", "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00003.safetensors", "model.layers.11.self_attn.v_proj.bias": "model-00001-of-00003.safetensors", "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00003.safetensors", "model.layers.12.input_layernorm.weight": "model-00001-of-00003.safetensors", 
"model.layers.12.mlp.down_proj.weight": "model-00001-of-00003.safetensors", "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00003.safetensors", "model.layers.12.mlp.up_proj.weight": "model-00001-of-00003.safetensors", "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.12.self_attn.k_proj.bias": "model-00001-of-00003.safetensors", "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00003.safetensors", "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00003.safetensors", "model.layers.12.self_attn.q_proj.bias": "model-00001-of-00003.safetensors", "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00003.safetensors", "model.layers.12.self_attn.v_proj.bias": "model-00001-of-00003.safetensors", "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00003.safetensors", "model.layers.13.input_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.13.mlp.down_proj.weight": "model-00001-of-00003.safetensors", "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00003.safetensors", "model.layers.13.mlp.up_proj.weight": "model-00001-of-00003.safetensors", "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.13.self_attn.k_proj.bias": "model-00001-of-00003.safetensors", "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00003.safetensors", "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00003.safetensors", "model.layers.13.self_attn.q_proj.bias": "model-00001-of-00003.safetensors", "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00003.safetensors", "model.layers.13.self_attn.v_proj.bias": "model-00001-of-00003.safetensors", "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00003.safetensors", "model.layers.14.input_layernorm.weight": "model-00001-of-00003.safetensors", "model.layers.14.mlp.down_proj.weight": "model-00001-of-00003.safetensors", "model.layers.14.mlp.gate_proj.weight": 
"model-00001-of-00003.safetensors", "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.14.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.14.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.15.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.15.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors", 
"model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.16.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.16.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.16.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.17.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.17.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.17.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", 
"model.layers.18.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.18.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.18.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.19.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.19.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.19.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.2.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.2.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.2.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.2.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.2.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.2.self_attn.k_proj.weight": 
"model-00002-of-00003.safetensors", "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.2.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.2.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.2.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.20.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.20.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.20.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.21.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", 
"model.layers.21.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.21.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.22.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.22.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.22.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.23.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.23.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.23.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.23.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.23.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.23.self_attn.q_proj.weight": 
"model-00002-of-00003.safetensors", "model.layers.23.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.24.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.24.mlp.down_proj.weight": "model-00002-of-00003.safetensors", "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00003.safetensors", "model.layers.24.mlp.up_proj.weight": "model-00002-of-00003.safetensors", "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.24.self_attn.k_proj.bias": "model-00002-of-00003.safetensors", "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00003.safetensors", "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00003.safetensors", "model.layers.24.self_attn.q_proj.bias": "model-00002-of-00003.safetensors", "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00003.safetensors", "model.layers.24.self_attn.v_proj.bias": "model-00002-of-00003.safetensors", "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00003.safetensors", "model.layers.25.input_layernorm.weight": "model-00002-of-00003.safetensors", "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.25.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.25.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.25.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", 
"model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.26.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.26.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.26.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.27.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.27.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.27.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.3.input_layernorm.weight": 
"model-00003-of-00003.safetensors", "model.layers.3.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.3.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.3.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.3.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.3.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.3.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.3.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.3.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.3.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.3.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.3.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.4.input_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.4.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.4.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.4.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.4.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.4.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.4.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.4.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.4.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.4.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.4.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.4.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.5.input_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.5.mlp.down_proj.weight": "model-00003-of-00003.safetensors", 
"model.layers.5.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.5.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.5.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.5.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.5.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.5.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.5.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.5.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.5.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.5.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.6.input_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.6.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.6.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.6.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.6.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.6.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.6.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.6.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.6.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.6.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.6.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.6.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.7.input_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.7.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.7.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.7.mlp.up_proj.weight": "model-00003-of-00003.safetensors", 
"model.layers.7.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.7.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.7.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.7.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.7.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.7.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.7.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.7.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.8.input_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.8.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.8.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.8.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.8.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.8.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.layers.9.input_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.9.mlp.down_proj.weight": "model-00003-of-00003.safetensors", "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00003.safetensors", "model.layers.9.mlp.up_proj.weight": "model-00003-of-00003.safetensors", "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00003.safetensors", "model.layers.9.self_attn.k_proj.bias": 
"model-00003-of-00003.safetensors", "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00003.safetensors", "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.mm_projector.0.bias": "model-00003-of-00003.safetensors", "model.mm_projector.0.weight": "model-00003-of-00003.safetensors", "model.mm_projector.2.bias": "model-00003-of-00003.safetensors", "model.mm_projector.2.weight": "model-00003-of-00003.safetensors", "model.norm.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.embeddings.class_embedding": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.embeddings.patch_embedding.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.embeddings.position_embedding.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias": "model-00003-of-00003.safetensors", 
"model.vision_tower.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.post_layernorm.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.post_layernorm.weight": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.pre_layrnorm.bias": "model-00003-of-00003.safetensors", "model.vision_tower.vision_tower.vision_model.pre_layrnorm.weight": "model-00003-of-00003.safetensors"}}
special_tokens_map.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 32768,
+   "pad_token": "<|endoftext|>",
+   "padding_side": "right",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
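
The `chat_template` entry above is a Jinja template, which can be hard to read inline. The following is a minimal plain-Python sketch of what that template appears to produce. It is not code from this repository, and `render_chatml` is a hypothetical helper name used only for illustration. It assumes each message is a dict with `role` and `content` string keys, as the template itself does.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Hypothetical re-implementation of the chat_template logic, for illustration only."""
    parts = []
    # The template inserts a default system prompt on the first loop iteration
    # when the conversation does not already start with a system message.
    if messages and messages[0]["role"] != "system":
        parts.append("<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n")
    # Every message is wrapped in <|im_start|>role ... <|im_end|> markers.
    for message in messages:
        parts.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
        )
    # Optionally open an assistant turn so the model continues from there.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
print(prompt)
```

Because `eos_token` is set to `<|im_end|>`, generation would be expected to stop at the end of the assistant turn opened by the final `<|im_start|>assistant` line.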
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
vocab.json ADDED
The diff for this file is too large to render. See raw diff