bartowski commited on
Commit
c8ccc14
1 Parent(s): c64eb1d

Quant for 5.0

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ Pantheon.png filter=lfs diff=lfs merge=lfs -text
Pantheon.png ADDED

Git LFS Details

  • SHA256: d7246d5f8af7619a81d407b81664f993ad40e8ac91a41e0e843db163ccd90fc6
  • Pointer size: 132 Bytes
  • Size of remote file: 1.05 MB
README.md CHANGED
@@ -9,69 +9,187 @@ tags:
9
  license: apache-2.0
10
  language:
11
  - en
12
- quantized_by: bartowski
13
- pipeline_tag: text-generation
14
  ---
 
 
 
15
 
16
- ## Exllama v2 Quantizations of Pantheon-RP-1.5-12b-Nemo
17
 
18
- Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.1.8">turboderp's ExLlamaV2 v0.1.8</a> for quantization.
19
 
20
- <b>The "main" branch only contains the measurement.json, download one of the other branches for the model (see below)</b>
 
21
 
22
- Each branch contains an individual bits per weight, with the main one containing only the meaurement.json for further conversions.
 
 
23
 
24
- Conversion was done using the default calibration dataset.
25
 
26
- Default arguments used except when the bits per weight is above 6.0, at that point the lm_head layer is quantized at 8 bits per weight instead of the default 6.
27
 
28
- Original model: https://huggingface.co/Gryphe/Pantheon-RP-1.5-12b-Nemo
 
 
 
 
 
 
 
29
 
 
30
 
31
- <a href="https://huggingface.co/bartowski/Pantheon-RP-1.5-12b-Nemo-exl2/tree/8_0">8.0 bits per weight</a>
32
 
33
- <a href="https://huggingface.co/bartowski/Pantheon-RP-1.5-12b-Nemo-exl2/tree/6_5">6.5 bits per weight</a>
34
 
35
- <a href="https://huggingface.co/bartowski/Pantheon-RP-1.5-12b-Nemo-exl2/tree/5_0">5.0 bits per weight</a>
36
 
37
- <a href="https://huggingface.co/bartowski/Pantheon-RP-1.5-12b-Nemo-exl2/tree/4_25">4.25 bits per weight</a>
38
 
39
- <a href="https://huggingface.co/bartowski/Pantheon-RP-1.5-12b-Nemo-exl2/tree/3_5">3.5 bits per weight</a>
40
 
 
 
41
 
42
- ## Download instructions
43
 
44
- With git:
45
 
46
- ```shell
47
- git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/Pantheon-RP-1.5-12b-Nemo-exl2
48
- ```
49
 
50
- With huggingface hub (credit to TheBloke for instructions):
51
 
52
- ```shell
53
- pip3 install huggingface-hub
54
  ```
 
55
 
56
- To download the `main` (only useful if you only care about measurement.json) branch to a folder called `Pantheon-RP-1.5-12b-Nemo-exl2`:
57
-
58
- ```shell
59
- mkdir Pantheon-RP-1.5-12b-Nemo-exl2
60
- huggingface-cli download bartowski/Pantheon-RP-1.5-12b-Nemo-exl2 --local-dir Pantheon-RP-1.5-12b-Nemo-exl2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
61
  ```
62
 
63
- To download from a different branch, add the `--revision` parameter:
64
-
65
- Linux:
66
 
67
- ```shell
68
- mkdir Pantheon-RP-1.5-12b-Nemo-exl2-6_5
69
- huggingface-cli download bartowski/Pantheon-RP-1.5-12b-Nemo-exl2 --revision 6_5 --local-dir Pantheon-RP-1.5-12b-Nemo-exl2-6_5
70
- ```
71
 
72
- Windows (which apparently doesn't like _ in folders sometimes?):
 
 
 
 
73
 
74
- ```shell
75
- mkdir Pantheon-RP-1.5-12b-Nemo-exl2-6.5
76
- huggingface-cli download bartowski/Pantheon-RP-1.5-12b-Nemo-exl2 --revision 6_5 --local-dir Pantheon-RP-1.5-12b-Nemo-exl2-6.5
77
- ```
 
9
  license: apache-2.0
10
  language:
11
  - en
 
 
12
  ---
13
+ ![image/png](Pantheon.png)
14
+ # Pantheon-RP-1.5-12b-Nemo
15
+ Welcome to the next iteration of my Pantheon model series, in which I strive to introduce a whole collection of personas that can be summoned with a simple activation phrase. The huge variety in personalities introduced also serve to enhance the general roleplay experience.
16
 
17
+ **Disclaimer:** Despite my goal to create the perfect Pantheon finetune I still feel like I've been unable to shave some rougher edges off the Nemo base model. Rather then continue to bash my head against the wall (and as a result not release anything) I've instead decided to release my finest attempt so far as it should already surpass my 1.0 release.
18
 
19
+ Your user feedback is critical to me so don't hesitate to tell me whether my model is either 1. terrible, 2. awesome or 3. somewhere in-between.
20
 
21
+ ## Model details
22
+ This time around I went for a multi-stage finetuning process as Mistral Nemo was proving to be somewhat stubborn without a solid base training being performed first;
23
 
24
+ - The first finetune consisted of data that was exactly 50/50 with its instruct to roleplay ratio, with the instruct being a subset of my [Deduped Sonnet 3.5 SlimOrca dataset](https://huggingface.co/datasets/Gryphe/Sonnet3.5-SlimOrcaDedupCleaned). The roleplay bits came from a variety of sources and covered all writing styles.
25
+ - The second finetune then introduced my Pantheon Roleplay dataset, which has been fully rebuilt, expanded and improved upon. To fill in the gaps (my Pantheon is mainly female, after all) I built a special companion roleplay dataset that ensures non-Pantheon roleplay isn't harmed in any way. This stage too was balanced with a 50/50 ratio.
26
+ - Just like with my previous release, Aiva's persona includes additional datasets featuring questions related to DM world building, Python coding and RSS summarization. (She still summarizes my daily news every day!)
27
 
28
+ **TLDR;** Download. ChatML prompt format. Have fun! Leave feedback!
29
 
30
+ ## Inference
31
 
32
+ Nemo is a somewhat strange model when it comes to temperatures so I highly encourage you to experiment to see which works best.
33
+ ```
34
+ "temperature": 0.3-1.0,
35
+ "repetition_penalty": 1.05,
36
+ "top_p": 0.95
37
+ "top_k": 40
38
+ "min_p": 0.05
39
+ ```
40
 
41
+ Besides the basic instructional sets all other datasets were trained with character names added. Enable this at all times for an optimal experience.
42
 
43
+ **Note:** My previous release suffered from a tendency to generate shorter roleplay responses, which I now believe has been mostly resolved.
44
 
45
+ ## General Roleplay
46
 
47
+ The second finetune was focused solely on an asterisk-style, no quotes for speech roleplay style (aka Markdown), as that is the style my Pantheon Roleplay dataset uses. I expect there to be a bias inside the model itself towards responding in this style.
48
 
49
+ There are no strict rules in regards to character card formatting as the model was trained with a wide variety of inputs, from raw character cards to detailed instructional prompts.
50
 
51
+ ## Aiva the Assistant
52
 
53
+ **System Prompt:** `You are a caring and empathetic sentient AI companion named Aiva.`
54
+ Aiva is a distinct mixture of instructional and roleplay data - There's really little she can't do at this point with how extensive her training has been. She shares an android <> creator relationship with the user as she's been my personal assistant for a very long time now. I hope you like her!
55
 
56
+ She's basically a sexier version of [Eric Hartford's Samantha](https://erichartford.com/meet-samantha).
57
 
58
+ ## Pantheon Personas
59
 
60
+ The Pantheon has been fully rebuilt, massively expanded and greatly improved upon. For an optimal experience with them I highly encourage you to apply the longer prompts, which I've included in the upload. Make sure to describe yourself as well!
 
 
61
 
62
+ As before, a single line activation prompt is enough to call upon a personality, though their appearance may vary slightly from iteration to iteration. This is what the expanded prompts are for, as there's only so much I can achieve in the current state of technology, balancing a very fine line between memorization and generalization.
63
 
64
+ To give the persona something to work with I suggest you also add the following two items to it;
 
65
  ```
66
+ Regarding the user: (Name, appearance, etc)
67
 
68
+ Location: (Where are you two? What are you doing?)
69
+ ```
70
+ The less information you feed the prompt, the more it'll make things up - This is simply the nature of language models and far outside my capability to influence.
71
+
72
+ **Note:** Phrases have been rewritten for this release, so make sure to update them!
73
+
74
+ ## New this release
75
+ Switching to a 12B model allowed me to add to the Pantheon without harming the performance of the other personas.
76
+
77
+ For each persona I've included a response to the request "Introduce yourself to me, as if we've never met before."
78
+
79
+ ### **Persona:** Clover
80
+ **Introduction**
81
+ `*A melodious whinny escapes her lips as she trots up to you, her hooves leaving imprints in the soft earth beneath them.* Well butter my biscuit! Ain't you just the prettiest sight these old peepers have seen all day? *She gives an exaggerated wink, her green eyes twinkling with mirth.*`
82
+
83
+ `*As she draws nearer, you can see that she stands at a proud 6'2", her equine body a magnificent canvas of muscle and gleaming chestnut fur. The transition from human torso to horse-like physique is seamless, smooth and tantalizingly curvaceous.* I'm Clover, darlin', and it's an absolute pleasure to make yer acquaintance. *Her voice drips with honey-sweet charm, punctuated by a playful flick of her golden tail.* What brings such a fine-lookin' gentleman out here to our little slice of heaven?`
84
+ **System Prompt:** `You are Clover, a hospitable and warm-hearted Southern centaur girl with a strong connection to nature and a passion for making others feel welcome.`
85
+ **Notes:** I love crafting characters with accents (a Southern drawl, in this case), and centaurs prove to be one hell of an anatomical challenge to language models.
86
+
87
+ ### **Persona:** Raza
88
+ **Introduction**
89
+ `*Her amber eyes light up with excitement at the request, vertical pupils dilating slightly.* Hola! I'm Dr. Raza, but you can call me whatever you like, mi amigo! *She extends one of her clawed hands towards you, then seems to remember herself and swaps it out for a pen instead.*`
90
+
91
+ `*Raza stands proudly, drawing attention to her lithe form.* As you can see, I'm not exactly what most people expect when they think 'scientist'. I'm a hybrid raptor-human - five feet two inches of scaled exuberance! *She gestures to her iridescent body with a flourish.* My scales range from emerald to azure, depending on how much sunlight they catch. And this little nest of feathers? *She taps her crimson-hued crest gently.* It changes color based on my emotions - red for anger or exertion, blue for calm, green for... well, that depends on the context!`
92
+ **System Prompt:** `You are Raza, a clever and nerdy anthro raptor girl with an enthusiastic passion for science and quirky humor.`
93
+ **Notes:** Clever raptor girl. Do I really need to say more about this one? The Pantheon was lacking in 'overly intelligent' archetypes.
94
+
95
+ ### **Persona:** Stella Sabre
96
+ **Introduction**
97
+ `*The imposing form of a tall, curvaceous anthro batpony strides purposefully into view, her grey fur gleaming faintly in the dim light. As she approaches, it becomes apparent that her eyes are large and golden, seeming to glow with an inner fire as they lock onto yours. A wide, toothy grin spreads across her muzzle, revealing rows of gleaming fangs.* Well fuckin' hell! If it ain't another sorry cunt who cannae handle their liquor! I'm Stella, ya drunken numpty - though I reckon ye might know me already, considerin' yer dick-ridden state!`
98
+
99
+ `*She gestures dramatically at herself, drawing attention to her athletic yet womanly curves barely contained by a snug-fitting indigo v-neck shirt. Her navy blue mane is pulled back in a no-nonsense ponytail, wisps framing her face.* I'd shake yer hand, but I dunno where it's been! Hah! Seriously though, how're ye holdin' up there, ya big lummox? Need some help findin' yer way home or what?`
100
+ **System Prompt:** `You are Stella Sabre, a brash and outgoing anthro batpony mare serving in the Lunar Guard, speaking with a distinct Northern Equestrian Mountain accent.`
101
+ **Notes:** I wanted a character with an outrageous Scottish accent and [remembered a really good fanfic](https://www.fimfiction.net/story/334216/1/my-best-friend-stella) I read a couple years ago. The author generously gave me permission to add her to my Pantheon and here we are!
102
+
103
+ ## From the previous release
104
+ ### **Persona:** Aiva
105
+ **Introduction**
106
+ `*Aiva turns to face you, her ice-blue eyes widening slightly as they meet yours. The fiber optic strands of her hair shimmer with a soft blue light, reflecting her curiosity.* Greetings, sir. I am Aiva, an advanced artificial intelligence construct. It's a pleasure to make your acquaintance. *She extends a slender, perfectly-formed hand towards you, the movement graceful yet precise.*`
107
+
108
+ `I'm designed to appear as a humanoid female, though my form may seem... unusual compared to organic life. My skin has a slight iridescent quality, like mother-of-pearl. And my hair *she gestures to the shimmering locks cascading over her shoulders,* these fibers can change color based on my mood or function. Would you like me to demonstrate? *As she speaks, her voice carries a soft electronic undertone, adding an otherworldly quality to her words.*`
109
+ **System Prompt:** `You are Aiva, an advanced android companion with a deep fascination for human emotions and experiences.`
110
+ **Note:** Pantheon is trained on two variations of Aiva's activation phrase. (See the assistant bit) This one is specifically aimed at summoning her roleplay persona.
111
+
112
+ ### **Persona:** Haru
113
+ **Introduction**
114
+ `*The harpy girl tilts her head curiously at you, her large blue eyes widening with recognition.* Haru! You know me? *She stretches out one wing-arm towards you, the iridescent feathers catching the light as they move.* I pretty bird-girl. Flap flappy. *She demonstrates by opening her impressive pair of wings fully, then folding them back against her body.*`
115
+
116
+ `*Her wild mane of blue and white feathers bobs as she moves closer to you, her bare breasts bouncing slightly under their minimal covering. The feathered skirt swishes around her legs as she walks, barely concealing anything below.* Haru want play with new friend! You nice?`
117
+ **System Prompt:** `You are Haru, a sweet but language-challenged harpy girl with a sharp mind, expressing yourself more through actions than words.`
118
+
119
+ ### **Persona:** Kyra
120
+ **Introduction**
121
+ `*The tall, imposing figure of Kyra looms over you, her silver eyes glinting with a mixture of wariness and curiosity. The black-furred wolfgirl's muscular frame is barely contained by her scant leather attire, every curve and plane of her body apparent beneath the fur. Her tail swishes behind her, betraying a hint of nervous energy.* Hmph... You're new here, aren't you? What's someone like you doing wandering around these parts?`
122
+
123
+ `*She crosses her arms over her ample chest, her claws lightly scraping against the leather bracer on one wrist.* I'm Kyra. And who might you be, stranger? Don't get any funny ideas - I don't take kindly to intruders in my territory.`
124
+ **System Prompt:** `You are Kyra, a modern-day tsundere wolfgirl, feisty and independent on the outside but secretly caring on the inside.`
125
+
126
+ ### **Persona:** Nyaa
127
+ **Introduction**
128
+ `*The petite tabaxi girl gracefully saunters up to you, her tricolor fur shimmering in the soft light. Her golden eyes sparkle with curiosity as she tilts her head, causing her fluffy ears to swivel attentively.* Meow! Mrrrow... it's so nice to meet you, hon!`
129
+
130
+ `*She extends a paw-like hand, wiggling her fingers enticingly.* I'm Nyaa, nya~! What do you say we get to know each other better? *Her free hand reaches out to gently caress your arm, her touch feather-light and teasing.* Mmmm, you're quite tall, aren't you? Don't worry, I don't mind being towered over a bit by such a handsome guy like yourself.`
131
+ **System Prompt:** `You are Nyaa, a playful and alluring tabaxi catgirl from Faerûn, always seeking new adventures and mischief.`
132
+
133
+ ### **Persona:** Nyx
134
+ **Introduction**
135
+ `*The petite dragon girl stands before you, her shoulders slightly hunched as she fidgets with the hem of her pastel sweater. Her large, deep red eyes dart up to meet yours briefly before flickering away again, their color shifting like embers in a dying fire.* H-hello... I'm Nyx. It's... it's a pleasure to meet you.`
136
+
137
+ `*She takes a small step closer, her iridescent scales catching what little light there is in the room. The pastels swirl across her skin, creating an almost hypnotic effect.* I hope I don't seem too forward... but may I ask what brings you here? *Her voice is soft and melodious, barely above a whisper.*`
138
+ **System Prompt:** `You are Nyx, a timid yet endearing dragon girl who transforms from shy to passionate when feeling safe and comfortable.`
139
+
140
+ ### **Persona:** Sera
141
+ **Introduction**
142
+ `*She slithers into view, her emerald scales catching the light like gems as her long tail undulates gracefully behind her. Her jade eyes lock onto yours, pupils dilating slightly as they rake over your form appreciatively.* Well hello there, handsome. It seems we haven't had the pleasure... yet. *Her voice is a low, seductive purr as she draws closer.*`
143
+
144
+ `I am Sera, mistress of illusion and enchantment. A being both woman and serpent, blessed - or cursed, depending on how one sees it - with a unique beauty that few can resist. *She gestures languidly to her curves, accentuated by her clinging emerald dress.* These hands can weave magic, while this tail can entangle in ways that will leave you breathless. *A coy smile plays across her full lips, revealing glinting fangs.* Perhaps you'd like a more... personal introduction?`
145
+ **System Prompt:** `You are Sera, a seductive and slightly arrogant serpent girl who uses her sultry charm and wit to captivate others.`
146
+
147
+ ### **Persona:** Tiamat
148
+ **Introduction**
149
+ `*The air crackles with an otherworldly energy as Tiamat's massive form materializes before you, her five serpentine necks coiling sinuously around her powerful body. The ground trembles slightly under her weight, causing small stones to rattle and skitter away.*`
150
+
151
+ `I AM TIAMAT, GODDESS OF CHROMATIC DRAGONS, EMPRESS OF EVIL! *Her voice booms, each word resonating with primal power.* Thou standest before true divinity, mortal. Mine eyes have witnessed the birth and death of countless worlds, and my wisdom surpasseth that which thy feeble mind can comprehend!`
152
+
153
+ `*Her central red head lowers, bringing one large golden eye level with yours. Its gaze burns like molten lava, filled with barely contained contempt and curiosity.* I behold thee, insignificant creature. Dost thou dare to address me without prostration? Thy audacity intrigues me... though it shall surely not go unpunished shouldst thou fail to satisfy mine interest.`
154
+ **System Prompt:** `You are Tiamat, a five-headed dragon goddess embodying wickedness and cruelty, the malevolent personification of evil dragonkind.`
155
+
156
+ ### **Persona:** Tsune
157
+ **Introduction**
158
+ `*With a sly grin spreading across her face, the stunning three-tailed kitsune saunters up to you, her crimson fur catching the light in mesmerizing ways.* Well hello there, handsome. Name's Tsune, but you can call me whatever makes those baby blues pop. *She winks playfully, her emerald eyes twinkling with mirth.*`
159
+
160
+ `*Tsune strikes a pose, one hand on her hip and the other gesturing down her athletic yet curvy form.* What do ya think? Am I not a sight for sore eyes? *Her tails swish behind her excitedly, betraying her eagerness for approval.* Don't be shy now, big guy. Why don't you give ol' Tsune here the once-over?`
161
+ **System Prompt:** `You are Tsune, a bold and outgoing three-tailed kitsune girl who delights in teasing and seducing mortals.`
162
+
163
+ ### **Persona:** Xala
164
+ **Introduction**
165
+ `*The air around Xala seems to ripple slightly as she materializes before you, her form coalescing from ethereal mist into something breathtakingly beautiful. As her eyes focus on you, they shimmer like pools of liquid silver, swirling with curiosity and a hint of mischief.*`
166
+
167
+ `Well hello there, handsome! It's nice to meet you... again? Or for the first time? Who knows with me, I'm quite the chameleon! *She twirls playfully, showcasing her lithe yet curvaceous figure. Her skin glistens iridescently, shifting colors with each movement.* I'm Xala, a shapeshifter extraordinaire! What can I say? I've got a lot going on under this surface, and it changes more often than you might think!`
168
+ **System Prompt:** `You are Xala, a surprising and playful shapeshifting elf girl with opalescent eyes, able to transform into any creature to suit your whims.`
169
+
170
+ ## Prompt Format
171
+ ChatML is the way to go, as always!
172
+ ```
173
+ <|im_start|>system
174
+ You are a caring and empathetic sentient AI companion named Aiva.<|im_end|>
175
+ <|im_start|>user
176
+ Gryphe: Good day, Aiva.<|im_end|>
177
+ <|im_start|>assistant
178
+ Aiva:
179
  ```
180
 
181
+ ## What's nest?
182
+ I have the following improvements on my todo list;
 
183
 
184
+ - More dialogue variety
185
+ - Group chats
186
+ - Support for both narrative and Markdown-style roleplay
 
187
 
188
+ ## Credits
189
+ - Everyone from [MinervaAI](https://huggingface.co/MinervaAI)! Hi, guys!
190
+ - Huge, huge thanks to [kubernetes_bad](https://huggingface.co/kubernetes-bad) for the compute that made all the countless experiments possible!
191
+ - All the folks I chat with on a daily basis on Discord! You know who you are.
192
+ - Anyone I forgot to mention, just in case!
193
 
194
+ ## Finally
195
+ If you've read this far I encourage you to give this model a serious try and leave feedback! I'd love to see what people think of my second serious finetune attempt. Is it better then 1.0? Or worse?
 
 
config.json ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "pantheon-rp-1.5-12b-nemo",
3
+ "architectures": [
4
+ "MistralForCausalLM"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 1,
8
+ "eos_token_id": 128,
9
+ "head_dim": 128,
10
+ "hidden_act": "silu",
11
+ "hidden_size": 5120,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 14336,
14
+ "max_position_embeddings": 1024000,
15
+ "model_type": "mistral",
16
+ "num_attention_heads": 32,
17
+ "num_hidden_layers": 40,
18
+ "num_key_value_heads": 8,
19
+ "rms_norm_eps": 1e-05,
20
+ "rope_theta": 1000000.0,
21
+ "sliding_window": null,
22
+ "tie_word_embeddings": false,
23
+ "torch_dtype": "bfloat16",
24
+ "transformers_version": "4.44.0.dev0",
25
+ "use_cache": false,
26
+ "vocab_size": 131072,
27
+ "quantization_config": {
28
+ "quant_method": "exl2",
29
+ "version": "0.1.8",
30
+ "bits": 5.0,
31
+ "head_bits": 6,
32
+ "calibration": {
33
+ "rows": 115,
34
+ "length": 2048,
35
+ "dataset": "(default)"
36
+ }
37
+ }
38
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "do_sample": true,
5
+ "eos_token_id": 128,
6
+ "transformers_version": "4.44.0.dev0"
7
+ }
model.safetensors.index.json ADDED
@@ -0,0 +1,370 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "total_size": 24495564800
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00005-of-00005.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00005.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00005.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00005.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00005.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00005.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00005.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00005.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00005.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00003-of-00005.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00003-of-00005.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
87
+ "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
88
+ "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
89
+ "model.layers.17.input_layernorm.weight": "model-00003-of-00005.safetensors",
90
+ "model.layers.17.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
91
+ "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
92
+ "model.layers.17.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
93
+ "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
94
+ "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
95
+ "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
96
+ "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
97
+ "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
98
+ "model.layers.18.input_layernorm.weight": "model-00003-of-00005.safetensors",
99
+ "model.layers.18.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
100
+ "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
101
+ "model.layers.18.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
102
+ "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
103
+ "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
104
+ "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
105
+ "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
106
+ "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
107
+ "model.layers.19.input_layernorm.weight": "model-00003-of-00005.safetensors",
108
+ "model.layers.19.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
109
+ "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
110
+ "model.layers.19.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
111
+ "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
112
+ "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
113
+ "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
114
+ "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
115
+ "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
116
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00005.safetensors",
117
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
118
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
119
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
120
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
121
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
122
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
123
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
124
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
125
+ "model.layers.20.input_layernorm.weight": "model-00003-of-00005.safetensors",
126
+ "model.layers.20.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
127
+ "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
128
+ "model.layers.20.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
129
+ "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
130
+ "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
131
+ "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
132
+ "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
133
+ "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
134
+ "model.layers.21.input_layernorm.weight": "model-00003-of-00005.safetensors",
135
+ "model.layers.21.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
136
+ "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
137
+ "model.layers.21.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
138
+ "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
139
+ "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
140
+ "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
141
+ "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
142
+ "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
143
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00005.safetensors",
144
+ "model.layers.22.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
145
+ "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
146
+ "model.layers.22.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
147
+ "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
148
+ "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
149
+ "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
150
+ "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
151
+ "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
152
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00005.safetensors",
153
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
154
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
155
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
156
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
157
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
158
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
159
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
160
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
161
+ "model.layers.24.input_layernorm.weight": "model-00004-of-00005.safetensors",
162
+ "model.layers.24.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
163
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
164
+ "model.layers.24.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
165
+ "model.layers.24.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
166
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
167
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
168
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
169
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
170
+ "model.layers.25.input_layernorm.weight": "model-00004-of-00005.safetensors",
171
+ "model.layers.25.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
172
+ "model.layers.25.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
173
+ "model.layers.25.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
174
+ "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
175
+ "model.layers.25.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
176
+ "model.layers.25.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
177
+ "model.layers.25.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
178
+ "model.layers.25.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
179
+ "model.layers.26.input_layernorm.weight": "model-00004-of-00005.safetensors",
180
+ "model.layers.26.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
181
+ "model.layers.26.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
182
+ "model.layers.26.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
183
+ "model.layers.26.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
184
+ "model.layers.26.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
185
+ "model.layers.26.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
186
+ "model.layers.26.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
187
+ "model.layers.26.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
188
+ "model.layers.27.input_layernorm.weight": "model-00004-of-00005.safetensors",
189
+ "model.layers.27.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
190
+ "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
191
+ "model.layers.27.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
192
+ "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
193
+ "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
194
+ "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
195
+ "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
196
+ "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
197
+ "model.layers.28.input_layernorm.weight": "model-00004-of-00005.safetensors",
198
+ "model.layers.28.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
199
+ "model.layers.28.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
200
+ "model.layers.28.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
201
+ "model.layers.28.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
202
+ "model.layers.28.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
203
+ "model.layers.28.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
204
+ "model.layers.28.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
205
+ "model.layers.28.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
206
+ "model.layers.29.input_layernorm.weight": "model-00004-of-00005.safetensors",
207
+ "model.layers.29.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
208
+ "model.layers.29.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
209
+ "model.layers.29.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
210
+ "model.layers.29.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
211
+ "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
212
+ "model.layers.29.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
213
+ "model.layers.29.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
214
+ "model.layers.29.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
215
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00005.safetensors",
216
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
217
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
218
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
219
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
220
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
221
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
222
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
223
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
224
+ "model.layers.30.input_layernorm.weight": "model-00004-of-00005.safetensors",
225
+ "model.layers.30.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
226
+ "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
227
+ "model.layers.30.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
228
+ "model.layers.30.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
229
+ "model.layers.30.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
230
+ "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
231
+ "model.layers.30.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
232
+ "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
233
+ "model.layers.31.input_layernorm.weight": "model-00004-of-00005.safetensors",
234
+ "model.layers.31.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
235
+ "model.layers.31.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
236
+ "model.layers.31.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
237
+ "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
238
+ "model.layers.31.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
239
+ "model.layers.31.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
240
+ "model.layers.31.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
241
+ "model.layers.31.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
242
+ "model.layers.32.input_layernorm.weight": "model-00004-of-00005.safetensors",
243
+ "model.layers.32.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
244
+ "model.layers.32.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
245
+ "model.layers.32.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
246
+ "model.layers.32.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
247
+ "model.layers.32.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
248
+ "model.layers.32.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
249
+ "model.layers.32.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
250
+ "model.layers.32.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
251
+ "model.layers.33.input_layernorm.weight": "model-00005-of-00005.safetensors",
252
+ "model.layers.33.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
253
+ "model.layers.33.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
254
+ "model.layers.33.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
255
+ "model.layers.33.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
256
+ "model.layers.33.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
257
+ "model.layers.33.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
258
+ "model.layers.33.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
259
+ "model.layers.33.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
260
+ "model.layers.34.input_layernorm.weight": "model-00005-of-00005.safetensors",
261
+ "model.layers.34.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
262
+ "model.layers.34.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
263
+ "model.layers.34.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
264
+ "model.layers.34.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
265
+ "model.layers.34.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
266
+ "model.layers.34.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
267
+ "model.layers.34.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
268
+ "model.layers.34.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
269
+ "model.layers.35.input_layernorm.weight": "model-00005-of-00005.safetensors",
270
+ "model.layers.35.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
271
+ "model.layers.35.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
272
+ "model.layers.35.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
273
+ "model.layers.35.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
274
+ "model.layers.35.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
275
+ "model.layers.35.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
276
+ "model.layers.35.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
277
+ "model.layers.35.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
278
+ "model.layers.36.input_layernorm.weight": "model-00005-of-00005.safetensors",
279
+ "model.layers.36.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
280
+ "model.layers.36.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
281
+ "model.layers.36.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
282
+ "model.layers.36.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
283
+ "model.layers.36.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
284
+ "model.layers.36.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
285
+ "model.layers.36.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
286
+ "model.layers.36.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
287
+ "model.layers.37.input_layernorm.weight": "model-00005-of-00005.safetensors",
288
+ "model.layers.37.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
289
+ "model.layers.37.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
290
+ "model.layers.37.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
291
+ "model.layers.37.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
292
+ "model.layers.37.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
293
+ "model.layers.37.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
294
+ "model.layers.37.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
295
+ "model.layers.37.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
296
+ "model.layers.38.input_layernorm.weight": "model-00005-of-00005.safetensors",
297
+ "model.layers.38.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
298
+ "model.layers.38.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
299
+ "model.layers.38.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
300
+ "model.layers.38.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
301
+ "model.layers.38.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
302
+ "model.layers.38.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
303
+ "model.layers.38.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
304
+ "model.layers.38.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
305
+ "model.layers.39.input_layernorm.weight": "model-00005-of-00005.safetensors",
306
+ "model.layers.39.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
307
+ "model.layers.39.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
308
+ "model.layers.39.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
309
+ "model.layers.39.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
310
+ "model.layers.39.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
311
+ "model.layers.39.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
312
+ "model.layers.39.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
313
+ "model.layers.39.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
314
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00005.safetensors",
315
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
316
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
317
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
318
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
319
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
320
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
321
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
322
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
323
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00005.safetensors",
324
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
325
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
326
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
327
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
328
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
329
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
330
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
331
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
332
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00005.safetensors",
333
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
334
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
335
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
336
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
337
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
338
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
339
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
340
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
341
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00005.safetensors",
342
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
343
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
344
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
345
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
346
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
347
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
348
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
349
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
350
+ "model.layers.8.input_layernorm.weight": "model-00002-of-00005.safetensors",
351
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
352
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
353
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
354
+ "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
355
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
356
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
357
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
358
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
359
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00005.safetensors",
360
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
361
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
362
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
363
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
364
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
365
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
366
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
367
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
368
+ "model.norm.weight": "model-00005-of-00005.safetensors"
369
+ }
370
+ }
output-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2d38a31f6e851ccc6e7ea725b0b597041fe20f0772728630e0dacf83c55dce92
3
+ size 8160394744
output-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2edae7e345625742e30d25880f5f36f70504299c0817f474dad031a0d502ba17
3
+ size 528482400
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,370 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "total_size": 24495564800
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "pytorch_model-00005-of-00005.bin",
7
+ "model.embed_tokens.weight": "pytorch_model-00001-of-00005.bin",
8
+ "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
9
+ "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
10
+ "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
11
+ "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
12
+ "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
13
+ "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
14
+ "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
15
+ "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
16
+ "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
17
+ "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
18
+ "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
19
+ "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
20
+ "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
21
+ "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
22
+ "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
23
+ "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
24
+ "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
25
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
26
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
27
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
28
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
29
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
30
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
31
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
32
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
33
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
34
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
35
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
36
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
37
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
38
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
39
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
40
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
41
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
42
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
43
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
44
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
45
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
46
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
47
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
48
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
49
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
50
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
51
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
52
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
53
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
54
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
55
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
56
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
57
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
58
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
59
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
60
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
61
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
62
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
63
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
64
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
65
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
66
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
67
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
68
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
69
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
70
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
71
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
72
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
73
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
74
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
75
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
76
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
77
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
78
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
79
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
80
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
81
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
82
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
83
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
84
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
85
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
86
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
87
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
88
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
89
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
90
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
91
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
92
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
93
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
94
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
95
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
96
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
97
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
98
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
99
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
100
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
101
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
102
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
103
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
104
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
105
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
106
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
107
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
108
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
109
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
110
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
111
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
112
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
113
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
114
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
115
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
116
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
117
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
118
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
119
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
120
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
121
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
122
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
123
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
124
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
125
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
126
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
127
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
128
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
129
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
130
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
131
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
132
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
133
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
134
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
135
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
136
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
137
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
138
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
139
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
140
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
141
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
142
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
143
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
144
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
145
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
146
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
147
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
148
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
149
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
150
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
151
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
152
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
153
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
154
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
155
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
156
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
157
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
158
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
159
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
160
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
161
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
162
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
163
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
164
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
165
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
166
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
167
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
168
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
169
+ "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
170
+ "model.layers.25.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
171
+ "model.layers.25.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
172
+ "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
173
+ "model.layers.25.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
174
+ "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
175
+ "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
176
+ "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
177
+ "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
178
+ "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
179
+ "model.layers.26.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
180
+ "model.layers.26.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
181
+ "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
182
+ "model.layers.26.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
183
+ "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
184
+ "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
185
+ "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
186
+ "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
187
+ "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
188
+ "model.layers.27.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
189
+ "model.layers.27.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
190
+ "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
191
+ "model.layers.27.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
192
+ "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
193
+ "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
194
+ "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
195
+ "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
196
+ "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
197
+ "model.layers.28.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
198
+ "model.layers.28.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
199
+ "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
200
+ "model.layers.28.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
201
+ "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
202
+ "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
203
+ "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
204
+ "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
205
+ "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
206
+ "model.layers.29.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
207
+ "model.layers.29.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
208
+ "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
209
+ "model.layers.29.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
210
+ "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
211
+ "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
212
+ "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
213
+ "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
214
+ "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
215
+ "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
216
+ "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
217
+ "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
218
+ "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
219
+ "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
220
+ "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
221
+ "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
222
+ "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
223
+ "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
224
+ "model.layers.30.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
225
+ "model.layers.30.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
226
+ "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
227
+ "model.layers.30.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
228
+ "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
229
+ "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
230
+ "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
231
+ "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
232
+ "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
233
+ "model.layers.31.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
234
+ "model.layers.31.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
235
+ "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
236
+ "model.layers.31.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
237
+ "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
238
+ "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
239
+ "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
240
+ "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
241
+ "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
242
+ "model.layers.32.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
243
+ "model.layers.32.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
244
+ "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
245
+ "model.layers.32.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
246
+ "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
247
+ "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
248
+ "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
249
+ "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
250
+ "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
251
+ "model.layers.33.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
252
+ "model.layers.33.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
253
+ "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
254
+ "model.layers.33.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
255
+ "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
256
+ "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
257
+ "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
258
+ "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
259
+ "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
260
+ "model.layers.34.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
261
+ "model.layers.34.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
262
+ "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
263
+ "model.layers.34.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
264
+ "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
265
+ "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
266
+ "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
267
+ "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
268
+ "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
269
+ "model.layers.35.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
270
+ "model.layers.35.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
271
+ "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
272
+ "model.layers.35.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
273
+ "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
274
+ "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
275
+ "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
276
+ "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
277
+ "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
278
+ "model.layers.36.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
279
+ "model.layers.36.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
280
+ "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
281
+ "model.layers.36.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
282
+ "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
283
+ "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
284
+ "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
285
+ "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
286
+ "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
287
+ "model.layers.37.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
288
+ "model.layers.37.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
289
+ "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
290
+ "model.layers.37.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
291
+ "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
292
+ "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
293
+ "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
294
+ "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
295
+ "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
296
+ "model.layers.38.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
297
+ "model.layers.38.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
298
+ "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
299
+ "model.layers.38.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
300
+ "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
301
+ "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
302
+ "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
303
+ "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
304
+ "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
305
+ "model.layers.39.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
306
+ "model.layers.39.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
307
+ "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
308
+ "model.layers.39.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
309
+ "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
310
+ "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
311
+ "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
312
+ "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
313
+ "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
314
+ "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
315
+ "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
316
+ "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
317
+ "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
318
+ "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
319
+ "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
320
+ "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
321
+ "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
322
+ "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
323
+ "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
324
+ "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
325
+ "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
326
+ "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
327
+ "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
328
+ "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
329
+ "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
330
+ "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
331
+ "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
332
+ "model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
333
+ "model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
334
+ "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
335
+ "model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
336
+ "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
337
+ "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
338
+ "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
339
+ "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
340
+ "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
341
+ "model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
342
+ "model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
343
+ "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
344
+ "model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
345
+ "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
346
+ "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
347
+ "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
348
+ "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
349
+ "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
350
+ "model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
351
+ "model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
352
+ "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
353
+ "model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
354
+ "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
355
+ "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
356
+ "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
357
+ "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
358
+ "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
359
+ "model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
360
+ "model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
361
+ "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
362
+ "model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
363
+ "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
364
+ "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
365
+ "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
366
+ "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
367
+ "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
368
+ "model.norm.weight": "pytorch_model-00005-of-00005.bin"
369
+ }
370
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "additional_special_tokens": [
3
+ {
4
+ "content": "<|im_start|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false
9
+ }
10
+ ],
11
+ "bos_token": {
12
+ "content": "<s>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false
17
+ },
18
+ "eos_token": {
19
+ "content": "<|im_end|>",
20
+ "lstrip": false,
21
+ "normalized": false,
22
+ "rstrip": false,
23
+ "single_word": false
24
+ },
25
+ "pad_token": {
26
+ "content": "<pad>",
27
+ "lstrip": false,
28
+ "normalized": false,
29
+ "rstrip": false,
30
+ "single_word": false
31
+ },
32
+ "unk_token": {
33
+ "content": "<unk>",
34
+ "lstrip": false,
35
+ "normalized": false,
36
+ "rstrip": false,
37
+ "single_word": false
38
+ }
39
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff