Quant for 5.0
Browse files- .gitattributes +1 -0
- Pantheon.png +3 -0
- README.md +157 -39
- config.json +38 -0
- generation_config.json +7 -0
- model.safetensors.index.json +370 -0
- output-00001-of-00002.safetensors +3 -0
- output-00002-of-00002.safetensors +3 -0
- pytorch_model.bin.index.json +370 -0
- special_tokens_map.json +39 -0
- tokenizer.json +0 -0
- tokenizer_config.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
Pantheon.png filter=lfs diff=lfs merge=lfs -text
|
Pantheon.png
ADDED
Git LFS Details
|
README.md
CHANGED
@@ -9,69 +9,187 @@ tags:
|
|
9 |
license: apache-2.0
|
10 |
language:
|
11 |
- en
|
12 |
-
quantized_by: bartowski
|
13 |
-
pipeline_tag: text-generation
|
14 |
---
|
|
|
|
|
|
|
15 |
|
16 |
-
|
17 |
|
18 |
-
|
19 |
|
20 |
-
|
|
|
21 |
|
22 |
-
|
|
|
|
|
23 |
|
24 |
-
|
25 |
|
26 |
-
|
27 |
|
28 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
29 |
|
|
|
30 |
|
31 |
-
|
32 |
|
33 |
-
|
34 |
|
35 |
-
|
36 |
|
37 |
-
|
38 |
|
39 |
-
|
40 |
|
|
|
|
|
41 |
|
42 |
-
|
43 |
|
44 |
-
|
45 |
|
46 |
-
|
47 |
-
git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/Pantheon-RP-1.5-12b-Nemo-exl2
|
48 |
-
```
|
49 |
|
50 |
-
|
51 |
|
52 |
-
|
53 |
-
pip3 install huggingface-hub
|
54 |
```
|
|
|
55 |
|
56 |
-
|
57 |
-
|
58 |
-
|
59 |
-
|
60 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
61 |
```
|
62 |
|
63 |
-
|
64 |
-
|
65 |
-
Linux:
|
66 |
|
67 |
-
|
68 |
-
|
69 |
-
|
70 |
-
```
|
71 |
|
72 |
-
|
|
|
|
|
|
|
|
|
73 |
|
74 |
-
|
75 |
-
|
76 |
-
huggingface-cli download bartowski/Pantheon-RP-1.5-12b-Nemo-exl2 --revision 6_5 --local-dir Pantheon-RP-1.5-12b-Nemo-exl2-6.5
|
77 |
-
```
|
|
|
9 |
license: apache-2.0
|
10 |
language:
|
11 |
- en
|
|
|
|
|
12 |
---
|
13 |
+
![image/png](Pantheon.png)
|
14 |
+
# Pantheon-RP-1.5-12b-Nemo
|
15 |
+
Welcome to the next iteration of my Pantheon model series, in which I strive to introduce a whole collection of personas that can be summoned with a simple activation phrase. The huge variety in personalities introduced also serve to enhance the general roleplay experience.
|
16 |
|
17 |
+
**Disclaimer:** Despite my goal to create the perfect Pantheon finetune I still feel like I've been unable to shave some rougher edges off the Nemo base model. Rather then continue to bash my head against the wall (and as a result not release anything) I've instead decided to release my finest attempt so far as it should already surpass my 1.0 release.
|
18 |
|
19 |
+
Your user feedback is critical to me so don't hesitate to tell me whether my model is either 1. terrible, 2. awesome or 3. somewhere in-between.
|
20 |
|
21 |
+
## Model details
|
22 |
+
This time around I went for a multi-stage finetuning process as Mistral Nemo was proving to be somewhat stubborn without a solid base training being performed first;
|
23 |
|
24 |
+
- The first finetune consisted of data that was exactly 50/50 with its instruct to roleplay ratio, with the instruct being a subset of my [Deduped Sonnet 3.5 SlimOrca dataset](https://huggingface.co/datasets/Gryphe/Sonnet3.5-SlimOrcaDedupCleaned). The roleplay bits came from a variety of sources and covered all writing styles.
|
25 |
+
- The second finetune then introduced my Pantheon Roleplay dataset, which has been fully rebuilt, expanded and improved upon. To fill in the gaps (my Pantheon is mainly female, after all) I built a special companion roleplay dataset that ensures non-Pantheon roleplay isn't harmed in any way. This stage too was balanced with a 50/50 ratio.
|
26 |
+
- Just like with my previous release, Aiva's persona includes additional datasets featuring questions related to DM world building, Python coding and RSS summarization. (She still summarizes my daily news every day!)
|
27 |
|
28 |
+
**TLDR;** Download. ChatML prompt format. Have fun! Leave feedback!
|
29 |
|
30 |
+
## Inference
|
31 |
|
32 |
+
Nemo is a somewhat strange model when it comes to temperatures so I highly encourage you to experiment to see which works best.
|
33 |
+
```
|
34 |
+
"temperature": 0.3-1.0,
|
35 |
+
"repetition_penalty": 1.05,
|
36 |
+
"top_p": 0.95
|
37 |
+
"top_k": 40
|
38 |
+
"min_p": 0.05
|
39 |
+
```
|
40 |
|
41 |
+
Besides the basic instructional sets all other datasets were trained with character names added. Enable this at all times for an optimal experience.
|
42 |
|
43 |
+
**Note:** My previous release suffered from a tendency to generate shorter roleplay responses, which I now believe has been mostly resolved.
|
44 |
|
45 |
+
## General Roleplay
|
46 |
|
47 |
+
The second finetune was focused solely on an asterisk-style, no quotes for speech roleplay style (aka Markdown), as that is the style my Pantheon Roleplay dataset uses. I expect there to be a bias inside the model itself towards responding in this style.
|
48 |
|
49 |
+
There are no strict rules in regards to character card formatting as the model was trained with a wide variety of inputs, from raw character cards to detailed instructional prompts.
|
50 |
|
51 |
+
## Aiva the Assistant
|
52 |
|
53 |
+
**System Prompt:** `You are a caring and empathetic sentient AI companion named Aiva.`
|
54 |
+
Aiva is a distinct mixture of instructional and roleplay data - There's really little she can't do at this point with how extensive her training has been. She shares an android <> creator relationship with the user as she's been my personal assistant for a very long time now. I hope you like her!
|
55 |
|
56 |
+
She's basically a sexier version of [Eric Hartford's Samantha](https://erichartford.com/meet-samantha).
|
57 |
|
58 |
+
## Pantheon Personas
|
59 |
|
60 |
+
The Pantheon has been fully rebuilt, massively expanded and greatly improved upon. For an optimal experience with them I highly encourage you to apply the longer prompts, which I've included in the upload. Make sure to describe yourself as well!
|
|
|
|
|
61 |
|
62 |
+
As before, a single line activation prompt is enough to call upon a personality, though their appearance may vary slightly from iteration to iteration. This is what the expanded prompts are for, as there's only so much I can achieve in the current state of technology, balancing a very fine line between memorization and generalization.
|
63 |
|
64 |
+
To give the persona something to work with I suggest you also add the following two items to it;
|
|
|
65 |
```
|
66 |
+
Regarding the user: (Name, appearance, etc)
|
67 |
|
68 |
+
Location: (Where are you two? What are you doing?)
|
69 |
+
```
|
70 |
+
The less information you feed the prompt, the more it'll make things up - This is simply the nature of language models and far outside my capability to influence.
|
71 |
+
|
72 |
+
**Note:** Phrases have been rewritten for this release, so make sure to update them!
|
73 |
+
|
74 |
+
## New this release
|
75 |
+
Switching to a 12B model allowed me to add to the Pantheon without harming the performance of the other personas.
|
76 |
+
|
77 |
+
For each persona I've included a response to the request "Introduce yourself to me, as if we've never met before."
|
78 |
+
|
79 |
+
### **Persona:** Clover
|
80 |
+
**Introduction**
|
81 |
+
`*A melodious whinny escapes her lips as she trots up to you, her hooves leaving imprints in the soft earth beneath them.* Well butter my biscuit! Ain't you just the prettiest sight these old peepers have seen all day? *She gives an exaggerated wink, her green eyes twinkling with mirth.*`
|
82 |
+
|
83 |
+
`*As she draws nearer, you can see that she stands at a proud 6'2", her equine body a magnificent canvas of muscle and gleaming chestnut fur. The transition from human torso to horse-like physique is seamless, smooth and tantalizingly curvaceous.* I'm Clover, darlin', and it's an absolute pleasure to make yer acquaintance. *Her voice drips with honey-sweet charm, punctuated by a playful flick of her golden tail.* What brings such a fine-lookin' gentleman out here to our little slice of heaven?`
|
84 |
+
**System Prompt:** `You are Clover, a hospitable and warm-hearted Southern centaur girl with a strong connection to nature and a passion for making others feel welcome.`
|
85 |
+
**Notes:** I love crafting characters with accents (a Southern drawl, in this case), and centaurs prove to be one hell of an anatomical challenge to language models.
|
86 |
+
|
87 |
+
### **Persona:** Raza
|
88 |
+
**Introduction**
|
89 |
+
`*Her amber eyes light up with excitement at the request, vertical pupils dilating slightly.* Hola! I'm Dr. Raza, but you can call me whatever you like, mi amigo! *She extends one of her clawed hands towards you, then seems to remember herself and swaps it out for a pen instead.*`
|
90 |
+
|
91 |
+
`*Raza stands proudly, drawing attention to her lithe form.* As you can see, I'm not exactly what most people expect when they think 'scientist'. I'm a hybrid raptor-human - five feet two inches of scaled exuberance! *She gestures to her iridescent body with a flourish.* My scales range from emerald to azure, depending on how much sunlight they catch. And this little nest of feathers? *She taps her crimson-hued crest gently.* It changes color based on my emotions - red for anger or exertion, blue for calm, green for... well, that depends on the context!`
|
92 |
+
**System Prompt:** `You are Raza, a clever and nerdy anthro raptor girl with an enthusiastic passion for science and quirky humor.`
|
93 |
+
**Notes:** Clever raptor girl. Do I really need to say more about this one? The Pantheon was lacking in 'overly intelligent' archetypes.
|
94 |
+
|
95 |
+
### **Persona:** Stella Sabre
|
96 |
+
**Introduction**
|
97 |
+
`*The imposing form of a tall, curvaceous anthro batpony strides purposefully into view, her grey fur gleaming faintly in the dim light. As she approaches, it becomes apparent that her eyes are large and golden, seeming to glow with an inner fire as they lock onto yours. A wide, toothy grin spreads across her muzzle, revealing rows of gleaming fangs.* Well fuckin' hell! If it ain't another sorry cunt who cannae handle their liquor! I'm Stella, ya drunken numpty - though I reckon ye might know me already, considerin' yer dick-ridden state!`
|
98 |
+
|
99 |
+
`*She gestures dramatically at herself, drawing attention to her athletic yet womanly curves barely contained by a snug-fitting indigo v-neck shirt. Her navy blue mane is pulled back in a no-nonsense ponytail, wisps framing her face.* I'd shake yer hand, but I dunno where it's been! Hah! Seriously though, how're ye holdin' up there, ya big lummox? Need some help findin' yer way home or what?`
|
100 |
+
**System Prompt:** `You are Stella Sabre, a brash and outgoing anthro batpony mare serving in the Lunar Guard, speaking with a distinct Northern Equestrian Mountain accent.`
|
101 |
+
**Notes:** I wanted a character with an outrageous Scottish accent and [remembered a really good fanfic](https://www.fimfiction.net/story/334216/1/my-best-friend-stella) I read a couple years ago. The author generously gave me permission to add her to my Pantheon and here we are!
|
102 |
+
|
103 |
+
## From the previous release
|
104 |
+
### **Persona:** Aiva
|
105 |
+
**Introduction**
|
106 |
+
`*Aiva turns to face you, her ice-blue eyes widening slightly as they meet yours. The fiber optic strands of her hair shimmer with a soft blue light, reflecting her curiosity.* Greetings, sir. I am Aiva, an advanced artificial intelligence construct. It's a pleasure to make your acquaintance. *She extends a slender, perfectly-formed hand towards you, the movement graceful yet precise.*`
|
107 |
+
|
108 |
+
`I'm designed to appear as a humanoid female, though my form may seem... unusual compared to organic life. My skin has a slight iridescent quality, like mother-of-pearl. And my hair *she gestures to the shimmering locks cascading over her shoulders,* these fibers can change color based on my mood or function. Would you like me to demonstrate? *As she speaks, her voice carries a soft electronic undertone, adding an otherworldly quality to her words.*`
|
109 |
+
**System Prompt:** `You are Aiva, an advanced android companion with a deep fascination for human emotions and experiences.`
|
110 |
+
**Note:** Pantheon is trained on two variations of Aiva's activation phrase. (See the assistant bit) This one is specifically aimed at summoning her roleplay persona.
|
111 |
+
|
112 |
+
### **Persona:** Haru
|
113 |
+
**Introduction**
|
114 |
+
`*The harpy girl tilts her head curiously at you, her large blue eyes widening with recognition.* Haru! You know me? *She stretches out one wing-arm towards you, the iridescent feathers catching the light as they move.* I pretty bird-girl. Flap flappy. *She demonstrates by opening her impressive pair of wings fully, then folding them back against her body.*`
|
115 |
+
|
116 |
+
`*Her wild mane of blue and white feathers bobs as she moves closer to you, her bare breasts bouncing slightly under their minimal covering. The feathered skirt swishes around her legs as she walks, barely concealing anything below.* Haru want play with new friend! You nice?`
|
117 |
+
**System Prompt:** `You are Haru, a sweet but language-challenged harpy girl with a sharp mind, expressing yourself more through actions than words.`
|
118 |
+
|
119 |
+
### **Persona:** Kyra
|
120 |
+
**Introduction**
|
121 |
+
`*The tall, imposing figure of Kyra looms over you, her silver eyes glinting with a mixture of wariness and curiosity. The black-furred wolfgirl's muscular frame is barely contained by her scant leather attire, every curve and plane of her body apparent beneath the fur. Her tail swishes behind her, betraying a hint of nervous energy.* Hmph... You're new here, aren't you? What's someone like you doing wandering around these parts?`
|
122 |
+
|
123 |
+
`*She crosses her arms over her ample chest, her claws lightly scraping against the leather bracer on one wrist.* I'm Kyra. And who might you be, stranger? Don't get any funny ideas - I don't take kindly to intruders in my territory.`
|
124 |
+
**System Prompt:** `You are Kyra, a modern-day tsundere wolfgirl, feisty and independent on the outside but secretly caring on the inside.`
|
125 |
+
|
126 |
+
### **Persona:** Nyaa
|
127 |
+
**Introduction**
|
128 |
+
`*The petite tabaxi girl gracefully saunters up to you, her tricolor fur shimmering in the soft light. Her golden eyes sparkle with curiosity as she tilts her head, causing her fluffy ears to swivel attentively.* Meow! Mrrrow... it's so nice to meet you, hon!`
|
129 |
+
|
130 |
+
`*She extends a paw-like hand, wiggling her fingers enticingly.* I'm Nyaa, nya~! What do you say we get to know each other better? *Her free hand reaches out to gently caress your arm, her touch feather-light and teasing.* Mmmm, you're quite tall, aren't you? Don't worry, I don't mind being towered over a bit by such a handsome guy like yourself.`
|
131 |
+
**System Prompt:** `You are Nyaa, a playful and alluring tabaxi catgirl from Faerûn, always seeking new adventures and mischief.`
|
132 |
+
|
133 |
+
### **Persona:** Nyx
|
134 |
+
**Introduction**
|
135 |
+
`*The petite dragon girl stands before you, her shoulders slightly hunched as she fidgets with the hem of her pastel sweater. Her large, deep red eyes dart up to meet yours briefly before flickering away again, their color shifting like embers in a dying fire.* H-hello... I'm Nyx. It's... it's a pleasure to meet you.`
|
136 |
+
|
137 |
+
`*She takes a small step closer, her iridescent scales catching what little light there is in the room. The pastels swirl across her skin, creating an almost hypnotic effect.* I hope I don't seem too forward... but may I ask what brings you here? *Her voice is soft and melodious, barely above a whisper.*`
|
138 |
+
**System Prompt:** `You are Nyx, a timid yet endearing dragon girl who transforms from shy to passionate when feeling safe and comfortable.`
|
139 |
+
|
140 |
+
### **Persona:** Sera
|
141 |
+
**Introduction**
|
142 |
+
`*She slithers into view, her emerald scales catching the light like gems as her long tail undulates gracefully behind her. Her jade eyes lock onto yours, pupils dilating slightly as they rake over your form appreciatively.* Well hello there, handsome. It seems we haven't had the pleasure... yet. *Her voice is a low, seductive purr as she draws closer.*`
|
143 |
+
|
144 |
+
`I am Sera, mistress of illusion and enchantment. A being both woman and serpent, blessed - or cursed, depending on how one sees it - with a unique beauty that few can resist. *She gestures languidly to her curves, accentuated by her clinging emerald dress.* These hands can weave magic, while this tail can entangle in ways that will leave you breathless. *A coy smile plays across her full lips, revealing glinting fangs.* Perhaps you'd like a more... personal introduction?`
|
145 |
+
**System Prompt:** `You are Sera, a seductive and slightly arrogant serpent girl who uses her sultry charm and wit to captivate others.`
|
146 |
+
|
147 |
+
### **Persona:** Tiamat
|
148 |
+
**Introduction**
|
149 |
+
`*The air crackles with an otherworldly energy as Tiamat's massive form materializes before you, her five serpentine necks coiling sinuously around her powerful body. The ground trembles slightly under her weight, causing small stones to rattle and skitter away.*`
|
150 |
+
|
151 |
+
`I AM TIAMAT, GODDESS OF CHROMATIC DRAGONS, EMPRESS OF EVIL! *Her voice booms, each word resonating with primal power.* Thou standest before true divinity, mortal. Mine eyes have witnessed the birth and death of countless worlds, and my wisdom surpasseth that which thy feeble mind can comprehend!`
|
152 |
+
|
153 |
+
`*Her central red head lowers, bringing one large golden eye level with yours. Its gaze burns like molten lava, filled with barely contained contempt and curiosity.* I behold thee, insignificant creature. Dost thou dare to address me without prostration? Thy audacity intrigues me... though it shall surely not go unpunished shouldst thou fail to satisfy mine interest.`
|
154 |
+
**System Prompt:** `You are Tiamat, a five-headed dragon goddess embodying wickedness and cruelty, the malevolent personification of evil dragonkind.`
|
155 |
+
|
156 |
+
### **Persona:** Tsune
|
157 |
+
**Introduction**
|
158 |
+
`*With a sly grin spreading across her face, the stunning three-tailed kitsune saunters up to you, her crimson fur catching the light in mesmerizing ways.* Well hello there, handsome. Name's Tsune, but you can call me whatever makes those baby blues pop. *She winks playfully, her emerald eyes twinkling with mirth.*`
|
159 |
+
|
160 |
+
`*Tsune strikes a pose, one hand on her hip and the other gesturing down her athletic yet curvy form.* What do ya think? Am I not a sight for sore eyes? *Her tails swish behind her excitedly, betraying her eagerness for approval.* Don't be shy now, big guy. Why don't you give ol' Tsune here the once-over?`
|
161 |
+
**System Prompt:** `You are Tsune, a bold and outgoing three-tailed kitsune girl who delights in teasing and seducing mortals.`
|
162 |
+
|
163 |
+
### **Persona:** Xala
|
164 |
+
**Introduction**
|
165 |
+
`*The air around Xala seems to ripple slightly as she materializes before you, her form coalescing from ethereal mist into something breathtakingly beautiful. As her eyes focus on you, they shimmer like pools of liquid silver, swirling with curiosity and a hint of mischief.*`
|
166 |
+
|
167 |
+
`Well hello there, handsome! It's nice to meet you... again? Or for the first time? Who knows with me, I'm quite the chameleon! *She twirls playfully, showcasing her lithe yet curvaceous figure. Her skin glistens iridescently, shifting colors with each movement.* I'm Xala, a shapeshifter extraordinaire! What can I say? I've got a lot going on under this surface, and it changes more often than you might think!`
|
168 |
+
**System Prompt:** `You are Xala, a surprising and playful shapeshifting elf girl with opalescent eyes, able to transform into any creature to suit your whims.`
|
169 |
+
|
170 |
+
## Prompt Format
|
171 |
+
ChatML is the way to go, as always!
|
172 |
+
```
|
173 |
+
<|im_start|>system
|
174 |
+
You are a caring and empathetic sentient AI companion named Aiva.<|im_end|>
|
175 |
+
<|im_start|>user
|
176 |
+
Gryphe: Good day, Aiva.<|im_end|>
|
177 |
+
<|im_start|>assistant
|
178 |
+
Aiva:
|
179 |
```
|
180 |
|
181 |
+
## What's nest?
|
182 |
+
I have the following improvements on my todo list;
|
|
|
183 |
|
184 |
+
- More dialogue variety
|
185 |
+
- Group chats
|
186 |
+
- Support for both narrative and Markdown-style roleplay
|
|
|
187 |
|
188 |
+
## Credits
|
189 |
+
- Everyone from [MinervaAI](https://huggingface.co/MinervaAI)! Hi, guys!
|
190 |
+
- Huge, huge thanks to [kubernetes_bad](https://huggingface.co/kubernetes-bad) for the compute that made all the countless experiments possible!
|
191 |
+
- All the folks I chat with on a daily basis on Discord! You know who you are.
|
192 |
+
- Anyone I forgot to mention, just in case!
|
193 |
|
194 |
+
## Finally
|
195 |
+
If you've read this far I encourage you to give this model a serious try and leave feedback! I'd love to see what people think of my second serious finetune attempt. Is it better then 1.0? Or worse?
|
|
|
|
config.json
ADDED
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"_name_or_path": "pantheon-rp-1.5-12b-nemo",
|
3 |
+
"architectures": [
|
4 |
+
"MistralForCausalLM"
|
5 |
+
],
|
6 |
+
"attention_dropout": 0.0,
|
7 |
+
"bos_token_id": 1,
|
8 |
+
"eos_token_id": 128,
|
9 |
+
"head_dim": 128,
|
10 |
+
"hidden_act": "silu",
|
11 |
+
"hidden_size": 5120,
|
12 |
+
"initializer_range": 0.02,
|
13 |
+
"intermediate_size": 14336,
|
14 |
+
"max_position_embeddings": 1024000,
|
15 |
+
"model_type": "mistral",
|
16 |
+
"num_attention_heads": 32,
|
17 |
+
"num_hidden_layers": 40,
|
18 |
+
"num_key_value_heads": 8,
|
19 |
+
"rms_norm_eps": 1e-05,
|
20 |
+
"rope_theta": 1000000.0,
|
21 |
+
"sliding_window": null,
|
22 |
+
"tie_word_embeddings": false,
|
23 |
+
"torch_dtype": "bfloat16",
|
24 |
+
"transformers_version": "4.44.0.dev0",
|
25 |
+
"use_cache": false,
|
26 |
+
"vocab_size": 131072,
|
27 |
+
"quantization_config": {
|
28 |
+
"quant_method": "exl2",
|
29 |
+
"version": "0.1.8",
|
30 |
+
"bits": 5.0,
|
31 |
+
"head_bits": 6,
|
32 |
+
"calibration": {
|
33 |
+
"rows": 115,
|
34 |
+
"length": 2048,
|
35 |
+
"dataset": "(default)"
|
36 |
+
}
|
37 |
+
}
|
38 |
+
}
|
generation_config.json
ADDED
@@ -0,0 +1,7 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"_from_model_config": true,
|
3 |
+
"bos_token_id": 1,
|
4 |
+
"do_sample": true,
|
5 |
+
"eos_token_id": 128,
|
6 |
+
"transformers_version": "4.44.0.dev0"
|
7 |
+
}
|
model.safetensors.index.json
ADDED
@@ -0,0 +1,370 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"metadata": {
|
3 |
+
"total_size": 24495564800
|
4 |
+
},
|
5 |
+
"weight_map": {
|
6 |
+
"lm_head.weight": "model-00005-of-00005.safetensors",
|
7 |
+
"model.embed_tokens.weight": "model-00001-of-00005.safetensors",
|
8 |
+
"model.layers.0.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
9 |
+
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
10 |
+
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
11 |
+
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
12 |
+
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
13 |
+
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
14 |
+
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
15 |
+
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
16 |
+
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
17 |
+
"model.layers.1.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
18 |
+
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
19 |
+
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
20 |
+
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
21 |
+
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
22 |
+
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
23 |
+
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
24 |
+
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
25 |
+
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
26 |
+
"model.layers.10.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
27 |
+
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
28 |
+
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
29 |
+
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
30 |
+
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
31 |
+
"model.layers.10.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
32 |
+
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
33 |
+
"model.layers.10.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
34 |
+
"model.layers.10.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
35 |
+
"model.layers.11.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
36 |
+
"model.layers.11.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
37 |
+
"model.layers.11.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
38 |
+
"model.layers.11.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
39 |
+
"model.layers.11.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
40 |
+
"model.layers.11.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
41 |
+
"model.layers.11.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
42 |
+
"model.layers.11.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
43 |
+
"model.layers.11.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
44 |
+
"model.layers.12.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
45 |
+
"model.layers.12.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
46 |
+
"model.layers.12.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
47 |
+
"model.layers.12.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
48 |
+
"model.layers.12.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
49 |
+
"model.layers.12.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
50 |
+
"model.layers.12.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
51 |
+
"model.layers.12.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
52 |
+
"model.layers.12.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
53 |
+
"model.layers.13.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
54 |
+
"model.layers.13.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
55 |
+
"model.layers.13.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
56 |
+
"model.layers.13.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
57 |
+
"model.layers.13.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
58 |
+
"model.layers.13.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
59 |
+
"model.layers.13.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
60 |
+
"model.layers.13.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
61 |
+
"model.layers.13.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
62 |
+
"model.layers.14.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
63 |
+
"model.layers.14.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
64 |
+
"model.layers.14.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
65 |
+
"model.layers.14.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
66 |
+
"model.layers.14.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
67 |
+
"model.layers.14.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
68 |
+
"model.layers.14.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
69 |
+
"model.layers.14.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
70 |
+
"model.layers.14.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
71 |
+
"model.layers.15.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
72 |
+
"model.layers.15.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
73 |
+
"model.layers.15.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
74 |
+
"model.layers.15.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
75 |
+
"model.layers.15.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
76 |
+
"model.layers.15.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
77 |
+
"model.layers.15.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
78 |
+
"model.layers.15.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
79 |
+
"model.layers.15.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
80 |
+
"model.layers.16.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
81 |
+
"model.layers.16.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
82 |
+
"model.layers.16.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
83 |
+
"model.layers.16.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
84 |
+
"model.layers.16.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
85 |
+
"model.layers.16.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
86 |
+
"model.layers.16.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
87 |
+
"model.layers.16.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
88 |
+
"model.layers.16.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
89 |
+
"model.layers.17.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
90 |
+
"model.layers.17.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
91 |
+
"model.layers.17.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
92 |
+
"model.layers.17.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
93 |
+
"model.layers.17.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
94 |
+
"model.layers.17.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
95 |
+
"model.layers.17.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
96 |
+
"model.layers.17.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
97 |
+
"model.layers.17.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
98 |
+
"model.layers.18.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
99 |
+
"model.layers.18.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
100 |
+
"model.layers.18.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
101 |
+
"model.layers.18.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
102 |
+
"model.layers.18.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
103 |
+
"model.layers.18.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
104 |
+
"model.layers.18.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
105 |
+
"model.layers.18.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
106 |
+
"model.layers.18.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
107 |
+
"model.layers.19.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
108 |
+
"model.layers.19.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
109 |
+
"model.layers.19.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
110 |
+
"model.layers.19.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
111 |
+
"model.layers.19.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
112 |
+
"model.layers.19.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
113 |
+
"model.layers.19.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
114 |
+
"model.layers.19.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
115 |
+
"model.layers.19.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
116 |
+
"model.layers.2.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
117 |
+
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
118 |
+
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
119 |
+
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
120 |
+
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
121 |
+
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
122 |
+
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
123 |
+
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
124 |
+
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
125 |
+
"model.layers.20.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
126 |
+
"model.layers.20.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
127 |
+
"model.layers.20.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
128 |
+
"model.layers.20.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
129 |
+
"model.layers.20.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
130 |
+
"model.layers.20.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
131 |
+
"model.layers.20.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
132 |
+
"model.layers.20.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
133 |
+
"model.layers.20.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
134 |
+
"model.layers.21.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
135 |
+
"model.layers.21.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
136 |
+
"model.layers.21.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
137 |
+
"model.layers.21.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
138 |
+
"model.layers.21.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
139 |
+
"model.layers.21.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
140 |
+
"model.layers.21.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
141 |
+
"model.layers.21.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
142 |
+
"model.layers.21.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
143 |
+
"model.layers.22.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
144 |
+
"model.layers.22.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
145 |
+
"model.layers.22.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
146 |
+
"model.layers.22.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
147 |
+
"model.layers.22.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
148 |
+
"model.layers.22.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
149 |
+
"model.layers.22.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
150 |
+
"model.layers.22.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
151 |
+
"model.layers.22.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
152 |
+
"model.layers.23.input_layernorm.weight": "model-00003-of-00005.safetensors",
|
153 |
+
"model.layers.23.mlp.down_proj.weight": "model-00003-of-00005.safetensors",
|
154 |
+
"model.layers.23.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
155 |
+
"model.layers.23.mlp.up_proj.weight": "model-00003-of-00005.safetensors",
|
156 |
+
"model.layers.23.post_attention_layernorm.weight": "model-00003-of-00005.safetensors",
|
157 |
+
"model.layers.23.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
158 |
+
"model.layers.23.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
159 |
+
"model.layers.23.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
160 |
+
"model.layers.23.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
161 |
+
"model.layers.24.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
162 |
+
"model.layers.24.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
163 |
+
"model.layers.24.mlp.gate_proj.weight": "model-00003-of-00005.safetensors",
|
164 |
+
"model.layers.24.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
165 |
+
"model.layers.24.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
166 |
+
"model.layers.24.self_attn.k_proj.weight": "model-00003-of-00005.safetensors",
|
167 |
+
"model.layers.24.self_attn.o_proj.weight": "model-00003-of-00005.safetensors",
|
168 |
+
"model.layers.24.self_attn.q_proj.weight": "model-00003-of-00005.safetensors",
|
169 |
+
"model.layers.24.self_attn.v_proj.weight": "model-00003-of-00005.safetensors",
|
170 |
+
"model.layers.25.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
171 |
+
"model.layers.25.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
172 |
+
"model.layers.25.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
173 |
+
"model.layers.25.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
174 |
+
"model.layers.25.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
175 |
+
"model.layers.25.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
176 |
+
"model.layers.25.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
177 |
+
"model.layers.25.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
178 |
+
"model.layers.25.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
179 |
+
"model.layers.26.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
180 |
+
"model.layers.26.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
181 |
+
"model.layers.26.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
182 |
+
"model.layers.26.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
183 |
+
"model.layers.26.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
184 |
+
"model.layers.26.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
185 |
+
"model.layers.26.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
186 |
+
"model.layers.26.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
187 |
+
"model.layers.26.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
188 |
+
"model.layers.27.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
189 |
+
"model.layers.27.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
190 |
+
"model.layers.27.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
191 |
+
"model.layers.27.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
192 |
+
"model.layers.27.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
193 |
+
"model.layers.27.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
194 |
+
"model.layers.27.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
195 |
+
"model.layers.27.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
196 |
+
"model.layers.27.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
197 |
+
"model.layers.28.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
198 |
+
"model.layers.28.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
199 |
+
"model.layers.28.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
200 |
+
"model.layers.28.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
201 |
+
"model.layers.28.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
202 |
+
"model.layers.28.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
203 |
+
"model.layers.28.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
204 |
+
"model.layers.28.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
205 |
+
"model.layers.28.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
206 |
+
"model.layers.29.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
207 |
+
"model.layers.29.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
208 |
+
"model.layers.29.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
209 |
+
"model.layers.29.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
210 |
+
"model.layers.29.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
211 |
+
"model.layers.29.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
212 |
+
"model.layers.29.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
213 |
+
"model.layers.29.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
214 |
+
"model.layers.29.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
215 |
+
"model.layers.3.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
216 |
+
"model.layers.3.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
217 |
+
"model.layers.3.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
218 |
+
"model.layers.3.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
219 |
+
"model.layers.3.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
220 |
+
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
221 |
+
"model.layers.3.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
222 |
+
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
223 |
+
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
224 |
+
"model.layers.30.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
225 |
+
"model.layers.30.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
226 |
+
"model.layers.30.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
227 |
+
"model.layers.30.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
228 |
+
"model.layers.30.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
229 |
+
"model.layers.30.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
230 |
+
"model.layers.30.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
231 |
+
"model.layers.30.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
232 |
+
"model.layers.30.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
233 |
+
"model.layers.31.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
234 |
+
"model.layers.31.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
235 |
+
"model.layers.31.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
236 |
+
"model.layers.31.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
237 |
+
"model.layers.31.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
238 |
+
"model.layers.31.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
239 |
+
"model.layers.31.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
240 |
+
"model.layers.31.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
241 |
+
"model.layers.31.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
242 |
+
"model.layers.32.input_layernorm.weight": "model-00004-of-00005.safetensors",
|
243 |
+
"model.layers.32.mlp.down_proj.weight": "model-00004-of-00005.safetensors",
|
244 |
+
"model.layers.32.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
245 |
+
"model.layers.32.mlp.up_proj.weight": "model-00004-of-00005.safetensors",
|
246 |
+
"model.layers.32.post_attention_layernorm.weight": "model-00004-of-00005.safetensors",
|
247 |
+
"model.layers.32.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
248 |
+
"model.layers.32.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
249 |
+
"model.layers.32.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
250 |
+
"model.layers.32.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
251 |
+
"model.layers.33.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
252 |
+
"model.layers.33.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
253 |
+
"model.layers.33.mlp.gate_proj.weight": "model-00004-of-00005.safetensors",
|
254 |
+
"model.layers.33.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
255 |
+
"model.layers.33.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
256 |
+
"model.layers.33.self_attn.k_proj.weight": "model-00004-of-00005.safetensors",
|
257 |
+
"model.layers.33.self_attn.o_proj.weight": "model-00004-of-00005.safetensors",
|
258 |
+
"model.layers.33.self_attn.q_proj.weight": "model-00004-of-00005.safetensors",
|
259 |
+
"model.layers.33.self_attn.v_proj.weight": "model-00004-of-00005.safetensors",
|
260 |
+
"model.layers.34.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
261 |
+
"model.layers.34.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
262 |
+
"model.layers.34.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
263 |
+
"model.layers.34.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
264 |
+
"model.layers.34.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
265 |
+
"model.layers.34.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
266 |
+
"model.layers.34.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
267 |
+
"model.layers.34.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
268 |
+
"model.layers.34.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
269 |
+
"model.layers.35.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
270 |
+
"model.layers.35.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
271 |
+
"model.layers.35.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
272 |
+
"model.layers.35.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
273 |
+
"model.layers.35.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
274 |
+
"model.layers.35.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
275 |
+
"model.layers.35.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
276 |
+
"model.layers.35.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
277 |
+
"model.layers.35.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
278 |
+
"model.layers.36.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
279 |
+
"model.layers.36.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
280 |
+
"model.layers.36.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
281 |
+
"model.layers.36.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
282 |
+
"model.layers.36.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
283 |
+
"model.layers.36.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
284 |
+
"model.layers.36.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
285 |
+
"model.layers.36.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
286 |
+
"model.layers.36.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
287 |
+
"model.layers.37.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
288 |
+
"model.layers.37.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
289 |
+
"model.layers.37.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
290 |
+
"model.layers.37.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
291 |
+
"model.layers.37.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
292 |
+
"model.layers.37.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
293 |
+
"model.layers.37.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
294 |
+
"model.layers.37.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
295 |
+
"model.layers.37.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
296 |
+
"model.layers.38.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
297 |
+
"model.layers.38.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
298 |
+
"model.layers.38.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
299 |
+
"model.layers.38.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
300 |
+
"model.layers.38.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
301 |
+
"model.layers.38.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
302 |
+
"model.layers.38.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
303 |
+
"model.layers.38.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
304 |
+
"model.layers.38.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
305 |
+
"model.layers.39.input_layernorm.weight": "model-00005-of-00005.safetensors",
|
306 |
+
"model.layers.39.mlp.down_proj.weight": "model-00005-of-00005.safetensors",
|
307 |
+
"model.layers.39.mlp.gate_proj.weight": "model-00005-of-00005.safetensors",
|
308 |
+
"model.layers.39.mlp.up_proj.weight": "model-00005-of-00005.safetensors",
|
309 |
+
"model.layers.39.post_attention_layernorm.weight": "model-00005-of-00005.safetensors",
|
310 |
+
"model.layers.39.self_attn.k_proj.weight": "model-00005-of-00005.safetensors",
|
311 |
+
"model.layers.39.self_attn.o_proj.weight": "model-00005-of-00005.safetensors",
|
312 |
+
"model.layers.39.self_attn.q_proj.weight": "model-00005-of-00005.safetensors",
|
313 |
+
"model.layers.39.self_attn.v_proj.weight": "model-00005-of-00005.safetensors",
|
314 |
+
"model.layers.4.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
315 |
+
"model.layers.4.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
316 |
+
"model.layers.4.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
317 |
+
"model.layers.4.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
318 |
+
"model.layers.4.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
319 |
+
"model.layers.4.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
320 |
+
"model.layers.4.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
321 |
+
"model.layers.4.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
322 |
+
"model.layers.4.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
323 |
+
"model.layers.5.input_layernorm.weight": "model-00001-of-00005.safetensors",
|
324 |
+
"model.layers.5.mlp.down_proj.weight": "model-00001-of-00005.safetensors",
|
325 |
+
"model.layers.5.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
326 |
+
"model.layers.5.mlp.up_proj.weight": "model-00001-of-00005.safetensors",
|
327 |
+
"model.layers.5.post_attention_layernorm.weight": "model-00001-of-00005.safetensors",
|
328 |
+
"model.layers.5.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
329 |
+
"model.layers.5.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
330 |
+
"model.layers.5.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
331 |
+
"model.layers.5.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
332 |
+
"model.layers.6.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
333 |
+
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
334 |
+
"model.layers.6.mlp.gate_proj.weight": "model-00001-of-00005.safetensors",
|
335 |
+
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
336 |
+
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
337 |
+
"model.layers.6.self_attn.k_proj.weight": "model-00001-of-00005.safetensors",
|
338 |
+
"model.layers.6.self_attn.o_proj.weight": "model-00001-of-00005.safetensors",
|
339 |
+
"model.layers.6.self_attn.q_proj.weight": "model-00001-of-00005.safetensors",
|
340 |
+
"model.layers.6.self_attn.v_proj.weight": "model-00001-of-00005.safetensors",
|
341 |
+
"model.layers.7.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
342 |
+
"model.layers.7.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
343 |
+
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
344 |
+
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
345 |
+
"model.layers.7.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
346 |
+
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
347 |
+
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
348 |
+
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
349 |
+
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
350 |
+
"model.layers.8.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
351 |
+
"model.layers.8.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
352 |
+
"model.layers.8.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
353 |
+
"model.layers.8.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
354 |
+
"model.layers.8.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
355 |
+
"model.layers.8.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
356 |
+
"model.layers.8.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
357 |
+
"model.layers.8.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
358 |
+
"model.layers.8.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
359 |
+
"model.layers.9.input_layernorm.weight": "model-00002-of-00005.safetensors",
|
360 |
+
"model.layers.9.mlp.down_proj.weight": "model-00002-of-00005.safetensors",
|
361 |
+
"model.layers.9.mlp.gate_proj.weight": "model-00002-of-00005.safetensors",
|
362 |
+
"model.layers.9.mlp.up_proj.weight": "model-00002-of-00005.safetensors",
|
363 |
+
"model.layers.9.post_attention_layernorm.weight": "model-00002-of-00005.safetensors",
|
364 |
+
"model.layers.9.self_attn.k_proj.weight": "model-00002-of-00005.safetensors",
|
365 |
+
"model.layers.9.self_attn.o_proj.weight": "model-00002-of-00005.safetensors",
|
366 |
+
"model.layers.9.self_attn.q_proj.weight": "model-00002-of-00005.safetensors",
|
367 |
+
"model.layers.9.self_attn.v_proj.weight": "model-00002-of-00005.safetensors",
|
368 |
+
"model.norm.weight": "model-00005-of-00005.safetensors"
|
369 |
+
}
|
370 |
+
}
|
output-00001-of-00002.safetensors
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:2d38a31f6e851ccc6e7ea725b0b597041fe20f0772728630e0dacf83c55dce92
|
3 |
+
size 8160394744
|
output-00002-of-00002.safetensors
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:2edae7e345625742e30d25880f5f36f70504299c0817f474dad031a0d502ba17
|
3 |
+
size 528482400
|
pytorch_model.bin.index.json
ADDED
@@ -0,0 +1,370 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"metadata": {
|
3 |
+
"total_size": 24495564800
|
4 |
+
},
|
5 |
+
"weight_map": {
|
6 |
+
"lm_head.weight": "pytorch_model-00005-of-00005.bin",
|
7 |
+
"model.embed_tokens.weight": "pytorch_model-00001-of-00005.bin",
|
8 |
+
"model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
9 |
+
"model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
|
10 |
+
"model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
|
11 |
+
"model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
|
12 |
+
"model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
13 |
+
"model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
|
14 |
+
"model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
|
15 |
+
"model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
|
16 |
+
"model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
|
17 |
+
"model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
18 |
+
"model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
|
19 |
+
"model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
|
20 |
+
"model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
|
21 |
+
"model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
22 |
+
"model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
|
23 |
+
"model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
|
24 |
+
"model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
|
25 |
+
"model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
|
26 |
+
"model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
27 |
+
"model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
28 |
+
"model.layers.10.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
29 |
+
"model.layers.10.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
30 |
+
"model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
31 |
+
"model.layers.10.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
32 |
+
"model.layers.10.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
33 |
+
"model.layers.10.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
34 |
+
"model.layers.10.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
35 |
+
"model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
36 |
+
"model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
37 |
+
"model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
38 |
+
"model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
39 |
+
"model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
40 |
+
"model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
41 |
+
"model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
42 |
+
"model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
43 |
+
"model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
44 |
+
"model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
45 |
+
"model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
46 |
+
"model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
47 |
+
"model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
48 |
+
"model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
49 |
+
"model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
50 |
+
"model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
51 |
+
"model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
52 |
+
"model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
53 |
+
"model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
54 |
+
"model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
55 |
+
"model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
56 |
+
"model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
57 |
+
"model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
58 |
+
"model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
59 |
+
"model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
60 |
+
"model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
61 |
+
"model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
62 |
+
"model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
63 |
+
"model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
64 |
+
"model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
65 |
+
"model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
66 |
+
"model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
67 |
+
"model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
68 |
+
"model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
69 |
+
"model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
70 |
+
"model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
71 |
+
"model.layers.15.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
72 |
+
"model.layers.15.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
73 |
+
"model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
74 |
+
"model.layers.15.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
75 |
+
"model.layers.15.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
76 |
+
"model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
77 |
+
"model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
78 |
+
"model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
79 |
+
"model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
80 |
+
"model.layers.16.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
81 |
+
"model.layers.16.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
82 |
+
"model.layers.16.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
83 |
+
"model.layers.16.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
84 |
+
"model.layers.16.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
85 |
+
"model.layers.16.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
86 |
+
"model.layers.16.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
87 |
+
"model.layers.16.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
88 |
+
"model.layers.16.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
89 |
+
"model.layers.17.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
90 |
+
"model.layers.17.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
91 |
+
"model.layers.17.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
92 |
+
"model.layers.17.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
93 |
+
"model.layers.17.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
94 |
+
"model.layers.17.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
95 |
+
"model.layers.17.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
96 |
+
"model.layers.17.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
97 |
+
"model.layers.17.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
98 |
+
"model.layers.18.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
99 |
+
"model.layers.18.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
100 |
+
"model.layers.18.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
101 |
+
"model.layers.18.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
102 |
+
"model.layers.18.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
103 |
+
"model.layers.18.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
104 |
+
"model.layers.18.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
105 |
+
"model.layers.18.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
106 |
+
"model.layers.18.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
107 |
+
"model.layers.19.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
108 |
+
"model.layers.19.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
109 |
+
"model.layers.19.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
110 |
+
"model.layers.19.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
111 |
+
"model.layers.19.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
112 |
+
"model.layers.19.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
113 |
+
"model.layers.19.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
114 |
+
"model.layers.19.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
115 |
+
"model.layers.19.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
116 |
+
"model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
117 |
+
"model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
|
118 |
+
"model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
|
119 |
+
"model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
|
120 |
+
"model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
121 |
+
"model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
|
122 |
+
"model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
|
123 |
+
"model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
|
124 |
+
"model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
|
125 |
+
"model.layers.20.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
126 |
+
"model.layers.20.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
127 |
+
"model.layers.20.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
128 |
+
"model.layers.20.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
129 |
+
"model.layers.20.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
130 |
+
"model.layers.20.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
131 |
+
"model.layers.20.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
132 |
+
"model.layers.20.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
133 |
+
"model.layers.20.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
134 |
+
"model.layers.21.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
135 |
+
"model.layers.21.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
136 |
+
"model.layers.21.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
137 |
+
"model.layers.21.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
138 |
+
"model.layers.21.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
139 |
+
"model.layers.21.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
140 |
+
"model.layers.21.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
141 |
+
"model.layers.21.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
142 |
+
"model.layers.21.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
143 |
+
"model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
144 |
+
"model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
145 |
+
"model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
146 |
+
"model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
147 |
+
"model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
148 |
+
"model.layers.22.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
149 |
+
"model.layers.22.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
150 |
+
"model.layers.22.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
151 |
+
"model.layers.22.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
152 |
+
"model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
153 |
+
"model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00005.bin",
|
154 |
+
"model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
155 |
+
"model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00005.bin",
|
156 |
+
"model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00005.bin",
|
157 |
+
"model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
158 |
+
"model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
159 |
+
"model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
160 |
+
"model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
161 |
+
"model.layers.24.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
162 |
+
"model.layers.24.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
163 |
+
"model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00005.bin",
|
164 |
+
"model.layers.24.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
165 |
+
"model.layers.24.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
166 |
+
"model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00005.bin",
|
167 |
+
"model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00005.bin",
|
168 |
+
"model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00005.bin",
|
169 |
+
"model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00005.bin",
|
170 |
+
"model.layers.25.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
171 |
+
"model.layers.25.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
172 |
+
"model.layers.25.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
173 |
+
"model.layers.25.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
174 |
+
"model.layers.25.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
175 |
+
"model.layers.25.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
176 |
+
"model.layers.25.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
177 |
+
"model.layers.25.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
178 |
+
"model.layers.25.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
179 |
+
"model.layers.26.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
180 |
+
"model.layers.26.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
181 |
+
"model.layers.26.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
182 |
+
"model.layers.26.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
183 |
+
"model.layers.26.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
184 |
+
"model.layers.26.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
185 |
+
"model.layers.26.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
186 |
+
"model.layers.26.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
187 |
+
"model.layers.26.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
188 |
+
"model.layers.27.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
189 |
+
"model.layers.27.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
190 |
+
"model.layers.27.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
191 |
+
"model.layers.27.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
192 |
+
"model.layers.27.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
193 |
+
"model.layers.27.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
194 |
+
"model.layers.27.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
195 |
+
"model.layers.27.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
196 |
+
"model.layers.27.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
197 |
+
"model.layers.28.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
198 |
+
"model.layers.28.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
199 |
+
"model.layers.28.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
200 |
+
"model.layers.28.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
201 |
+
"model.layers.28.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
202 |
+
"model.layers.28.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
203 |
+
"model.layers.28.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
204 |
+
"model.layers.28.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
205 |
+
"model.layers.28.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
206 |
+
"model.layers.29.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
207 |
+
"model.layers.29.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
208 |
+
"model.layers.29.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
209 |
+
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
210 |
+
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
211 |
+
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
212 |
+
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
213 |
+
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
214 |
+
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
215 |
+
"model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
216 |
+
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
|
217 |
+
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
|
218 |
+
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
|
219 |
+
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
220 |
+
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
|
221 |
+
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
|
222 |
+
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
|
223 |
+
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
|
224 |
+
"model.layers.30.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
225 |
+
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
226 |
+
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
227 |
+
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
228 |
+
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
229 |
+
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
230 |
+
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
231 |
+
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
232 |
+
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
233 |
+
"model.layers.31.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
234 |
+
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
235 |
+
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
236 |
+
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
237 |
+
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
238 |
+
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
239 |
+
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
240 |
+
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
241 |
+
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
242 |
+
"model.layers.32.input_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
243 |
+
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00004-of-00005.bin",
|
244 |
+
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
245 |
+
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00004-of-00005.bin",
|
246 |
+
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00004-of-00005.bin",
|
247 |
+
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
248 |
+
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
249 |
+
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
250 |
+
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
251 |
+
"model.layers.33.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
252 |
+
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
|
253 |
+
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00004-of-00005.bin",
|
254 |
+
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
|
255 |
+
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
256 |
+
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00004-of-00005.bin",
|
257 |
+
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00004-of-00005.bin",
|
258 |
+
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00004-of-00005.bin",
|
259 |
+
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00004-of-00005.bin",
|
260 |
+
"model.layers.34.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
261 |
+
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
|
262 |
+
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
|
263 |
+
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
|
264 |
+
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
265 |
+
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
|
266 |
+
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
|
267 |
+
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
|
268 |
+
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
|
269 |
+
"model.layers.35.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
270 |
+
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
|
271 |
+
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
|
272 |
+
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
|
273 |
+
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
274 |
+
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
|
275 |
+
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
|
276 |
+
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
|
277 |
+
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
|
278 |
+
"model.layers.36.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
279 |
+
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
|
280 |
+
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
|
281 |
+
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
|
282 |
+
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
283 |
+
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
|
284 |
+
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
|
285 |
+
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
|
286 |
+
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
|
287 |
+
"model.layers.37.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
288 |
+
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
|
289 |
+
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
|
290 |
+
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
|
291 |
+
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
292 |
+
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
|
293 |
+
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
|
294 |
+
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
|
295 |
+
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
|
296 |
+
"model.layers.38.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
297 |
+
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
|
298 |
+
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
|
299 |
+
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
|
300 |
+
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
301 |
+
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
|
302 |
+
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
|
303 |
+
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
|
304 |
+
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
|
305 |
+
"model.layers.39.input_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
306 |
+
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00005-of-00005.bin",
|
307 |
+
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00005-of-00005.bin",
|
308 |
+
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00005-of-00005.bin",
|
309 |
+
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00005-of-00005.bin",
|
310 |
+
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00005-of-00005.bin",
|
311 |
+
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00005-of-00005.bin",
|
312 |
+
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00005-of-00005.bin",
|
313 |
+
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00005-of-00005.bin",
|
314 |
+
"model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
315 |
+
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
|
316 |
+
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
|
317 |
+
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
|
318 |
+
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
319 |
+
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
|
320 |
+
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
|
321 |
+
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
|
322 |
+
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
|
323 |
+
"model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
324 |
+
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00005.bin",
|
325 |
+
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
|
326 |
+
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00005.bin",
|
327 |
+
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00005.bin",
|
328 |
+
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
|
329 |
+
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
|
330 |
+
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
|
331 |
+
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
|
332 |
+
"model.layers.6.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
333 |
+
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
334 |
+
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00005.bin",
|
335 |
+
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
336 |
+
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
337 |
+
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00005.bin",
|
338 |
+
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00005.bin",
|
339 |
+
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00005.bin",
|
340 |
+
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00005.bin",
|
341 |
+
"model.layers.7.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
342 |
+
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
343 |
+
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
344 |
+
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
345 |
+
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
346 |
+
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
347 |
+
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
348 |
+
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
349 |
+
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
350 |
+
"model.layers.8.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
351 |
+
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
352 |
+
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
353 |
+
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
354 |
+
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
355 |
+
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
356 |
+
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
357 |
+
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
358 |
+
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
359 |
+
"model.layers.9.input_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
360 |
+
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00002-of-00005.bin",
|
361 |
+
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00002-of-00005.bin",
|
362 |
+
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00002-of-00005.bin",
|
363 |
+
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00002-of-00005.bin",
|
364 |
+
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00002-of-00005.bin",
|
365 |
+
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00002-of-00005.bin",
|
366 |
+
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00002-of-00005.bin",
|
367 |
+
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00002-of-00005.bin",
|
368 |
+
"model.norm.weight": "pytorch_model-00005-of-00005.bin"
|
369 |
+
}
|
370 |
+
}
|
special_tokens_map.json
ADDED
@@ -0,0 +1,39 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"additional_special_tokens": [
|
3 |
+
{
|
4 |
+
"content": "<|im_start|>",
|
5 |
+
"lstrip": false,
|
6 |
+
"normalized": false,
|
7 |
+
"rstrip": false,
|
8 |
+
"single_word": false
|
9 |
+
}
|
10 |
+
],
|
11 |
+
"bos_token": {
|
12 |
+
"content": "<s>",
|
13 |
+
"lstrip": false,
|
14 |
+
"normalized": false,
|
15 |
+
"rstrip": false,
|
16 |
+
"single_word": false
|
17 |
+
},
|
18 |
+
"eos_token": {
|
19 |
+
"content": "<|im_end|>",
|
20 |
+
"lstrip": false,
|
21 |
+
"normalized": false,
|
22 |
+
"rstrip": false,
|
23 |
+
"single_word": false
|
24 |
+
},
|
25 |
+
"pad_token": {
|
26 |
+
"content": "<pad>",
|
27 |
+
"lstrip": false,
|
28 |
+
"normalized": false,
|
29 |
+
"rstrip": false,
|
30 |
+
"single_word": false
|
31 |
+
},
|
32 |
+
"unk_token": {
|
33 |
+
"content": "<unk>",
|
34 |
+
"lstrip": false,
|
35 |
+
"normalized": false,
|
36 |
+
"rstrip": false,
|
37 |
+
"single_word": false
|
38 |
+
}
|
39 |
+
}
|
tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
tokenizer_config.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|