bartowski committed on
Commit
2fe3f25
1 Parent(s): 0038535

Quant for 5.0

README.md CHANGED
@@ -4,62 +4,387 @@ library_name: transformers
  datasets:
  - Severian/Internal-Knowledge-Map
  pipeline_tag: text-generation
- quantized_by: bartowski
  ---
 
- ## Exllama v2 Quantizations of Nexus-IKM-Mistral-7B
 
- Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.15">turboderp's ExLlamaV2 v0.0.15</a> for quantization.
 
- <b>The "main" branch only contains the measurement.json, download one of the other branches for the model (see below)</b>
 
- Each branch contains an individual bits per weight, with the main one containing only the measurement.json for further conversions.
 
- Original model: https://huggingface.co/Severian/Nexus-IKM-Mistral-7B
 
- | Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
- | ----- | ---- | ------- | ------ | ------ | ------ | ------------ |
- | [8_0](https://huggingface.co/bartowski/Nexus-IKM-Mistral-7B-exl2/tree/8_0) | 8.0 | 8.0 | 8.4 GB | 9.8 GB | 11.8 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
- | [6_5](https://huggingface.co/bartowski/Nexus-IKM-Mistral-7B-exl2/tree/6_5) | 6.5 | 8.0 | 7.2 GB | 8.6 GB | 10.6 GB | Very similar to 8.0, good tradeoff of size vs performance, **recommended**. |
- | [5_0](https://huggingface.co/bartowski/Nexus-IKM-Mistral-7B-exl2/tree/5_0) | 5.0 | 6.0 | 6.0 GB | 7.4 GB | 9.4 GB | Slightly lower quality vs 6.5, but usable on 8GB cards. |
- | [4_25](https://huggingface.co/bartowski/Nexus-IKM-Mistral-7B-exl2/tree/4_25) | 4.25 | 6.0 | 5.3 GB | 6.7 GB | 8.7 GB | GPTQ equivalent bits per weight, slightly higher quality. |
- | [3_5](https://huggingface.co/bartowski/Nexus-IKM-Mistral-7B-exl2/tree/3_5) | 3.5 | 6.0 | 4.7 GB | 6.1 GB | 8.1 GB | Lower quality, only use if you have to. |
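The VRAM columns in the removed table can be turned into a small lookup for choosing a branch. The sketch below copies the figures from the table; the `pick_branch` helper and its `headroom_gb` safety margin are hypothetical and not part of the original README.

```python
# VRAM figures (GB) copied from the table: branch -> {context length: GB}.
# Dict order runs from the highest-quality branch to the lowest.
VRAM_GB = {
    "8_0": {"4k": 8.4, "16k": 9.8, "32k": 11.8},
    "6_5": {"4k": 7.2, "16k": 8.6, "32k": 10.6},
    "5_0": {"4k": 6.0, "16k": 7.4, "32k": 9.4},
    "4_25": {"4k": 5.3, "16k": 6.7, "32k": 8.7},
    "3_5": {"4k": 4.7, "16k": 6.1, "32k": 8.1},
}


def pick_branch(vram_gb, context="4k", headroom_gb=0.5):
    """Return the highest-quality branch that fits in vram_gb, or None.

    headroom_gb is an assumed safety margin, not a figure from the table.
    """
    for branch, usage in VRAM_GB.items():
        if usage[context] + headroom_gb <= vram_gb:
            return branch
    return None


print(pick_branch(8.0, "4k"))   # 6_5
print(pick_branch(8.0, "16k"))  # 5_0
```

The returned name matches the branch names used by `--branch` and `--revision` in the download instructions.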
 
- ## Download instructions
 
- With git:
 
- ```shell
- git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/Nexus-IKM-Mistral-7B-exl2 Nexus-IKM-Mistral-7B-exl2-6_5
- ```
 
- With huggingface hub (credit to TheBloke for instructions):
 
- ```shell
- pip3 install huggingface-hub
- ```
 
- To download the `main` (only useful if you only care about measurement.json) branch to a folder called `Nexus-IKM-Mistral-7B-exl2`:
 
- ```shell
- mkdir Nexus-IKM-Mistral-7B-exl2
- huggingface-cli download bartowski/Nexus-IKM-Mistral-7B-exl2 --local-dir Nexus-IKM-Mistral-7B-exl2 --local-dir-use-symlinks False
- ```
 
- To download from a different branch, add the `--revision` parameter:
 
- Linux:
 
- ```shell
- mkdir Nexus-IKM-Mistral-7B-exl2-6_5
- huggingface-cli download bartowski/Nexus-IKM-Mistral-7B-exl2 --revision 6_5 --local-dir Nexus-IKM-Mistral-7B-exl2-6_5 --local-dir-use-symlinks False
- ```
 
- Windows (which apparently doesn't like _ in folders sometimes?):
 
- ```shell
- mkdir Nexus-IKM-Mistral-7B-exl2-6.5
- huggingface-cli download bartowski/Nexus-IKM-Mistral-7B-exl2 --revision 6_5 --local-dir Nexus-IKM-Mistral-7B-exl2-6.5 --local-dir-use-symlinks False
  ```
 
- Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
  datasets:
  - Severian/Internal-Knowledge-Map
  pipeline_tag: text-generation
  ---
 
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/64740cf7485a7c8e1bd51ac9/GO4MY_3adP2G9EHKZbZpg.webp" width="500" height="500">
 
 
+ This is the second model trained with the experimental 'Internal Knowledge Map' dataset. Developed to go beyond the usual scope of data processing, this model is trained to build comprehensive understanding and reasoning across a wide range of knowledge domains, guided by elaborate guidelines. It bases its reasoning on a specially selected dataset that emphasizes the interrelations between diverse disciplines, aiming to synthesize, integrate, and apply complex information in ways that mimic human abstract reasoning and creative thought processes.
 
+ At the very core of the development of this model is the desire to make sure that LLMs engage in a kind of cognitive activity that is not limited to memory but actually takes on abstract reasoning, problem-solving, and the generation of new insights. To achieve this, 'Nexus-IKM-Mistral-7B' has been fine-tuned until convergence using a novel Phased Training approach on this unique dataset, which resulted in the model demonstrating greater capability for generating insights and solving problems in complex, multi-disciplinary settings. This involves an improved ability to draw links between different pieces of knowledge, reason through complex scenarios, and propose innovative solutions that cut across various domains, including science, technology, environmental studies, and humanities.
 
+ Test this out and see if you find anything interesting or intriguing. I will keep iterating more versions but this one seems like a fun and useful way to start.
 
+ ---
 
+ # Phased Training Methodology
+ Leveraging this dataset, we've adopted a phased training methodology that focuses sequentially on different dataset components, namely "System" and "Instruction," across separate training phases. This approach allows models to build layered understandings from general systemic insights to specific instructional cues, enriching their generative output with both broad contextual awareness and detailed, topic-specific knowledge.
 
+ **Phase 1: System Focus**
+ In the initial phase, the model concentrates on the "System" component, absorbing overarching guidelines and objectives. This phase lays the foundational understanding, enabling the model to grasp the contextual framework and systemic knowledge encapsulated in the dataset.
 
+ **Phase 2: Instruction Focus**
+ Building upon the systemic knowledge, the second phase shifts the model's focus to the "Instructions" component. This phase sharpens the model's ability to interpret specific prompts and generate responses that are not only informed by the broader context but also precisely tailored to the instructional cues.
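
The two phases described above can be sketched as a data-preparation step. This is a minimal illustration and not the author's training code: the field names `system`, `instruction`, and `response`, the pairing of each component with the response, and both helper functions are assumptions. The linked notebook defines the actual formatting.

```python
def phase_text(record, phase):
    """Build one training string for a record in the given phase.

    Hypothetical layout: the phase's focus component followed by the response.
    """
    if phase == 1:
        prompt = record["system"]       # Phase 1: "System" component
    elif phase == 2:
        prompt = record["instruction"]  # Phase 2: "Instruction" component
    else:
        raise ValueError("phase must be 1 or 2")
    return f"{prompt}\n\n{record['response']}"


def phased_corpora(records):
    """Return [(phase, texts)] in training order: phase 1 first, then phase 2."""
    return [(phase, [phase_text(r, phase) for r in records]) for phase in (1, 2)]


example = [{"system": "S", "instruction": "I", "response": "R"}]
for phase, texts in phased_corpora(example):
    print(phase, texts)
```

A trainer would presumably run phase 1 to completion and then continue from the same weights on the phase 2 texts, so the second phase builds on what the first has learned.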
 
+ ## GGUF Q8 Version: https://huggingface.co/Severian/Nexus-IKM-Mistral-7B-GGUF
 
 
+ **If you'd like to train your own version, here is the full notebook to recreate the training on Unsloth yourself (https://colab.research.google.com/drive/1828t77iO2nLRXVfB8HoI11eFu-79-Oe7?usp=sharing). You'll just have to drop in the train.jsonl from the Dataset repo (https://huggingface.co/datasets/Severian/Internal-Knowledge-Map) into your Colab directory and rename it dataset.jsonl**
 
+ ---
 
 
 
+ # Example Outputs vs Base model
+
+
+ ## **How can everyday household items be repurposed to solve common gardening problems?**
+
+ ### Base Mistral-Instruct-v0.2:
+
+ There are several ways everyday household items can be repurposed to solve common gardening problems. Here are some ideas:
+
+ 1. Plastic Bottles: Cut off the top of a plastic bottle and bury it with the bottom in the ground, upside down. This will act as a water reservoir for your plants, helping them stay hydrated during dry spells.
+ 2. Egg Cartons: Use egg cartons to start seedlings indoors. Once the seedlings have grown large enough, you can transplant them directly into the ground, carton and all. The carton will decompose over time, leaving the seedling's roots undisturbed during the transfer.
+ 3. Coffee Grounds: Use coffee grounds as a fertilizer for acid-loving plants such as azaleas, rhododendrons, and hydrangeas. Coffee grounds also help deter pests like slugs and snails.
+ 4. Newspaper: Use newspaper to suppress weeds. Lay down several layers of newspaper over the weeded area, then cover it with a layer of mulch or compost. Over time, the newspaper will decompose and the weeds will not be able to grow through.
+ 5. Baking Soda: Use baking soda as a fungicide for roses and other plants susceptible to powdery mildew. Mix 1 tablespoon of baking soda with 1 quart of water, then spray the solution onto the infected areas. Be sure not to overuse, as too much baking soda can harm the plant.
+ 6. Empty Cans: Puncture the bottoms of empty tin or aluminum cans and bury them in the garden. The cans will act as self-watering planters for herbs or small vegetables like tomatoes. Fill the cans with soil, plant your seeds, then water the can from the top to allow the water to seep out through the bottom and slowly irrigate the roots.
+ 7. Glass Jars: Use glass jars to create DIY terrariums. Fill the jar halfway with pebbles or small stones for drainage, then add a layer of soil and your favorite plants. The sealed jar will create a mini greenhouse effect, allowing the plants to thrive in humid conditions.
+ 8. Old Tires: Repurpose old tires into raised garden beds by filling them with soil and compost. This is an excellent solution for creating additional growing space without breaking the bank. Be sure to drill several holes in the tire to allow water to drain properly.
+
+ ### Nexus-7B:
+
+ Repurposing everyday household items for gardening can be an innovative and cost-effective solution to common gardening problems. Here are some examples of how everyday items can be transformed into useful gardening tools or solutions:
+
+ 1. Plastic Bottles as Waterers: Cut the bottom off a plastic bottle, poke several small holes in the sides for drainage, and turn it upside down to water your plants. This can be used to water hanging baskets, potted plants, or even in-ground flowers without overwatering them.
+
+ 2. CDs as Sun Deterrents: Place old CDs around the perimeter of your garden to reflect sunlight and discourage birds from feeding on ripe fruits or vegetables. The bright reflections can also serve as an aesthetic enhancement for your garden.
+
+ 3. Old Tires as Planters: Stack tires in a sunny location, fill them with soil, and add plants. This repurposed tire planter is ideal for growing larger plants such as tomatoes, cucumbers, or squash. Ensure that the tires are stable and secure before filling with soil.
+
+ 4. Newspapers as Weed Blockers: Cover the ground around young seedlings or plants with old newspapers to prevent weed growth. This method is particularly useful in gardens where manual weeding is not practical due to the size of the area or physical limitations.
+
+ 5. Egg Cartons as Seed Starters: Fill each cup of an egg carton with soil and plant seeds, watering them gently. Once the seedlings are large enough, you can transfer the entire carton into a larger pot. This method allows you to start seeds indoors using minimal space and resources.
+
+ 6. Tin Cans as Seedling Protectors: Cut the bottom off an empty tin can, paint it with non-toxic, weather-resistant paint, and poke several drainage holes in the sides. Once painted and sealed, place the can over young seedlings to protect them from frost or excessive sunlight.
+
+ 7. Toilet Paper Rolls as Seedling Markers: Write plant names on toilet paper rolls with a waterproof marker, then stick them into the soil next to their corresponding seedlings. This method helps keep track of plants during early growth stages.
+
+ 8. Coffee Grounds as Fertilizer: Spread coffee grounds around acid-loving plants such as blueberries and tomatoes to add nitrogen to the soil. The addition of coffee grounds also improves soil structure and deters pests.
+
+ 9. Old Pots as Drainage Hacks: Drill holes in the bottoms of old plastic or ceramic pots and bury them in the ground around larger plants or trees. These buried pots will serve as water reservoirs, providing additional moisture to the roots when natural rainfall is scarce.
+
+ 10. Kitchen Waste as Compost: Collect fruit and vegetable waste, along with coffee grounds and eggshells, in a designated compost bin. Decompose these items over time to create nutrient-rich soil for gardening, saving money on commercial fertilizers.
+
+ ### Evaluation
+
+ ### Depth and Specificity
+ - **Base Model**: Provides straightforward, commonly known uses for household items in gardening, like using plastic bottles for water reservoirs or egg cartons for seedlings. It covers a broad range of solutions but with a focus on practical, direct applications.
+ - **Nexus-7B**: Delves into more innovative and less commonly known uses, such as using CDs as sun deterrents or tin cans as seedling protectors. This response demonstrates a deeper understanding and creative thinking in repurposing items for gardening challenges.
+
+ ### Integration of Knowledge
+ - **Base Model**: Lists uses in a manner that aligns with general knowledge, lacking in the integration of broader environmental or ecological considerations.
+ - **Nexus-7B**: Shows an integrated approach, suggesting uses that consider not just the gardening problem at hand but also potential aesthetic enhancements and ecological benefits, such as using coffee grounds as fertilizer for its nitrogen content and soil structure improvement.
+
+ ### Innovation and Creativity
+ - **Base Model**: Sticks to more traditional and widely recommended practices. While useful, these solutions do not push the boundary of creative repurposing.
+ - **Nexus-7B**: Exhibits a higher degree of creativity, especially in solutions like using CDs for bird deterrence or burying old pots as water reservoirs. These suggestions indicate a leap in applying lateral thinking to gardening problems.
+
+ ### Environmental and Ecological Consideration
+ - **Nexus-7B**: Implicitly suggests a greater awareness of environmental impact by promoting composting and the reuse of materials in ways that benefit the garden's ecosystem, showing an undercurrent of sustainability that's not as pronounced in the base model's response.
+
+ ### How to Spot Nuances:
+ 1. **Look for Creativity**: The Nexus-7B model's responses are more creative and less conventional, suggesting a deeper dive into each item's potential uses.
+ 2. **Consider the Depth of Integration**: Nexus-7B integrates broader environmental and sustainability considerations, moving beyond the immediate utility to consider the impact on the garden's ecosystem.
+ 3. **Assess the Innovation Level**: Responses that suggest new or unconventional uses demonstrate the model's ability to synthesize information creatively and offer innovative solutions.
+ 4. **Evaluate Environmental Consciousness**: Nexus-7B's approach reflects a subtle yet significant emphasis on sustainability, indicating an advanced integration of ecological awareness into problem-solving strategies.
+
+ ---
 
+ ## **Most people recognize bees for their role in pollination, but their contributions to ecosystems extend far beyond that. Can you elaborate on the less commonly known yet crucial roles that bees play in supporting biodiversity and ecosystem health?**
+
+ ### Base:
+
+ Absolutely! Bees are essential pollinators, but they contribute significantly to various aspects of ecosystem health and biodiversity in ways that are not always recognized. Here are some less commonly known roles bees play in supporting ecosystems:
+
+ 1. Nectar and pollen collectors: Bees collect nectar and pollen from flowers to feed themselves and their hives. In doing so, they help transport nutrients from the plants to other organisms, creating a complex web of connections between different species. This not only supports the growth and survival of plants but also ensures food sources for various animals that rely on those plants.
+ 2. Decomposers: Bees play a role in decomposition by regurgitating undigested parts of flowers, which are rich in pollen and nectar, as food for their larvae. This process helps break down plant material and recycle nutrients back into the soil, contributing to soil health and fertility.
+ 3. Pest control: Bees contribute to pest control by attracting various natural enemies of insects that damage crops or other plants. For example, bees help pollinate flowers that produce parasitic wasps, which in turn prey on harmful aphids and other pests. By supporting the populations of beneficial insects, bees indirectly protect crops and other vegetation from damaging pests.
+ 4. Seed dispersal: Some bees play a role in seed dispersal by unintentionally carrying seeds on their bodies as they move between flowers. This helps ensure the distribution and growth of various plant species and contributes to ecosystem diversity.
+ 5. Honey production: Bees produce honey, which has numerous uses for humans beyond just consuming it as food. Honey has antibacterial properties and can be used as a natural sweetener in cooking or as an ingredient in various medicinal products. In addition, the production of honey creates economic opportunities for beekeepers and supports local communities.
+ 6. Nutrient cycling: Bees contribute to nutrient cycling by transferring pollen from one flower to another during the pollination process. This helps ensure that plants are able to produce seeds, which in turn allows for new plant growth and contributes to ecosystem health and productivity.
+ 7. Carbon sequestration: Bees play a role in carbon sequestration through their interactions with various plant species. By facilitating pollination and supporting the growth of vegetation, bees contribute to the absorption and storage of carbon dioxide from the atmosphere. This helps mitigate the effects of climate change and supports overall ecosystem health.
+ 8. Supporting food webs: Bees contribute to complex food webs by providing essential resources for various organisms. For example, bee larvae serve as food sources for birds and other predators, while adult bees provide nectar and pollen for a wide variety of insects and other animals. This helps maintain the balance and diversity of various ecosystems.
+
+
+ ### Nexus:
+
+ Absolutely! While bees are primarily known for their role in pollination, their contributions to ecosystems encompass a much broader range of functions that support biodiversity and ecological health. Here are some of the less commonly recognized yet crucial roles bees play:
+
+ 1. Seed Dispersal: Bees not only transfer pollen between flowers but also unintentionally pick up and distribute seeds as they forage, aiding in the dispersal of plant species. This is particularly important for plants that have small seeds or rely on insects for seed dispersal.
+
+ 2. Nitrogen Fixation: Some bee species, like leafcutter bees, play a role in nitrogen fixation by collecting plant material and transporting it back to their nests. This process improves soil health, benefiting the growth of nearby plants.
+
+ 3. Soil Aeration: Bees contribute to soil aeration through their foraging activities. As they excavate burrows for nesting or collect soil for building cells in their nests, they create small tunnels that improve soil structure and promote water retention.
+
+ 4. Food Source for Other Organisms: Beyond pollination, bees provide food for a wide range of organisms, including birds, mammals, and other insects. This makes them crucial components of complex food webs within ecosystems.
+
+ 5. Biological Pest Control: Bees also contribute to biocontrol by preying on pests that threaten crops or natural ecosystems. For example, parasitic bees and wasps lay their eggs in the larvae of various pests, thereby reducing their populations.
+
+ 6. Medicinal and Pharmaceutical Uses: Bees play a significant role in producing honey and beeswax, which have medicinal and pharmaceutical applications. For instance, honey has antibacterial properties that can help treat wounds, and beeswax is used as a base for cosmetics and in the manufacturing of candles.
+
+ 7. Biodiversity Indicators: Bees serve as valuable indicators of biodiversity health. Their distribution and abundance reflect the overall health of ecosystems and can signal changes in environmental conditions that may affect other species within their habitats.
+
+ 8. Cultural and Spiritual Significance: Bees have cultural and spiritual significance in many societies around the world. They are celebrated for their role in agriculture and their contribution to honey production, which is used in various traditional practices and religious ceremonies.
+
+ In summary, bees contribute significantly to ecosystems beyond pollination by enhancing soil health, supporting food webs, providing resources for medicinal and pharmaceutical applications, serving as indicators of biodiversity, and playing cultural and spiritual roles within communities. These diverse functions highlight the crucial role bees play in maintaining ecological balance and promoting sustainable agriculture.
+
+
+ ### Evaluation
+
+ ### Depth of Information
+ - **Base Model**: Offers a broad overview of bees' roles, touching on common but critical functions like nutrient cycling, honey production, and seed dispersal. It provides general explanations that align with widely known facts about bees.
+ - **Nexus-7B**: Delves deeper into specific and less commonly recognized contributions of bees, such as their role in nitrogen fixation and soil aeration, showcasing a nuanced understanding of ecological processes.
+
+ ### Breadth and Specificity of Contributions
+ - **Base Model**: Lists a variety of bees' ecological roles with a focus on their immediate environmental impacts, such as decomposers and participants in pest control. It effectively communicates bees' contributions but within a relatively conventional scope.
+ - **Nexus-7B**: Expands the conversation to include specific ecological functions, such as bees' contribution to soil aeration and their significance in biocontrol, presenting a more comprehensive picture of bees’ ecological roles. This model's response implies a deeper integration of ecological knowledge, highlighting less obvious but crucial ecosystem services provided by bees.
+
+ ### Innovative and Unique Insights
+ - **Base Model**: While informative, the response sticks to more traditional explanations of bees' ecological roles without introducing novel insights or making connections to broader ecological or societal impacts.
+ - **Nexus-7B**: Introduces innovative aspects of bees’ ecological contributions, like their role in nitrogen fixation and as indicators of biodiversity health, providing insights that reflect a layered understanding of ecosystems. It also notes the cultural and spiritual significance of bees, bridging ecological roles with societal impacts.
+
+ ### How to Spot Nuances and Understand Impact
+ 1. **Depth and Detail**: Look for responses that provide detailed explanations of processes and mechanisms, like Nexus-7B’s mention of nitrogen fixation, which indicates a deeper understanding of the subject matter.
+ 2. **Breadth of Ecological Roles**: Assess the range of ecological roles discussed. Nexus-7B’s mention of soil aeration and biological pest control shows a broader perspective on bees’ environmental impact.
+ 3. **Integration of Knowledge**: Notice how different pieces of information are woven together to paint a comprehensive picture. Nexus-7B’s response integrates bees' ecological functions with their importance in cultural and spiritual contexts, demonstrating an advanced synthesis of information.
+ 4. **Uncommon Insights**: Pay attention to unique or less commonly discussed roles that bees play, as highlighted by Nexus-7B. These insights can reveal the model's ability to go beyond surface-level knowledge and explore complex ecological dynamics.
+
+
+ ---
+
+ ## Training Snapshot
 
  ```
 
+ Step Training Loss
+ 1 3.223000
+ 2 3.221300
+ 3 3.215900
+ 4 3.210600
+ 5 3.203000
+ 6 3.193500
+ 7 3.184000
+ 8 3.173400
+ 9 3.162400
+ 10 3.151500
+ 11 3.140500
+ 12 3.128800
+ 13 3.117600
+ 14 3.106700
+ 15 3.095500
+ 16 3.084700
+ 17 3.073700
+ 18 3.062700
+ 19 3.052300
+ 20 3.041800
+
+
+ 201 1.273200
+ 202 1.257600
+ 203 1.241900
+ 204 1.226100
+ 205 1.210800
+ 206 1.195500
+ 207 1.180800
+ 208 1.166000
+ 209 1.151200
+ 210 1.136900
+ 211 1.122000
+ 212 1.106600
+ 213 1.091200
+ 214 1.075200
+ 215 1.059200
+ 216 1.042900
+ 217 1.026600
+ 218 1.010300
+ 219 0.994200
+
+ 416 0.041700
+ 417 0.041700
+ 418 0.041600
+ 419 0.041600
+ 420 0.041600
+ 421 0.041600
+ 422 0.041500
+ 423 0.041500
+ 424 0.041500
+ 425 0.041400
+ 426 0.041400
+ 427 0.041400
+ 428 0.041400
+ 429 0.041300
+ 430 0.041300
+ 431 0.041300
+ 432 0.041200
+ 433 0.041200
+ 434 0.041200
+ 435 0.041100
+ 436 0.041200
+ 437 0.041100
+ 438 0.041100
+ 439 0.041100
+ 440 0.041000
+ 441 0.041000
+ 442 0.041000
+ 443 0.040900
+ 444 0.040900
+ 445 0.040900
+
+ 668 0.035200
+ 669 0.035100
+ 670 0.035100
+ 671 0.035100
+ 672 0.035100
+ 673 0.035000
+ 674 0.035000
+ 675 0.035000
+ 676 0.035000
+ 677 0.034900
+ 678 0.034900
+ 679 0.034900
+ 680 0.034800
+ 681 0.034800
+ 682 0.034800
+ 683 0.034800
+ 684 0.034800
+ 685 0.034700
+ 686 0.034700
+ 687 0.034700
+ 688 0.034700
+ 689 0.034600
+ 690 0.034600
+ 691 0.034600
+ 692 0.034600
+ 693 0.034500
+ 694 0.034500
+ 695 0.034500
+ 696 0.034400
+ 697 0.034400
+ 698 0.034400
+ 699 0.034400
+ 700 0.034300
+ 701 0.034300
+ 702 0.034300
+ 703 0.034300
+ 704 0.034200
+ 705 0.034200
+ 706 0.034200
+ 707 0.034200
+ 708 0.034100
+ 709 0.034100
+ 710 0.034100
+ 711 0.034100
+ 712 0.034000
+ 713 0.034000
+ 714 0.034000
+ 715 0.034000
+ 716 0.033900
+ 717 0.033900
+ 718 0.033800
+ 719 0.033800
+ 720 0.033800
+ 721 0.033800
+
+ 1209 0.006600
+ 1210 0.006500
+ 1211 0.006300
+ 1212 0.006200
+ 1213 0.006100
+ 1214 0.006000
+ 1215 0.005800
+ 1216 0.005700
+ 1217 0.005600
+ 1218 0.005500
+ 1219 0.005400
+ 1220 0.005300
+ 1221 0.005100
+ 1222 0.004900
+ 1223 0.004800
+ 1224 0.004700
+ 1225 0.004600
+ 1226 0.004500
+ 1227 0.004400
+ 1228 0.004300
+ 1229 0.004200
+ 1230 0.004000
+ 1231 0.003900
+ 1232 0.003800
+ 1233 0.003700
+ 1234 0.003500
+ 1235 0.003400
+ 1236 0.003300
+ 1237 0.003200
+ 1238 0.003000
+ 1239 0.003000
+ 1240 0.002900
+ 1241 0.002800
+ 1242 0.002700
+ 1243 0.002600
+ 1244 0.002500
+ 1245 0.002400
+ 1246 0.002300
+ 1247 0.002200
+ 1248 0.002100
+ 1249 0.002000
+ 1250 0.001900
+ 1251 0.001800
+ 1252 0.001800
+ 1253 0.001700
+ 1254 0.001600
+ 1255 0.001600
+ 1256 0.001500
+ 1257 0.001400
+ 1258 0.001300
+ 1259 0.001300
+ 1260 0.001200
+ 1261 0.001200
+ 1262 0.001100
+ 1263 0.001100
+ 1264 0.001000
+ 1265 0.001000
+ 1266 0.000900
+ 1267 0.000900
+ 1268 0.000800
+ 1269 0.000800
+ 1270 0.000800
+ 1271 0.000800
+ 1272 0.000700
+ 1273 0.000700
+ 1274 0.000700
+ 1275 0.000600
+ 1276 0.000600
+ 1277 0.000600
+ 1278 0.000600
+ 1279 0.000500
+ 1280 0.000500
+ 1281 0.000500
+ 1282 0.000500
+ 1283 0.000500
+ 1284 0.000500
+ 1285 0.000500
+ 1286 0.000400
+ 1287 0.000400
+ 1288 0.000400
+ 1289 0.000400
+ 1290 0.000400
+ 1291 0.000400
+ 1292 0.000400
+ 1293 0.000400
+ 1294 0.000400
+ 1295 0.000400
+ 1296 0.000400
+ 1297 0.000300
+ 1298 0.000300
+ ```
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "Severian/Nexus-IKM-Mistral-7B",
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "pad_token_id": 0,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.38.2",
+   "unsloth_version": "2024.3",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
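
As a sanity check, the shape values in this config.json can be used to estimate the parameter count and compare it with the `total_size` of 14483464192 bytes reported in model.safetensors.index.json in this same commit. The sketch below assumes the standard Mistral layout (no bias terms, two RMSNorm weights per layer, one final norm, and an untied `lm_head`, consistent with `"tie_word_embeddings": false`). The `count_parameters` helper is illustrative only.

```python
# Shape values copied from config.json.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "vocab_size": 32000,
}


def count_parameters(c):
    """Estimate the parameter count, assuming the standard Mistral layout."""
    h = c["hidden_size"]
    head_dim = h // c["num_attention_heads"]
    kv_dim = head_dim * c["num_key_value_heads"]  # grouped-query attention
    attn = 2 * h * h + 2 * h * kv_dim             # q_proj, o_proj, k_proj, v_proj
    mlp = 3 * h * c["intermediate_size"]          # gate_proj, up_proj, down_proj
    norms = 2 * h                                 # input and post-attention norms
    per_layer = attn + mlp + norms
    embeddings = 2 * c["vocab_size"] * h          # embed_tokens and untied lm_head
    return c["num_hidden_layers"] * per_layer + embeddings + h  # + final norm


params = count_parameters(cfg)
size_bytes = params * 2  # float16 stores 2 bytes per parameter

print(params)      # 7241732096
print(size_bytes)  # 14483464192
```

The result matches the reported `total_size` exactly, which suggests the three safetensors shards hold a complete float16 copy of the weights.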
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.38.2"
+ }
model.safetensors.index.json ADDED
@@ -0,0 +1,298 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ {
+   "metadata": {
+     "total_size": 14483464192
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00003-of-00003.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
+     "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
+     "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
+     "model.norm.weight": "model-00003-of-00003.safetensors"
+   }
+ }
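The `weight_map` above maps each tensor name to the shard file that stores it, and a single layer can straddle two shards (layer 10's `gate_proj` sits in shard 1 while its `down_proj` sits in shard 2). A minimal sketch of inverting that map to see which tensors live in each shard, using only the standard library. The inline excerpt copies a handful of entries from the file above; `tensors_by_shard` is a hypothetical helper name, not a library function.

```python
import json
from collections import defaultdict

# A small excerpt of model.safetensors.index.json from this commit;
# the full weight_map has many more entries with the same shape.
INDEX_EXCERPT = """
{
  "metadata": {"total_size": 14483464192},
  "weight_map": {
    "lm_head.weight": "model-00003-of-00003.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.norm.weight": "model-00003-of-00003.safetensors"
  }
}
"""


def tensors_by_shard(index: dict) -> dict:
    """Invert weight_map: shard file name -> sorted list of tensor names.

    Hypothetical helper for illustration; not part of any library.
    """
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


index = json.loads(INDEX_EXCERPT)
grouped = tensors_by_shard(index)
for shard, names in grouped.items():
    print(shard, len(names))
```

Grouping by shard first means a loader can open each shard file once and read all of its tensors together, instead of reopening a file for every tensor.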
original_repo_url.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/Severian/Nexus-IKM-Mistral-7B
output.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a90200527ce1a1b87edbf65927539d8ec6877d1b040a10fe5996dbb3606ff2c
+ size 4728538500
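The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of parsing such a pointer with the standard library; `parse_lfs_pointer` is a hypothetical helper name chosen for this example.

```python
# The pointer text as added in this commit for output.safetensors.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4a90200527ce1a1b87edbf65927539d8ec6877d1b040a10fe5996dbb3606ff2c
size 4728538500
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then break the oid into algorithm and digest.

    Hypothetical helper for illustration; not part of Git LFS tooling.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(f"{pointer['size_bytes'] / 1e9:.2f} GB, {pointer['hash_algorithm']}")
```

After downloading the real file, its byte length and SHA-256 digest can be compared against `size_bytes` and `digest` to confirm the download is complete.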
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,298 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 14483464192
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "pytorch_model-00003-of-00003.bin",
7
+ "model.embed_tokens.weight": "pytorch_model-00001-of-00003.bin",
8
+ "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
9
+ "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
10
+ "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
11
+ "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
12
+ "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
13
+ "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
14
+ "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
15
+ "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
16
+ "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
17
+ "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
18
+ "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
19
+ "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
20
+ "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
21
+ "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
22
+ "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
23
+ "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
24
+ "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
25
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
26
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
27
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
28
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
29
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
30
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
31
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
32
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
33
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
34
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
35
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
36
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
37
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
38
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
39
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
40
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
41
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
42
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
43
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
44
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
45
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
46
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
47
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
48
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
49
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
50
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
51
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
52
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
53
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
54
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
55
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
56
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
57
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
58
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
59
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
60
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
61
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
62
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
63
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
64
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
65
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
66
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
67
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
68
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
69
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
70
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
71
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
72
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
73
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
74
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
75
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00002-of-00003.bin",
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.30.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.30.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.30.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.mlp.down_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.mlp.up_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00003-of-00003.bin",
+ "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00003.bin",
+ "model.norm.weight": "pytorch_model-00003-of-00003.bin"
+ }
+ }
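
The weight map above assigns each tensor name to one of three shard files, and a layer can straddle a shard boundary (layer 22 has its attention projections in shard 2 and the rest in shard 3). A minimal sketch of querying such a map follows. It assumes the map has already been loaded into a Python dict; the helper names `tensors_by_shard` and `shards_for_layer` are hypothetical and only a few entries are copied here for illustration.

```python
from collections import defaultdict

# A handful of entries copied from the weight map above; a real index holds all of them.
weight_map = {
    "model.layers.22.input_layernorm.weight": "pytorch_model-00003-of-00003.bin",
    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00002-of-00003.bin",
    "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00003.bin",
    "model.norm.weight": "pytorch_model-00003-of-00003.bin",
}


def tensors_by_shard(weight_map):
    """Invert tensor -> shard into shard -> sorted list of tensor names (hypothetical helper)."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def shards_for_layer(weight_map, layer):
    """Return the sorted shard files that hold any tensor of the given layer (hypothetical helper)."""
    prefix = f"model.layers.{layer}."
    return sorted({shard for name, shard in weight_map.items() if name.startswith(prefix)})


# Layer 22 is split across two shard files.
print(shards_for_layer(weight_map, 22))
```

Grouping by shard first is useful when loading selectively, since each shard file then only needs to be opened once.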
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
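
The three added lines are a Git LFS pointer, not the tokenizer model itself: they record the spec version, the SHA-256 object ID, and the size in bytes of the real file. As a rough illustration, the pointer text can be split into fields as below. The function name `parse_lfs_pointer` is hypothetical, and the sketch only handles the three keys shown in this diff.

```python
# Pointer text as it appears in the diff above (without the leading "+ ").
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
size 493443
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of an LFS pointer into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```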
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [],
+ "bos_token": "<s>",
+ "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "legacy": true,
+ "model_max_length": 32768,
+ "pad_token": "<unk>",
+ "padding_side": "right",
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }
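
The `chat_template` value above is a Jinja template that emits the BOS token, wraps each user turn in `[INST] ... [/INST]`, appends the EOS token after each assistant turn, and rejects conversations that do not alternate user/assistant. The sketch below mirrors that logic in plain Python to make the resulting prompt string easier to see. It is a hand-written re-implementation for illustration, not the tokenizer's own rendering path, and `render_chat` is a hypothetical name.

```python
BOS, EOS = "<s>", "</s>"  # bos_token and eos_token from the config above


def render_chat(messages):
    """Mirror the chat_template logic in plain Python (illustrative re-implementation)."""
    parts = [BOS]
    for index, message in enumerate(messages):
        # Even positions must be user turns, odd positions must not be.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if message["role"] == "user":
            parts.append("[INST] " + message["content"] + " [/INST]")
        elif message["role"] == "assistant":
            parts.append(message["content"] + EOS)
        else:
            raise ValueError("Only user and assistant roles are supported!")
    return "".join(parts)


prompt = render_chat([
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "Bye"},
])
print(prompt)  # <s>[INST] Hi [/INST]Hello</s>[INST] Bye [/INST]
```

Note that a `system` message in the first position fails the alternation check, so the template as written has no slot for a system prompt.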