bartowski committed on
Commit
78eea9a
1 Parent(s): 996c8da

Quant for 3.5

README.md CHANGED
@@ -3,63 +3,72 @@ library_name: transformers
3
  license: apache-2.0
4
  datasets:
5
  - Locutusque/Hercules-v3.0
6
- quantized_by: bartowski
7
- pipeline_tag: text-generation
8
  ---
9
-
10
- ## Exllama v2 Quantizations of Hercules-3.0-Mistral-7B
11
-
12
- Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.13">turboderp's ExLlamaV2 v0.0.13</a> for quantization.
13
-
14
- <b>The "main" branch only contains the measurement.json, download one of the other branches for the model (see below)</b>
15
-
16
- Each branch contains a quantization at a different bits per weight; the main branch contains only the measurement.json for further conversions.
17
-
18
- Original model: https://huggingface.co/Locutusque/Hercules-3.0-Mistral-7B
19
-
20
- | Branch | Bits | lm_head bits | VRAM (4k) | VRAM (16k) | VRAM (32k) | Description |
21
- | ----- | ---- | ------- | ------ | ------ | ------ | ------------ |
22
- | [8_0](https://huggingface.co/bartowski/Hercules-3.0-Mistral-7B-exl2/tree/8_0) | 8.0 | 8.0 | 8.4 GB | 9.8 GB | 11.8 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
23
- | [6_5](https://huggingface.co/bartowski/Hercules-3.0-Mistral-7B-exl2/tree/6_5) | 6.5 | 8.0 | 7.2 GB | 8.6 GB | 10.6 GB | Very similar to 8.0, good tradeoff of size vs performance, **recommended**. |
24
- | [5_0](https://huggingface.co/bartowski/Hercules-3.0-Mistral-7B-exl2/tree/5_0) | 5.0 | 6.0 | 6.0 GB | 7.4 GB | 9.4 GB | Slightly lower quality vs 6.5, but usable on 8GB cards. |
25
- | [4_25](https://huggingface.co/bartowski/Hercules-3.0-Mistral-7B-exl2/tree/4_25) | 4.25 | 6.0 | 5.3 GB | 6.7 GB | 8.7 GB | GPTQ equivalent bits per weight, slightly higher quality. |
26
- | [3_5](https://huggingface.co/bartowski/Hercules-3.0-Mistral-7B-exl2/tree/3_5) | 3.5 | 6.0 | 4.7 GB | 6.1 GB | 8.1 GB | Lower quality, only use if you have to. |
27
-
28
- ## Download instructions
29
-
30
- With git:
31
-
32
- ```shell
33
- git clone --single-branch --branch 6_5 https://huggingface.co/bartowski/Hercules-3.0-Mistral-7B-exl2 Hercules-3.0-Mistral-7B-exl2-6_5
34
- ```
35
-
36
- With huggingface hub (credit to TheBloke for instructions):
37
-
38
- ```shell
39
- pip3 install huggingface-hub
40
- ```
41
-
42
- To download the `main` branch (only useful if you only care about measurement.json) to a folder called `Hercules-3.0-Mistral-7B-exl2`:
43
-
44
- ```shell
45
- mkdir Hercules-3.0-Mistral-7B-exl2
46
- huggingface-cli download bartowski/Hercules-3.0-Mistral-7B-exl2 --local-dir Hercules-3.0-Mistral-7B-exl2 --local-dir-use-symlinks False
47
- ```
48
-
49
- To download from a different branch, add the `--revision` parameter:
50
-
51
- Linux:
52
-
53
- ```shell
54
- mkdir Hercules-3.0-Mistral-7B-exl2-6_5
55
- huggingface-cli download bartowski/Hercules-3.0-Mistral-7B-exl2 --revision 6_5 --local-dir Hercules-3.0-Mistral-7B-exl2-6_5 --local-dir-use-symlinks False
56
- ```
57
-
58
- Windows (which apparently doesn't like _ in folders sometimes?):
59
-
60
- ```shell
61
- mkdir Hercules-3.0-Mistral-7B-exl2-6.5
62
- huggingface-cli download bartowski/Hercules-3.0-Mistral-7B-exl2 --revision 6_5 --local-dir Hercules-3.0-Mistral-7B-exl2-6.5 --local-dir-use-symlinks False
63
- ```
64
-
65
- Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski
3
  license: apache-2.0
4
  datasets:
5
  - Locutusque/Hercules-v3.0
6
  ---
7
+ # Model Card: Hercules-3.0-Mistral-7B
8
+
9
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6437292ecd93f4c9a34b0d47/Ip9wEG2Ne4vihNStHSDvX.png)
10
+
11
+ ## Model Description
12
+
13
+ Hercules-3.0-Mistral-7B is a fine-tuned language model derived from Mistralai/Mistral-7B-v0.1. It is specifically designed to excel in instruction following, function calls, and conversational interactions across various scientific and technical domains. The dataset used for fine-tuning, Hercules-v3.0, expands upon the diverse capabilities of OpenHermes-2.5 with contributions from numerous curated datasets. This fine-tuning gives Hercules-3.0-Mistral-7B enhanced abilities in:
14
+
15
+ - Complex Instruction Following: Understanding and accurately executing multi-step instructions, even those involving specialized terminology.
16
+ - Function Calling: Seamlessly interpreting and executing function calls, providing appropriate input and output values.
17
+ - Domain-Specific Knowledge: Engaging in informative and educational conversations about Biology, Chemistry, Physics, Mathematics, Medicine, Computer Science, and more.
18
+
19
+ ## Intended Uses & Potential Bias
20
+
21
+ Hercules-3.0-Mistral-7B is well-suited to the following applications:
22
+
23
+ - Specialized Chatbots: Creating knowledgeable chatbots and conversational agents in scientific and technical fields.
24
+ - Instructional Assistants: Supporting users with educational and step-by-step guidance in various disciplines.
25
+ - Code Generation and Execution: Facilitating code execution through function calls, aiding in software development and prototyping.
26
+
27
+ **Important Note: Although Hercules-v3.0 is carefully constructed, it's important to be aware that the underlying data sources may contain biases or reflect harmful stereotypes. Use this model with caution and consider additional measures to mitigate potential biases in its responses.**
28
+
29
+ ## Limitations and Risks
30
+
31
+ - Toxicity: The dataset contains toxic or harmful examples.
32
+ - Hallucinations and Factual Errors: Like other language models, Hercules-3.0-Mistral-7B may generate incorrect or misleading information, especially in specialized domains where it lacks sufficient expertise.
33
+ - Potential for Misuse: The ability to engage in technical conversations and execute function calls could be misused for malicious purposes.
34
+
35
+ ## Training Data
36
+
37
+ Hercules-3.0-Mistral-7B is fine-tuned on data from the following sources:
38
+
39
+ - `cognitivecomputations/dolphin`
40
+ - `Evol Instruct 70K & 140K`
41
+ - `teknium/GPT4-LLM-Cleaned`
42
+ - `jondurbin/airoboros-3.2`
43
+ - `AlekseyKorshuk/camel-chatml`
44
+ - `CollectiveCognition/chats-data-2023-09-22`
45
+ - `Nebulous/lmsys-chat-1m-smortmodelsonly`
46
+ - `glaiveai/glaive-code-assistant-v2`
47
+ - `glaiveai/glaive-code-assistant`
48
+ - `glaiveai/glaive-function-calling-v2`
49
+ - `garage-bAInd/Open-Platypus`
50
+ - `meta-math/MetaMathQA`
51
+ - `teknium/GPTeacher-General-Instruct`
52
+ - `GPTeacher roleplay datasets`
53
+ - `BI55/MedText`
54
+ - `pubmed_qa labeled subset`
55
+ - `Unnatural Instructions`
56
+ - `M4-ai/LDJnr_combined_inout_format`
57
+ - `CollectiveCognition/chats-data-2023-09-27`
58
+ - `CollectiveCognition/chats-data-2023-10-16`
59
+ - `NobodyExistsOnTheInternet/sharegptPIPPA`
60
+ - `yuekai/openchat_sharegpt_v3_vicuna_format`
61
+ - `ise-uiuc/Magicoder-Evol-Instruct-110K`
62
+ - `Squish42/bluemoon-fandom-1-1-rp-cleaned`
63
+ - `sablo/oasst2_curated`
64
+
65
+ ## Training Procedure
66
+
67
+ - This model was trained on 8 Kaggle TPUs, using PyTorch/XLA SPMD for high MXU efficiency. There was no expense on my end (meaning you can reproduce this too!)
68
+ - Training used the Adam optimizer with a learning rate of 2e-06 and a linear scheduler with an end factor of 0.3. The low learning rate was chosen to prevent exploding gradients.
69
+ - No mixed precision was used; the default dtype was bfloat16.
70
+ - The model was trained on 1,400,000 examples from Hercules-v3.0.
71
+ - No model parameters were frozen.
72
+ - This model was trained on OpenAI's ChatML prompt format. Because this model has function calling capabilities, the prompt format is slightly different. Here is what it looks like: ```<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{user message}<|im_end|>\n<|im_start|>call\n{function call message}<|im_end|>\n<|im_start|>function\n{function response message}<|im_end|>\n<|im_start|>assistant\n{assistant message}</s>```
73
+
74
+ This model was fine-tuned using the TPU-Alignment repository. https://github.com/Locutusque/TPU-Alignment
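
The prompt format described in the README's Training Procedure list can be sketched as a small helper. This is a minimal illustration, not code from the repository: the function name `build_prompt` and the message-dict layout are hypothetical, and ending a finished assistant turn with `</s>` (with no trailing newline) is an assumption read off the single example in the README.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
EOS = "</s>"

# Roles that appear in the README's example prompt.
ALLOWED_ROLES = ("system", "user", "call", "function", "assistant")


def build_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in the ChatML variant shown in the README (hypothetical helper)."""
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        if role == "assistant":
            # Assumption: completed assistant turns end with </s>, as in the README example.
            parts.append(f"{IM_START}{role}\n{content}{EOS}")
        else:
            parts.append(f"{IM_START}{role}\n{content}{IM_END}\n")
    prompt = "".join(parts)
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply.
        prompt += f"{IM_START}assistant\n"
    return prompt


if __name__ == "__main__":
    print(build_prompt([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
    ]))
```

With `add_generation_prompt=True` the string ends in an open assistant turn, so generation would be expected to stop at `</s>` (`eos_token_id` 2 in the config files of this commit).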
config.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "_name_or_path": "Locutusque/Hercules-3.0-Mistral-7B",
3
+ "architectures": [
4
+ "MistralForCausalLM"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 1,
8
+ "eos_token_id": 2,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 4096,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 14336,
13
+ "max_position_embeddings": 32768,
14
+ "model_type": "mistral",
15
+ "num_attention_heads": 32,
16
+ "num_hidden_layers": 32,
17
+ "num_key_value_heads": 8,
18
+ "rms_norm_eps": 1e-05,
19
+ "rope_theta": 10000.0,
20
+ "sliding_window": 4096,
21
+ "tie_word_embeddings": false,
22
+ "torch_dtype": "bfloat16",
23
+ "transformers_version": "4.37.2",
24
+ "use_cache": true,
25
+ "vocab_size": 32000
26
+ }
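
The values in config.json are enough to check the model size by hand. The sketch below is not part of the commit. It assumes the standard Mistral layout (bias-free linear layers, two RMSNorm weights per layer, one final norm, and untied embeddings per `tie_word_embeddings: false`) and 2 bytes per bfloat16 weight. The per-token KV cache figure additionally assumes a 16-bit cache. The function names are illustrative only.

```python
# Values copied from the config.json added in this commit.
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 8,
    "vocab_size": 32000,
}


def count_parameters(cfg: dict) -> int:
    """Count weights, assuming the standard bias-free Mistral layout."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim  # grouped-query attention
    attn = h * h + 2 * h * kv_dim + h * h           # q_proj, k_proj + v_proj, o_proj
    mlp = 3 * h * cfg["intermediate_size"]          # gate_proj, up_proj, down_proj
    norms = 2 * h                                   # input + post-attention layernorm
    per_layer = attn + mlp + norms
    embeddings = 2 * cfg["vocab_size"] * h          # embed_tokens + untied lm_head
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + final_norm


def kv_cache_bytes_per_token(cfg: dict, bytes_per_value: int = 2) -> int:
    """Bytes of K and V stored per token across all layers (16-bit cache assumed)."""
    head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
    return 2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"] * head_dim * bytes_per_value


if __name__ == "__main__":
    params = count_parameters(config)
    print(f"parameters: {params:,}")
    print(f"bfloat16 bytes: {params * 2:,}")
    print(f"KV cache bytes per token: {kv_cache_bytes_per_token(config):,}")
```

Under these assumptions the bfloat16 byte count equals the `total_size` of 14483464192 recorded in model.safetensors.index.json.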
generation_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "transformers_version": "4.37.2"
6
+ }
model.safetensors.index.json ADDED
@@ -0,0 +1,298 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 14483464192
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00008-of-00008.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00008.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00008.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00008.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00003-of-00008.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00003-of-00008.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00004-of-00008.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00004-of-00008.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00004-of-00008.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00004-of-00008.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00004-of-00008.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00004-of-00008.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00004-of-00008.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00008.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
87
+ "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
88
+ "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
89
+ "model.layers.17.input_layernorm.weight": "model-00005-of-00008.safetensors",
90
+ "model.layers.17.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
91
+ "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
92
+ "model.layers.17.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
93
+ "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
94
+ "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00008.safetensors",
95
+ "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00008.safetensors",
96
+ "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00008.safetensors",
97
+ "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00008.safetensors",
98
+ "model.layers.18.input_layernorm.weight": "model-00005-of-00008.safetensors",
99
+ "model.layers.18.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
100
+ "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
101
+ "model.layers.18.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
102
+ "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
103
+ "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
104
+ "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
105
+ "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
106
+ "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
107
+ "model.layers.19.input_layernorm.weight": "model-00005-of-00008.safetensors",
108
+ "model.layers.19.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
109
+ "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
110
+ "model.layers.19.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
111
+ "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
112
+ "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
113
+ "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
114
+ "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
115
+ "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
116
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00008.safetensors",
117
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00008.safetensors",
118
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
119
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
120
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00008.safetensors",
121
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
122
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
123
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
124
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
125
+ "model.layers.20.input_layernorm.weight": "model-00005-of-00008.safetensors",
126
+ "model.layers.20.mlp.down_proj.weight": "model-00005-of-00008.safetensors",
127
+ "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
128
+ "model.layers.20.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
129
+ "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00008.safetensors",
130
+ "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
131
+ "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
132
+ "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
133
+ "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
134
+ "model.layers.21.input_layernorm.weight": "model-00006-of-00008.safetensors",
135
+ "model.layers.21.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
136
+ "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
137
+ "model.layers.21.mlp.up_proj.weight": "model-00005-of-00008.safetensors",
138
+ "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
139
+ "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00008.safetensors",
140
+ "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00008.safetensors",
141
+ "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00008.safetensors",
142
+ "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00008.safetensors",
143
+ "model.layers.22.input_layernorm.weight": "model-00006-of-00008.safetensors",
144
+ "model.layers.22.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
145
+ "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
146
+ "model.layers.22.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
147
+ "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
148
+ "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
149
+ "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
150
+ "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
151
+ "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
152
+ "model.layers.23.input_layernorm.weight": "model-00006-of-00008.safetensors",
153
+ "model.layers.23.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
154
+ "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
155
+ "model.layers.23.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
156
+ "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
157
+ "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
158
+ "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
159
+ "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
160
+ "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
161
+ "model.layers.24.input_layernorm.weight": "model-00006-of-00008.safetensors",
162
+ "model.layers.24.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
163
+ "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
164
+ "model.layers.24.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
165
+ "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
166
+ "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
167
+ "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
168
+ "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
169
+ "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
170
+ "model.layers.25.input_layernorm.weight": "model-00006-of-00008.safetensors",
171
+ "model.layers.25.mlp.down_proj.weight": "model-00006-of-00008.safetensors",
172
+ "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
173
+ "model.layers.25.mlp.up_proj.weight": "model-00006-of-00008.safetensors",
174
+ "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00008.safetensors",
175
+ "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
176
+ "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
177
+ "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
178
+ "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
179
+ "model.layers.26.input_layernorm.weight": "model-00007-of-00008.safetensors",
180
+ "model.layers.26.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
181
+ "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
182
+ "model.layers.26.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
183
+ "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
184
+ "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00008.safetensors",
185
+ "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00008.safetensors",
186
+ "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00008.safetensors",
187
+ "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00008.safetensors",
188
+ "model.layers.27.input_layernorm.weight": "model-00007-of-00008.safetensors",
189
+ "model.layers.27.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
190
+ "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
191
+ "model.layers.27.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
192
+ "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
193
+ "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
194
+ "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
195
+ "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
196
+ "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
197
+ "model.layers.28.input_layernorm.weight": "model-00007-of-00008.safetensors",
198
+ "model.layers.28.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
199
+ "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
200
+ "model.layers.28.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
201
+ "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
202
+ "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
203
+ "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
204
+ "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
205
+ "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
206
+ "model.layers.29.input_layernorm.weight": "model-00007-of-00008.safetensors",
207
+ "model.layers.29.mlp.down_proj.weight": "model-00007-of-00008.safetensors",
208
+ "model.layers.29.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
209
+ "model.layers.29.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
210
+ "model.layers.29.post_attention_layernorm.weight": "model-00007-of-00008.safetensors",
211
+ "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
212
+ "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
213
+ "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
214
+ "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
215
+ "model.layers.3.input_layernorm.weight": "model-00002-of-00008.safetensors",
216
+ "model.layers.3.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
217
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00008.safetensors",
218
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00008.safetensors",
219
+ "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
220
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00008.safetensors",
221
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00008.safetensors",
222
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
223
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00008.safetensors",
224
+ "model.layers.30.input_layernorm.weight": "model-00008-of-00008.safetensors",
225
+ "model.layers.30.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
226
+ "model.layers.30.mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
227
+ "model.layers.30.mlp.up_proj.weight": "model-00007-of-00008.safetensors",
228
+ "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
229
+ "model.layers.30.self_attn.k_proj.weight": "model-00007-of-00008.safetensors",
230
+ "model.layers.30.self_attn.o_proj.weight": "model-00007-of-00008.safetensors",
231
+ "model.layers.30.self_attn.q_proj.weight": "model-00007-of-00008.safetensors",
232
+ "model.layers.30.self_attn.v_proj.weight": "model-00007-of-00008.safetensors",
233
+ "model.layers.31.input_layernorm.weight": "model-00008-of-00008.safetensors",
234
+ "model.layers.31.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
235
+ "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00008.safetensors",
236
+ "model.layers.31.mlp.up_proj.weight": "model-00008-of-00008.safetensors",
237
+ "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00008.safetensors",
238
+ "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00008.safetensors",
239
+ "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00008.safetensors",
240
+ "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00008.safetensors",
241
+ "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00008.safetensors",
242
+ "model.layers.4.input_layernorm.weight": "model-00002-of-00008.safetensors",
243
+ "model.layers.4.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
244
+ "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
245
+ "model.layers.4.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
246
+ "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
247
+ "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00003-of-00008.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00008.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00003-of-00008.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00008.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00008.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00008.safetensors",
+ "model.norm.weight": "model-00008-of-00008.safetensors"
+ }
+ }
original_repo_url.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/Locutusque/Hercules-3.0-Mistral-7B
output.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6580c7fe1141ea85116d04c56255f350651b7de922b4afa92612afb6a8c287e
+ size 3419540736
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "</s>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,49 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [],
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "legacy": true,
+ "max_length": 512,
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_to_multiple_of": null,
+ "pad_token": "</s>",
+ "pad_token_type_id": 0,
+ "padding_side": "left",
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "stride": 0,
+ "tokenizer_class": "LlamaTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }