---
license: apache-2.0
datasets:
- WhiteRabbitNeo/WRN-Chapter-1
- WhiteRabbitNeo/WRN-Chapter-2
- LDJnr/Capybara
- teknium/openhermes
- teknium/GPTeacher-General-Instruct
- Weyaxi/sci-datasets
- TIGER-Lab/MathInstruct
- hiyouga/glaive-function-calling-v2-sharegpt
- glaiveai/glaive-code-assistant
- m-a-p/CodeFeedback-Filtered-Instruction
- m-a-p/Code-Feedback
- migtissera/Synthia-v1.3
- abacusai/SystemChat
- jondurbin/airoboros-3.2
- vicgalle/alpaca-gpt4
- garage-bAInd/Open-Platypus
- trollek/Mouse-Diffusion-Instruct
- trollek/Self-Rewarding-Mouse
language:
- en
base_model: trollek/NinjaMouse2-2.5B-v0.1-GGUF
---

# NinjaMouse2-2.5B-v0.1-iMat-GGUF
Original model: [NinjaMouse2-2.5B-v0.1](https://huggingface.co/trollek/NinjaMouse2-2.5B-v0.1)
Model creator: [trollek](https://huggingface.co/trollek)

## Quantization notes
This repo contains GGUF models made with llama.cpp b2700 using iMatrix calibration.
I used calibration data from the default dataset of the Exllamav2 project.
All quants were made with the imatrix file.
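
For anyone curious how such quants are typically produced, here is a rough sketch of the b2700-era llama.cpp iMatrix workflow driven from Python. The file names, calibration text, and quant type are illustrative assumptions, not the exact commands used for this repo.

```python
# Rough sketch of a llama.cpp (b2700-era) iMatrix quantization workflow.
# Paths, the calibration file, and the quant type are placeholders.
import subprocess

# 1. Convert the HF model to a full-precision GGUF file.
subprocess.run(
    ["python", "convert.py", "NinjaMouse2-2.5B-v0.1",
     "--outtype", "f16", "--outfile", "ninjamouse2-f16.gguf"],
    check=True,
)

# 2. Build the importance matrix from calibration text
#    (assumed here to be the Exllamav2 default calibration data saved as a .txt file).
subprocess.run(
    ["./imatrix", "-m", "ninjamouse2-f16.gguf",
     "-f", "calibration.txt", "-o", "ninjamouse2.imatrix"],
    check=True,
)

# 3. Quantize with the imatrix applied (repeat for each quant type).
subprocess.run(
    ["./quantize", "--imatrix", "ninjamouse2.imatrix",
     "ninjamouse2-f16.gguf", "ninjamouse2-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```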

# Original model card
![](https://huggingface.co/trollek/NinjaMouse2-2.5B-v0.1/resolve/main/ninjamouse2.jpeg)

Same procedure as last time?

**Sort of.**

This model is a block-expanded danube2, using the Llama Pro method of only training (or fine-tuning) the expanded blocks. To do this on limited hardware I had to expand by 2 layers per step, from the original 24 to 32. At least, that was the original plan. With the 32-layer model I used BAdam to do a "once over" with most of the datasets I also used to expand the model. While it is a faux full fine-tune, it isn't really that different from the Llama Pro method, i.e. layer-wise insertion of data.
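
To make the Llama Pro idea concrete, here is a minimal sketch of one expansion step with `transformers` and PyTorch. The model id, insertion points, and freezing loop are illustrative assumptions rather than the author's actual script.

```python
# Minimal sketch of one Llama Pro-style expansion step (e.g. 24 -> 26 layers).
# Model id and insertion points are illustrative, not the author's exact setup.
import copy
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("h2oai/h2o-danube2-1.8b-base")
insert_after = {11, 23}  # hypothetical positions for the two copied blocks

new_layers = torch.nn.ModuleList()
new_block_ids = []
for idx, layer in enumerate(model.model.layers):
    new_layers.append(layer)
    if idx in insert_after:
        block = copy.deepcopy(layer)
        # Llama Pro zero-initialises the output projections of the copied block,
        # so the expanded model initially behaves exactly like the original one.
        torch.nn.init.zeros_(block.self_attn.o_proj.weight)
        torch.nn.init.zeros_(block.mlp.down_proj.weight)
        new_layers.append(block)
        new_block_ids.append(len(new_layers) - 1)

model.model.layers = new_layers
model.config.num_hidden_layers = len(new_layers)

# Only the freshly inserted blocks are trained; everything else stays frozen.
for name, param in model.named_parameters():
    param.requires_grad = any(f"layers.{i}." in name for i in new_block_ids)
```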

I have a feeling that Llama3 and other well-trained models feel better because of markdown (formatting), personality (friendliness), and prompt compliance (preference-ness... I guess). Thus I have used Llama3 8B, [WizardLM2](https://huggingface.co/bartowski/WizardLM-2-7B-GGUF), and [Hermes 2 Pro Mistral](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B) to generate training data for this model.

To ensure that the full 8k context window could be utilised this time, I filtered openhermes, Synthia, LongAlpaca, and MathInstruct for entries with a token count between 2k and 8k, to DoRA, QLoRA, and BAdam the context window into submission. One time, elsewhere, even with `lm_head` as an additional target, and twice with `embed_tokens`.
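
As an illustration of that length filter, the sketch below keeps only samples whose token count lands in the 2k-8k window. The dataset, column names, and tokenizer choice are assumptions for the example, not the exact preprocessing used here.

```python
# Hypothetical length filter: keep samples whose combined token count is 2k-8k.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("trollek/NinjaMouse2-2.5B-v0.1")
dataset = load_dataset("teknium/openhermes", split="train")

def within_window(example, low=2048, high=8192):
    # Column names assumed; adjust to whichever dataset is being filtered.
    text = example["instruction"] + "\n" + example["output"]
    return low <= len(tokenizer(text).input_ids) <= high

long_subset = dataset.filter(within_window)
print(f"Kept {len(long_subset)} of {len(dataset)} samples")
```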

The astute among you may notice the extra special tokens like the FIM and thought tokens. NinjaMouse has not been trained to use those... Yet! Also: this is actually 34 layers. Surprise!

Here's the thing with the 2 extra layers compared to my first model. When I trained NinjaMouse2 with 32 layers I noticed that the `grad_norm` value would behave strangely on layers 3 and 27. Layer 27 used to be the last layer before the expansion, while 3 is a mystery. I decided to use [mergekit](https://github.com/arcee-ai/mergekit) to copy layer 3 and insert it beside the original, and to copy layer 27 and insert it at the end or top (the new 33, all 0-indexed), depending on your perspective.

### The procedure

#### 24 -> 26

- LDJnr/Capybara
- m-a-p/Code-Feedback
- m-a-p/CodeFeedback-Filtered-Instruction
- WRN non enhanced
- abacusai/SystemChat

#### 26 -> 28

- toolcall 10k
- migtissera/Synthia-v1.3
- TIGER-Lab/MathInstruct

#### 28 -> 30

- glaiveai/glaive-code-assistant
- hiyouga/glaive-function-calling-v2-sharegpt
- Weyaxi/sci-datasets (w/o code feedback instruct, mathinstruct, camel)

#### 30 -> 32

- jondurbin/airoboros-3.2
- teknium/openhermes
- WRN enhanced
- garage-bAInd/Open-Platypus
- vicgalle/alpaca-gpt4

### Post tuning

*Self-reward with a teacher* is what this approach can be confidently called. I wish there were a distilled version of that name, but I am coming up blank.

I have any model generate a bunch of prompts that a teacher model answers with gusto (the chosen column), and then have NinjaMouse2 also answer them (the rejected column). **BAM**. Skibidibi doo. Have I made these DPO datasets? No. But the prompts, their evaluations, the model's own responses, responses from better models, and evaluations of both are included in the training. You can find the dataset [here](https://huggingface.co/datasets/trollek/Self-Rewarding-Mouse).
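
For readers who do want to turn that setup into preference pairs, a minimal sketch is shown below. The prompt, model ids, and `generate_answer` helper are hypothetical; as noted above, this material was folded into regular training rather than DPO.

```python
# Hypothetical assembly of chosen/rejected pairs from a teacher and the student.
from datasets import Dataset

def generate_answer(model_id: str, prompt: str) -> str:
    """Placeholder: a real version would run `prompt` through the model `model_id`."""
    return f"[{model_id} answer to: {prompt}]"

prompts = ["Explain block expansion to a beginner."]  # normally model-generated

rows = [
    {
        "prompt": prompt,
        "chosen": generate_answer("NousResearch/Hermes-2-Pro-Mistral-7B", prompt),
        "rejected": generate_answer("trollek/NinjaMouse2-2.5B-v0.1", prompt),
    }
    for prompt in prompts
]

preference_ds = Dataset.from_list(rows)
```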

## Notes

### License

To use this model you agree to use it like Spider-man:
Apache 2.0 + White Rabbit Neo *(below)*

```
You agree not to use the Model or Derivatives of the Model:

- In any way that violates any applicable national or international law or regulation or infringes upon the lawful rights and interests of any third party;
- For military use in any way;
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
- To generate or disseminate verifiably false information and/or content with the purpose of harming others;
- To generate or disseminate inappropriate content subject to applicable regulatory requirements;
- To generate or disseminate personal identifiable information without due authorization or for unreasonable use;
- To defame, disparage or otherwise harass others;
- For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
- For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
- To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
- For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories.
```

### Template
I made this ([OpenChatML](https://github.com/cognitivecomputations/OpenChatML)-like) template for LLaMA-Factory and added it to the bottom of `LLaMA-Factory/src/llmtuner/data/template.py`.

```python
_register_template(
    name="ninja_chatml",
    format_user=StringFormatter(slots=["<|im_start|>user\n{{content}}\n<|im_end|>\n"]),  # Works
    format_assistant=StringFormatter(slots=["<|im_start|>assistant\n{{content}}\n<|im_end|>", {"eos_token"}]),  # Works
    format_system=StringFormatter(slots=["<|im_start|>system\n{{content}}\n<|im_end|>\n"]),  # NinjaMouse does not like BOS!
    format_function=FunctionFormatter(slots=["<|im_start|>assistant\n<tool_call>\n{\"name\":\"{{name}}\", \"arguments\":{{arguments}}}\n</tool_call>\n<|im_end|>", {"eos_token"}]),  # Works
    format_observation=StringFormatter(slots=["<|im_start|>tool\n<tool_response>\n{{content}}\n</tool_response>\n<|im_end|>\n"]),  # Works
    format_separator=EmptyFormatter(slots=["\n"]),  # It makes sense to keep this a newline instead of </s> and apply the eos token directly
    format_tools=ToolFormatter(tool_format="open_chatml"),
)
```
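
With that template, a single exchange renders roughly like this (derived from the slots above; the messages and the trailing `</s>` eos token are illustrative):

```
<|im_start|>system
You are NinjaMouse, a helpful assistant.
<|im_end|>
<|im_start|>user
How many layers do you have?
<|im_end|>
<|im_start|>assistant
Thirty-four, all 0-indexed.
<|im_end|></s>
```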

To format the tools, I have added the following code to `formatter.py` in the same folder.

```python
# At the top
OPEN_CHATML_TOOL_PROMPT = (
    "\n<tools>\n"
    "{function_description}\n"
    "</tools>\n"
)


# I only added the elif
@dataclass
class ToolFormatter(Formatter):
    def __post_init__(self):
        if self.tool_format is None:
            raise ValueError("Tool format was not found.")

    def apply(self, **kwargs) -> SLOTS:
        content = kwargs.pop("content")
        try:
            tools = json.loads(content)
            if not len(tools):
                return [""]

            if self.tool_format == "default":
                return [default_tool_formatter(tools)]
            elif self.tool_format == "open_chatml":  # This right here
                return [OPEN_CHATML_TOOL_PROMPT.format(function_description=json.dumps(tools, ensure_ascii=False, indent=4))]  # I used 4 but OpenChatML has 2
            else:
                raise NotImplementedError
        except Exception:
            return [""]

    def extract(self, content: str) -> Union[str, Tuple[str, str]]:
        if self.tool_format == "default":
            return default_tool_extractor(content)
        else:
            raise NotImplementedError
```
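
For a quick sanity check, this is roughly what the `open_chatml` branch emits for a single made-up tool definition (the weather tool is purely illustrative):

```python
# Made-up weather tool to show what the "open_chatml" branch renders.
import json

OPEN_CHATML_TOOL_PROMPT = (  # same constant as defined above
    "\n<tools>\n"
    "{function_description}\n"
    "</tools>\n"
)

tools = [{
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Prints the tool spec wrapped in <tools> ... </tools> tags, ready to be placed
# in the system prompt by the template's format_tools slot.
print(OPEN_CHATML_TOOL_PROMPT.format(
    function_description=json.dumps(tools, ensure_ascii=False, indent=4)
))
```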

### Model specs

```
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32009, 2560, padding_idx=0)
    (layers): ModuleList(
      (0-33): 34 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear(in_features=2560, out_features=2560, bias=False)
          (k_proj): Linear(in_features=2560, out_features=640, bias=False)
          (v_proj): Linear(in_features=2560, out_features=640, bias=False)
          (o_proj): Linear(in_features=2560, out_features=2560, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=2560, out_features=6912, bias=False)
          (up_proj): Linear(in_features=2560, out_features=6912, bias=False)
          (down_proj): Linear(in_features=6912, out_features=2560, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=2560, out_features=32009, bias=False)
)
```