Mark-Arcee committed
Commit a6c74a9
1 Parent(s): 71bac2b

Update README.md

Files changed (1)
  1. README.md +46 -2
README.md CHANGED
@@ -18,7 +18,11 @@ This is an extension of a pre-trained language model created using [mergekit](ht
  # Merge Details
  ### Merge Method

- This model was extended using the passthrough merge method.
+ This model uses mergekit's passthrough merge method to expand blocks within the "CorticalStack/pastiche-crown-clown-7b-dare-dpo" model: for every 5th layer, a new layer is added, with the `o_proj` and `down_proj` parameters of the added layers initialized to zero, mirroring the approach used in LLaMA Pro.
+
+ ### Note that this configuration has not undergone fine-tuning. When fine-tuning, ensure that only every 5th layer (the newly added blocks) is trainable, while all other layers remain frozen.
+

  ### Models Merged

 
@@ -181,7 +185,47 @@ The following YAML configuration was used to produce this model:
  merge_method: passthrough
  dtype: bfloat16

+ ```

+ # Function to freeze layers

-
  ```
+ from transformers import AutoModelForCausalLM
+
+ def enable_grad_only_every_nth(model, n):
+     """
+     Enable gradient updates only for every nth decoder layer (indices n-1, 2n-1, ...),
+     i.e. the newly added blocks, while freezing the token embeddings, the LM head, and
+     all other decoder layers. This targets fine-tuning at the inserted layers and keeps
+     the pre-trained components fixed.
+     """
+     # Freeze embeddings.
+     for param in model.model.embed_tokens.parameters():
+         param.requires_grad = False
+
+     # Freeze lm_head.
+     for param in model.lm_head.parameters():
+         param.requires_grad = False
+
+     # Enable gradients only for every nth layer.
+     layers = model.model.layers  # ModuleList containing the decoder layers
+     for index, layer in enumerate(layers):
+         if (index + 1) % n == 0:  # every nth layer: indices n-1, 2n-1, ...
+             for param in layer.parameters():
+                 param.requires_grad = True
+         else:
+             for param in layer.parameters():
+                 param.requires_grad = False
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "arcee-ai/Mistral-7B-Instruct-v0.2-expanded"
+ )
+ # Update layer gradients; set n to match the expansion interval of your model (5 here).
+ n = 5
+ enable_grad_only_every_nth(model, n)
+ ```
+
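A minimal follow-up check, assuming the `model` and `n` defined in the snippet above: after calling `enable_grad_only_every_nth(model, n)`, count the trainable parameters and list which decoder layers remain trainable (expected indices 4, 9, 14, ... for n = 5).

```
# Sanity check after freezing; assumes `model` and `n` from the snippet above.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")

trainable_layers = [
    index
    for index, layer in enumerate(model.model.layers)
    if any(p.requires_grad for p in layer.parameters())
]
print("trainable layer indices:", trainable_layers)  # expected: [4, 9, 14, ...] for n = 5
```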