Gemma-IT-Expanded-Unfrozen-Layers
This model uses mergekit's `passthrough` merge method to expand the blocks of `google/gemma-2b-it`, following the block-expansion approach of LLaMA Pro: after every third original layer, a copy of that layer is inserted, with the copy's `o_proj` and `down_proj` parameters scaled to zero so the expanded model initially computes the same function as the base model. Note that this configuration has not been fine-tuned. When fine-tuning, train only the newly inserted layers (every fourth layer of the expanded model) and keep all other layers frozen.
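As a sanity check on the layout, the following sketch (pure Python, no dependencies; layer counts taken from the config below: 18 base layers, duplicated after every 3rd) reconstructs which indices of the 24-layer expanded model are the zero-initialized copies:

```python
# Reconstruct the expanded layer layout: 18 base layers, with a zero-scaled
# copy inserted after every 3rd layer (as in the mergekit config below).
base_layers = list(range(18))
expanded = []           # source layer index of each expanded-model layer
new_layer_indices = []  # positions of the inserted (trainable) copies

for i in base_layers:
    expanded.append(i)
    if i % 3 == 2:                      # after base layers 2, 5, 8, ...
        new_layer_indices.append(len(expanded))
        expanded.append(i)              # zero-scaled duplicate of layer i

print(len(expanded))      # 24 layers in the expanded model
print(new_layer_indices)  # [3, 7, 11, 15, 19, 23] -> i % 4 == 3
```

This is why the freezing code below uses `n = 4`: the inserted layers sit at every fourth index of the expanded model.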
🧩 Configuration
```yaml
slices:
  - sources:
      - model: google/gemma-2b-it
        layer_range: [0, 3]
  - sources:
      - model: google/gemma-2b-it
        layer_range: [2, 3]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: google/gemma-2b-it
        layer_range: [3, 6]
  - sources:
      - model: google/gemma-2b-it
        layer_range: [5, 6]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: google/gemma-2b-it
        layer_range: [6, 9]
  - sources:
      - model: google/gemma-2b-it
        layer_range: [8, 9]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: google/gemma-2b-it
        layer_range: [9, 12]
  - sources:
      - model: google/gemma-2b-it
        layer_range: [11, 12]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: google/gemma-2b-it
        layer_range: [12, 15]
  - sources:
      - model: google/gemma-2b-it
        layer_range: [14, 15]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
  - sources:
      - model: google/gemma-2b-it
        layer_range: [15, 18]
  - sources:
      - model: google/gemma-2b-it
        layer_range: [17, 18]
        parameters:
          scale:
            - filter: o_proj
              value: 0.0
            - filter: down_proj
              value: 0.0
            - value: 1.0
merge_method: passthrough
dtype: bfloat16
```
```python
# Freeze all layers except the newly added ones
from transformers import AutoModelForCausalLM

def update_layer_gradients(model, n):
    """
    Enable gradients only for every nth layer (the newly inserted layers)
    and freeze all other layers.

    :param model: The model instance, e.g. GemmaForCausalLM.
    :param n: Interval of the newly added layers; layer i is trainable
              iff i % n == n - 1.
    """
    layers = model.model.layers  # ModuleList containing the decoder layers
    for i, layer in enumerate(layers):
        trainable = i % n == n - 1  # every nth layer, starting at index n - 1
        for param in layer.parameters():
            param.requires_grad = trainable

# Load the expanded model
model = AutoModelForCausalLM.from_pretrained("/Users/gayalshamane/Documents/mergekit/gemma-2b-it-expanded")

# Unfreeze every 4th layer (indices 3, 7, 11, ...); adjust n to match
# the expansion interval of your architecture
n = 4
update_layer_gradients(model, n)
```
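To check the freezing pattern without loading the real model, here is a minimal, torch-free sketch using a stand-in class (the `FakeLayer` below is hypothetical, not part of any library) that duck-types the attributes the function touches (`model.model.layers` and `layer.parameters()`):

```python
from types import SimpleNamespace

# Hypothetical stand-in for a decoder layer: just holds parameter-like
# objects with a requires_grad attribute.
class FakeLayer:
    def __init__(self):
        self.params = [SimpleNamespace(requires_grad=True) for _ in range(2)]
    def parameters(self):
        return self.params

# 24 layers, matching the expanded model above
model = SimpleNamespace(model=SimpleNamespace(layers=[FakeLayer() for _ in range(24)]))

# Same selection logic as update_layer_gradients above
def update_layer_gradients(model, n):
    for i, layer in enumerate(model.model.layers):
        for param in layer.parameters():
            param.requires_grad = (i % n == n - 1)

update_layer_gradients(model, 4)
trainable = [i for i, layer in enumerate(model.model.layers)
             if all(p.requires_grad for p in layer.parameters())]
print(trainable)  # [3, 7, 11, 15, 19, 23]
```

Only the six inserted layers end up trainable, i.e. a quarter of the expanded model's blocks.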