autotrain-advanced fine tuning - Please specify `target_modules` in `peft_config`

#19
by meowman - opened

I am trying to fine-tune Phi-3 with autotrain-advanced.

For Phi-2, it wanted these target_modules:

target_modules=[
'q_proj',
'k_proj',
'v_proj',
'dense',
'fc1',
'fc2',
]

Those don't work for Phi-3, so I tried omitting target_modules altogether and got this error.

Here is the full traceback asking me to specify target_modules.

Loading checkpoint shards: 100% 2/2 [00:10<00:00,  5.32s/it]
INFO     | 2024-04-23 21:32:20 | __main__:train:352 - model dtype: torch.float16
INFO     | 2024-04-23 21:32:20 | __main__:train:360 - preparing peft model...
ERROR    | 2024-04-23 21:32:20 | autotrain.trainers.common:wrapper:119 - train has failed due to an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/common.py", line 116, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/clm/__main__.py", line 395, in train
    model = get_peft_model(model, peft_config)
  File "/usr/local/lib/python3.10/dist-packages/peft/mapping.py", line 136, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1094, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 129, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 136, in __init__
    super().__init__(model, config, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 148, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 293, in inject_adapter
    peft_config = self._prepare_adapter_config(peft_config, model_config)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 412, in _prepare_adapter_config
    raise ValueError("Please specify `target_modules` in `peft_config`")
ValueError: Please specify `target_modules` in `peft_config`

ERROR    | 2024-04-23 21:32:20 | autotrain.trainers.common:wrapper:120 - Please specify `target_modules` in `peft_config`

So I did, adding --target-modules q_proj,k_proj,v_proj,dense,fc1,fc2 \ to my command.

Now I get this error.

INFO     | 2024-04-23 23:52:11 | __main__:train:360 - preparing peft model...
ERROR    | 2024-04-23 23:52:11 | autotrain.trainers.common:wrapper:119 - train has failed due to an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/common.py", line 116, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/clm/__main__.py", line 395, in train
    model = get_peft_model(model, peft_config)
  File "/usr/local/lib/python3.10/dist-packages/peft/mapping.py", line 136, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1094, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 129, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 136, in __init__
    super().__init__(model, config, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 148, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 328, in inject_adapter
    raise ValueError(
ValueError: Target modules {'dense', 'q_proj', 'v_proj', 'fc2', 'fc1', 'k_proj'} not found in the base model. Please check the target modules and try again.

ERROR    | 2024-04-23 23:52:11 | autotrain.trainers.common:wrapper:120 - Target modules {'dense', 'q_proj', 'v_proj', 'fc2', 'fc1', 'k_proj'} not found in the base model. Please check the target modules and try again.

So I ran the following code to find out which modules the model actually has:

# If you need to find the valid target_modules for peft, use this code
from transformers import AutoModelForCausalLM

# Load the model (older transformers versions may also need trust_remote_code=True for Phi-3)
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# Print out the named modules
for name, module in model.named_modules():
    print(name)

And I get this output.

model
model.embed_tokens
model.embed_dropout
model.layers
model.layers.0
model.layers.0.self_attn
model.layers.0.self_attn.o_proj
model.layers.0.self_attn.qkv_proj
model.layers.0.self_attn.rotary_emb
model.layers.0.mlp
model.layers.0.mlp.gate_up_proj
model.layers.0.mlp.down_proj
model.layers.0.mlp.activation_fn
model.layers.0.input_layernorm
model.layers.0.resid_attn_dropout
model.layers.0.resid_mlp_dropout
model.layers.0.post_attention_layernorm
... (model.layers.1 through model.layers.30 repeat the same structure) ...
model.layers.31
model.layers.31.self_attn
model.layers.31.self_attn.o_proj
model.layers.31.self_attn.qkv_proj
model.layers.31.self_attn.rotary_emb
model.layers.31.mlp
model.layers.31.mlp.gate_up_proj
model.layers.31.mlp.down_proj
model.layers.31.mlp.activation_fn
model.layers.31.input_layernorm
model.layers.31.resid_attn_dropout
model.layers.31.resid_mlp_dropout
model.layers.31.post_attention_layernorm
model.norm
lm_head
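
A slightly more targeted variant (just a sketch using standard torch.nn checks) prints only the Linear leaf modules, since those are the layers LoRA can actually attach to:

import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# Collect the distinct short names of all Linear layers -- the usual LoRA candidates
linear_names = {name.split(".")[-1] for name, module in model.named_modules() if isinstance(module, nn.Linear)}
print(linear_names)  # expect something like {'qkv_proj', 'o_proj', 'gate_up_proj', 'down_proj', 'lm_head'}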

So, based on the self_attn layers actually in use, I tried --target-modules o_proj,qkv_proj \

Still working on this...

Let us know how this develops; I am looking into fine-tuning this with peft as well.

@perelloliveri I looked into it some more; the relevant modules are the attention-block projections (Phi-3 fuses q/k/v into a single qkv_proj, which is why the separate Phi-2 names are not found):

model.layers.31.self_attn.o_proj
model.layers.31.self_attn.qkv_proj

So adding --target-modules o_proj,qkv_proj

makes the job run and train successfully.
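
For anyone wiring this up directly in peft instead of through the autotrain CLI, the equivalent config would look roughly like this (a sketch: r, lora_alpha, and lora_dropout are placeholder values, and you can add the fused MLP projections gate_up_proj and down_proj to target_modules if you also want to adapt the MLP):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                   # placeholder rank
    lora_alpha=32,                          # placeholder scaling
    lora_dropout=0.05,                      # placeholder dropout
    target_modules=["qkv_proj", "o_proj"],  # the fused attention projections found above
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()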

meowman changed discussion status to closed

Would you share the complete code to fine-tune Phi-3, please?

@midesk just use the autotrain-advanced Google Colab notebook.

@midesk If it's helpful, I just followed this simple PEFT tutorial and it works well: https://huggingface.co/docs/peft/en/index
