Fine-tuning the model begins with zero loss

#85
by Imran1 - opened

```python
import torch
import transformers
from transformers import TrainingArguments
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "dense",
        "fc1",
        "fc2",
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

HAS_BFLOAT16 = torch.cuda.is_bf16_supported()

training_args = TrainingArguments(
    output_dir="phib",
    max_steps=100,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    warmup_steps=10,
    logging_steps=1,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=not HAS_BFLOAT16,
    bf16=HAS_BFLOAT16,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    group_by_length=True,
    # disable_tqdm=False,
    report_to="none",
    seed=3407,
)
```
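For reference, here is a minimal sketch of how these pieces are typically wired together, assuming a 4-bit base model (consistent with the `prepare_model_for_kbit_training` import and the `paged_adamw_32bit` optimizer above). The `model_id` and `train_dataset` names are placeholders, not values from this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, Trainer

model_id = "microsoft/phi-2"  # assumption: swap in the checkpoint you are tuning

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16 if HAS_BFLOAT16 else torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    trust_remote_code=True,
)

model = prepare_model_for_kbit_training(model)  # casts norms to fp32, enables input grads
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: your tokenized dataset
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```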

Check the loss:

| Step | Training Loss |
|------|---------------|
| 1 | 0.000000 |
| 2 | 0.000000 |
| 3 | 0.000000 |
| 4 | 0.000000 |
| 5 | 0.000000 |
| 6 | 0.000000 |
| 7 | 0.000000 |

Got the same issue with similar settings.

Microsoft org

Could you please try with microsoft/phi-1_5 and report if you are seeing the same issue?

Can't try that right now, but it looks like revision "refs/pr/23" is working. The total number of LoRA trainable parameters is somehow two times higher than before while keeping the same settings. I am wondering whether this is expected (refs/pr/23 vs. latest, Jan 16).
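For anyone wanting to reproduce that comparison, a hedged sketch: load each revision via the standard `revision=` argument of `from_pretrained`, wrap it with the same `lora_config`, and print the PEFT parameter summary (`model_id` is again a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model

# Sketch: compare LoRA trainable-parameter counts across repo revisions.
# "refs/pr/23" is the revision mentioned above; "main" is the latest.
for rev in ["refs/pr/23", "main"]:
    base = AutoModelForCausalLM.from_pretrained(
        model_id, revision=rev, trust_remote_code=True
    )
    peft_model = get_peft_model(base, lora_config)
    print(rev)
    peft_model.print_trainable_parameters()
```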

Microsoft org

Could you please re-run with the latest update?

We updated the modeling_phi.py file and disabled auto-casting on the Attention layer. This is the same fix that the previous code had.
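To illustrate what disabling auto-casting means here (this is the general pattern, not the actual modeling_phi.py diff): under fp16 autocast the attention logits can overflow, which can surface as degenerate losses like the zeros reported above, so the attention math is forced to run in full precision:

```python
import torch

# Illustration of the pattern only, not the actual modeling_phi.py change:
# opt the attention math out of autocast and compute it in fp32.
def attention(q, k, v, scale):
    with torch.autocast(device_type="cuda", enabled=False):
        q, k, v = q.float(), k.float(), v.float()
        scores = (q @ k.transpose(-2, -1)) * scale
        return torch.softmax(scores, dim=-1) @ v
```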

@gugarosa

Could you please re-run with the latest update?

Great, that works fine. Thanks!

Microsoft org

No problem! Please let me know if you see anything else.

gugarosa changed discussion status to closed
