model.generate is throwing AttributeError: 'HybridCache' object has no attribute 'float'

#18
by saireddy

I am using transformers 4.42.3 and trying to full fine-tune the gemma2-9B model with supervised fine-tuning. When I run
outputs = model.generate(input_ids=input_ids, max_new_tokens=150,
                         do_sample=True, top_p=0.6, temperature=0.5,
                         pad_token_id=tokenizer.eos_token_id)
it throws "AttributeError: 'HybridCache' object has no attribute 'float'", but trainer.train() didn't throw any errors.

I got a similar error with torch 2.0.1 and CUDA 11.7.

When I tried torch 2.1.1 and CUDA 12.1, it worked.

I am using the nvcr.io/nvidia/pytorch:24.05-py3 image, which has PyTorch 2.4.0a0+07cecf4 and NVIDIA CUDA 12.4.1, and I am still hitting the same issue.

model.generate(
    **tokenizer("### Q: What is the capital city of the Korea?", return_tensors='pt', return_token_type_ids=False).to('cuda'),
    do_sample=True,
    use_cache=False,  # disable the KV cache so the HybridCache path is never taken
    max_new_tokens=256,
    eos_token_id=1,
)

A simple workaround for now: pass use_cache=False.
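For reference, here is the same flag applied to the generate call from the first post (a sketch, assuming input_ids and tokenizer are defined as in that snippet):

outputs = model.generate(input_ids=input_ids, max_new_tokens=150,
                         do_sample=True, top_p=0.6, temperature=0.5,
                         use_cache=False,  # skip the HybridCache path entirely
                         pad_token_id=tokenizer.eos_token_id)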

Thanks for the trick @beomi. Now I am hitting another issue:
RuntimeError: Index put requires the source and destination dtypes match, got BFloat16 for the destination and Float for the source.
but I don't run into this issue with other LLMs, so I am not sure what's wrong here. Any advice?
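One thing worth checking (a hedged diagnostic sketch, not a confirmed diagnosis): prepare_model_for_kbit_training upcasts the non-quantized parameters (layer norms, etc.) to float32 for training stability, which can leave parts of the model producing float32 activations while the generation cache is pre-allocated in bfloat16. Listing the float32 parameters shows whether that applies here:

import torch

# list parameters that are float32 on an otherwise bf16 model; these are the
# likely source of the float32 key/value states hitting the bf16 cache
for name, param in model.named_parameters():
    if param.dtype == torch.float32:
        print(name, tuple(param.shape))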

Here is the base code:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# BitsAndBytesConfig int-4 config

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    use_cache=False,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    token=access_token,
)
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
# prepare model for training

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=15,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    logging_steps=100,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True,
    tf32=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    seed=42,
    eval_steps=100,
    lr_scheduler_type="cosine",
    evaluation_strategy='epoch',
    disable_tqdm=False,  # with packing, tqdm progress values can be incorrect
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    run_name=run_name,
)
from trl import SFTTrainer

max_seq_length = 2048  # max sequence length for model and packing of the dataset

trainer = SFTTrainer(
    model=model,
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=generate_prompt,  # applies the prompt mapping to all training and test examples
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()

@saireddy Could you confirm your Transformers/TRL/Accelerate versions and your PyTorch/CUDA versions?

Hey @beomi
I am using the nvcr.io/nvidia/pytorch:24.05-py3 image, which has PyTorch 2.4.0a0+07cecf4 and NVIDIA CUDA 12.4.1.

accelerate==0.31.0
bitsandbytes==0.43.1
deepspeed==0.14.4
evaluate==0.4.1
peft==0.11.1
transformers==4.42.3
trl==0.9.4

Stack trace:

outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens, do_sample=True,
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1491, in generate
    outputs = self.base_model.generate(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2651, in _sample
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 1068, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 908, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 650, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gemma2/modeling_gemma2.py", line 252, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/cache_utils.py", line 1071, in update
    return update_fn(
  File "/usr/local/lib/python3.10/dist-packages/transformers/cache_utils.py", line 1046, in _static_update
    k_out[:, :, cache_position] = key_states
RuntimeError: Index put requires the source and destination dtypes match, got BFloat16 for the destination and Float for the source.
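Reading the bottom of the trace, cache_utils._static_update is writing freshly computed float32 key_states into a cache buffer that was pre-allocated in bfloat16. A possible workaround, sketched under the assumption that the float32 values come from the parameters upcast by prepare_model_for_kbit_training, is to cast those parameters back to bfloat16 before calling generate (this partially undoes the training-stability upcast, so only do it for inference after training is finished):

import torch

# cast float32 (upcast) parameters back to bfloat16 so the computed key/value
# states match the dtype of the pre-allocated HybridCache during generation
for param in model.parameters():
    if param.dtype == torch.float32:
        param.data = param.data.to(torch.bfloat16)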
