generate_text() raises `RuntimeError: probability tensor contains either `inf`, `nan` or element < 0` when using `load_in_8bit`
Using a GCP P100 with 16 GB of memory, I get the following error when calling generate_text() after loading the model with load_in_8bit:
```
/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:230: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at ../aten/src/ATen/native/TensorCompare.cpp:413.)
attn_scores = torch.where(causal_mask, attn_scores, mask_value)
Traceback (most recent call last):
File "/home/admin/dolly-v2-12b/main.py", line 33, in <module>
result = generate_text(prompt)
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1120, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1127, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1026, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/home/admin/dolly-v2-12b/instruct_pipeline.py", line 132, in _forward
generated_sequence = self.model.generate(
File "/opt/conda/envs/llm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1572, in generate
return self.sample(
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2655, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```
I use the code below:
```python
import torch

from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

modelPath = "/home/admin/dolly-v2-12b"

tokenizer = AutoTokenizer.from_pretrained(modelPath, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(modelPath, device_map="auto", load_in_8bit=True, torch_dtype=torch.float16)
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer, torch_dtype=torch.float16)
result = generate_text("Test prompt")
```
That just means the 8-bit quantization failed for this input. It probably won't work out here. Use the 7B model in 16-bit instead.
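For reference, a minimal sketch of that suggestion, assuming the smaller databricks/dolly-v2-7b checkpoint is used (the model id and prompt below are placeholders, not from the original post):

```python
import torch

from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id; point this at a local directory if the weights are already downloaded.
model_path = "databricks/dolly-v2-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left")
# Plain fp16 load, no 8-bit quantization; the 7B weights are roughly 14 GB in fp16,
# which is close to the 16 GB limit of a P100.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.float16)

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
result = generate_text("Test prompt")
```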
Have you tried increasing the temperature?
Well, try increasing the temperature value. I had a very low temperature along with restrictive top_k and top_p values, which made the next-token distribution too steep. Sampling needs multiple tokens with valid probabilities to pick from, and with such a low temperature that wasn't the case (we know how temperature works, right?). So I increased the temperature and it worked.
Try increasing the temperature value and it should just work, if there is no other complexity involved.
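If you want to try that with the pipeline from the question, here is a minimal sketch; it assumes InstructionTextGenerationPipeline forwards extra keyword arguments through to model.generate, and the specific values are only examples, not tuned settings:

```python
# Reuses the `generate_text` pipeline built earlier in the thread.
result = generate_text(
    "Test prompt",
    do_sample=True,
    temperature=0.7,   # a higher temperature flattens the next-token distribution
    top_k=50,
    top_p=0.92,
    max_new_tokens=256,
)
print(result)
```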