generate_text() raises `RuntimeError: probability tensor contains either `inf`, `nan` or element < 0` when using `load_in_8bit`
Using a GCP P100 with 16 GB of memory, I get the following error when calling generate_text() after loading the model with load_in_8bit:
```
/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:230: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at ../aten/src/ATen/native/TensorCompare.cpp:413.)
attn_scores = torch.where(causal_mask, attn_scores, mask_value)
Traceback (most recent call last):
File "/home/admin/dolly-v2-12b/main.py", line 33, in <module>
result = generate_text(prompt)
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1120, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1127, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1026, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/home/admin/dolly-v2-12b/instruct_pipeline.py", line 132, in _forward
generated_sequence = self.model.generate(
File "/opt/conda/envs/llm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1572, in generate
return self.sample(
File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2655, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```
I use the code below:
```python
import torch

from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

modelPath = "/home/admin/dolly-v2-12b"

tokenizer = AutoTokenizer.from_pretrained(modelPath, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(modelPath, device_map="auto", load_in_8bit=True, torch_dtype=torch.float16)
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer, torch_dtype=torch.float16)
result = generate_text("Test prompt")
```
That just means the 8-bit quantization failed for this input. It probably won't work out here. Use the 7B model in 16-bit instead.
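For reference, a minimal sketch of that suggestion, assuming the smaller databricks/dolly-v2-7b checkpoint is used (the model id and prompt below are placeholders, not from the original post):

```python
import torch

from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id; point this at a local directory if the weights are already downloaded.
model_path = "databricks/dolly-v2-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left")
# Plain fp16 load, no 8-bit quantization; the 7B weights are roughly 14 GB in fp16,
# which is close to the 16 GB limit of a P100.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype=torch.float16)

generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
result = generate_text("Test prompt")
```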
Have you tried increasing the temperature?
Well, try increasing the temperature value. I had a very low temperature along with restrictive top_k and top_p values, which made the next-token distribution too steep. Sampling needs multiple tokens with valid probabilities to pick from, and with such a low temperature that wasn't the case (we know how temperature works, right?). So I increased the temperature and it worked.
Try increasing the temperature value and it should just work, if there is no other complexity involved.
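If you want to try that with the pipeline from the question, here is a minimal sketch; it assumes InstructionTextGenerationPipeline forwards extra keyword arguments through to model.generate, and the specific values are only examples, not tuned settings:

```python
# Reuses the `generate_text` pipeline built earlier in the thread.
result = generate_text(
    "Test prompt",
    do_sample=True,
    temperature=0.7,   # a higher temperature flattens the next-token distribution
    top_k=50,
    top_p=0.92,
    max_new_tokens=256,
)
print(result)
```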