generate_text() leads to `RuntimeError: probability tensor contains either inf, nan or element < 0` when using `load_in_8bit`

#77
by deepthoughts - opened

Using a GCP P100 with 16GB of memory, I get the following error when calling generate_text() after loading the model with load_in_8bit:

```
/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:230: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at ../aten/src/ATen/native/TensorCompare.cpp:413.)
  attn_scores = torch.where(causal_mask, attn_scores, mask_value)
Traceback (most recent call last):
  File "/home/admin/dolly-v2-12b/main.py", line 33, in <module>
    result = generate_text(prompt)
  File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1120, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1127, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1026, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/admin/dolly-v2-12b/instruct_pipeline.py", line 132, in _forward
    generated_sequence = self.model.generate(
  File "/opt/conda/envs/llm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py", line 1572, in generate
    return self.sample(
  File "/opt/conda/envs/llm/lib/python3.10/site-packages/transformers/generation/utils.py", line 2655, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
```

I use the code below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from instruct_pipeline import InstructionTextGenerationPipeline

modelPath = "/home/admin/dolly-v2-12b"

tokenizer = AutoTokenizer.from_pretrained(modelPath, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    modelPath, device_map="auto", load_in_8bit=True, torch_dtype=torch.float16
)
generate_text = InstructionTextGenerationPipeline(
    model=model, tokenizer=tokenizer, torch_dtype=torch.float16
)
result = generate_text("Test prompt")
```
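
For reference, a minimal sketch to check whether the quantized forward pass itself produces non-finite logits, reusing the `model` and `tokenizer` above (the prompt string is just a placeholder):

```python
# Sanity check: does the 8-bit forward pass itself produce non-finite logits?
inputs = tokenizer("Test prompt", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits
# False here means the inf/nan appears before sampling, i.e. in the quantized model
print(torch.isfinite(logits).all().item())
```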

Databricks org

That just means 8-bit quantization failed for this input; it probably will not work here. Use the 7B model in 16-bit.
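
A sketch of that setup (assuming the public `databricks/dolly-v2-7b` weights and the same `instruct_pipeline.py` from this repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from instruct_pipeline import InstructionTextGenerationPipeline

# 7B model in plain float16, no 8-bit quantization
model_path = "databricks/dolly-v2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_path, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", torch_dtype=torch.float16
)
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
result = generate_text("Test prompt")
```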

srowen changed discussion status to closed

🌑 Have you tried increasing the temperature?

Try increasing the temperature value. I had a very low temperature along with restrictive top_k and top_p settings, which made the next-token distribution too steep. The sampling step (the torch.multinomial call in the trace) needs multiple candidate tokens with non-zero probability, and at a very low temperature I didn't have that (we know how temperature works, right?).

So I increased the temperature and it worked.

Try increasing the temperature and it should just work, provided there is no other complexity involved.
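
For example, something like this (a sketch; the values are illustrative, and it assumes the pipeline forwards these kwargs through to model.generate(), as the repo's instruct_pipeline.py does with its generate_kwargs):

```python
result = generate_text(
    "Test prompt",
    do_sample=True,
    temperature=0.7,  # higher temperature flattens the next-token distribution
    top_p=0.92,
    top_k=0,          # 0 disables top-k filtering
)
```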
