RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

#4
by annamalai-s - opened

I'm working on a task to connect a database to an open-source LLM using LangChain. I'm trying to connect quantized Falcon-7B to SQLDatabaseChain. This is how I have loaded the model from the Hugging Face repo:

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cpu",
    use_triton=False,
    use_safetensors=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

Note: I have also tried changing the dtype to bfloat16 and float32 in the model class.

pipe = transformers.pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.float32,
trust_remote_code=True,
device_map='auto',
max_new_tokens=2649,
)

When I run the chain with an input prompt, the code throws the following error:
chain.run('How many employees are there?')

File "/home/user/Disk/Langchain/env/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 196, in forward
return F.layer_norm(
File "/home/user/Disk/Langchain/env/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
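For what it's worth, this error usually means a LayerNorm is being executed in float16 on the CPU, which many PyTorch builds don't support; the common workaround is to run CPU inference in float32. Below is a minimal sketch of the failure mode and the workaround, using a tiny hypothetical module (TinyBlock) as a stand-in for the quantized Falcon model:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the model: a LayerNorm followed by a Linear,
# which is enough to trigger the Half-on-CPU LayerNorm path.
class TinyBlock(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(self.ln(x))

model = TinyBlock().half()  # mimics loading with torch_dtype=torch.float16
x = torch.randn(2, 8)

# Calling model(x.half()) here can raise
#   RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
# on CPU with some PyTorch versions.

# Workaround for CPU inference: cast the whole model back to float32.
model = model.float()
out = model(x)
print(out.dtype)  # torch.float32
```

The same idea applied to your code would be passing torch_dtype=torch.float32 to from_quantized (or calling model.float() after loading) so that no Half-precision ops run on the CPU.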

These are the package versions I'm using:
langchain == 0.0.348
langchain-core == 0.0.12
langchain-experimental == 0.0.44
langsmith == 0.0.69
transformers == 4.33.0
accelerate == 0.25.0
auto-gptq == 0.5.1

Could someone help with this problem?
