Doesn't work with TGI container
#1 opened by junli8848
I ran the model in the TGI container with the quantize parameter set to "awq", but got the following error:
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/awq/quantize/qmodule.py", line 46, in forward
    out = awq_inference_engine.gemm_forward_cuda(
RuntimeError: expected scalar type Half but found BFloat16