Doesn't work with TGI container

#1
by junli8848 - opened

I ran the model in the TGI container with the quantize parameter set to "awq", but got the following error:

  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/awq/quantize/qmodule.py", line 46, in forward
    out = awq_inference_engine.gemm_forward_cuda(
RuntimeError: expected scalar type Half but found BFloat16
