text generation inference github issue?

#3
by mgunther - opened

Is there an open issue on https://github.com/huggingface/text-generation-inference tracking the sharded 16-bit floating-point problem?

OpenAssistant org

No, not yet. Most likely it is related to the vocabulary size of 32007... unfortunately I noticed it too late; at the very least it should have been rounded up to a number divisible by 16.
I heard from others that TGI wouldn't start and gave them the error "The choosen size 32007 is not compatible with sharding on 4 shards", caused by an assert added in this commit: https://github.com/huggingface/text-generation-inference/commit/67347950b7518efeb64c7f99ee360af685b53934
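The assert is easy to reproduce with plain arithmetic: tensor-parallel sharding splits the embedding rows evenly across GPUs, so the vocabulary size must be divisible by the shard count. The function below is an illustrative sketch of that constraint, not TGI's actual code:

```python
def is_shardable(vocab_size: int, num_shards: int) -> bool:
    # Tensor parallelism splits the embedding matrix row-wise across
    # shards, so each shard must receive the same number of rows.
    return vocab_size % num_shards == 0

print(is_shardable(32007, 4))   # False: triggers the assert on 4 shards
print(is_shardable(32128, 4))   # True: 32007 rounded up to a multiple of 128
```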

Ah, thanks for the update. I didn't get the assertion error you mentioned; the model ran for me when using the Docker image ghcr.io/huggingface/text-generation-inference:1.0.1. But the responses to questions like "Can you write a Python script to calculate the Fibonacci sequence?" were correct only in Python syntax and English grammar, not in the algorithm or the conversation flow (e.g. the model would answer its own questions). For reference, the Pythia 12B OASST model gets the Python Fibonacci sequence correct.
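For reference, a minimal correct version of the script that prompt asks for, as a baseline for judging model output (the function name and signature are my own choice, not what any model produced):

```python
def fibonacci(n: int) -> list[int]:
    # Return the first n Fibonacci numbers, starting from 0.
    seq = [0, 1]
    for _ in range(n - 2):
        seq.append(seq[-1] + seq[-2])
    return seq[:n]

print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```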

OpenAssistant org

You could try nf4 quantization with --sharded false...
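For anyone following along, a sketch of the corresponding launch command, assuming the TGI 1.0.x Docker image mentioned above (the model id and volume path are placeholders; TGI's bitsandbytes nf4 option landed in the 1.0 series):

```shell
# Illustrative only: substitute your actual model id and data directory.
docker run --gpus all -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:1.0.1 \
  --model-id <model-id> \
  --quantize bitsandbytes-nf4 \
  --sharded false
```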

OpenAssistant org

@mgunther I uploaded a new version with resized embedding & lm_head layers, which now have a size divisible by 128. Could you please try again and let me know whether it works for you now?
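Rounding the vocabulary up to the next multiple of 128 can be sketched as plain arithmetic (in transformers the actual resizing would typically go through model.resize_token_embeddings; the helper below only shows the size calculation):

```python
import math

def round_up(n: int, multiple: int) -> int:
    # Smallest value >= n that is divisible by `multiple`.
    return math.ceil(n / multiple) * multiple

new_vocab = round_up(32007, 128)
print(new_vocab)  # 32128, divisible by 2, 4, 8, ... up to 128 shards
```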

Yes, nf4 quantization with no sharding works better, but I haven't tested it extensively. I'm currently downloading the new model and will follow up once I've tested it.

The vLLM inference server mentioned in the model card (https://github.com/vllm-project/vllm) gets the Python Fibonacci sequence code correct.

It looks like the new OASST Llama 2 model is working well with the TGI Docker image 1.0.2 and sharding (it gets the Python Fibonacci script correct).

Thanks Andreas!

OpenAssistant org

Great, thanks for testing! I will update the README.
