text generation inference github issue?

#3
by mgunther - opened

Is there an open issue on https://github.com/huggingface/text-generation-inference tracking the sharded 16-bit floating-point problem?

OpenAssistant org

No, not yet. Most likely it is related to the vocabulary size of 32007... unfortunately I noticed it too late; at the very least it should have been rounded up to a number divisible by 16.
I heard from others that TGI wouldn't start and gave them the error "The choosen size 32007 is not compatible with sharding on 4 shards", caused by an assert added in this commit: https://github.com/huggingface/text-generation-inference/commit/67347950b7518efeb64c7f99ee360af685b53934
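The assert is easy to reproduce with plain arithmetic: tensor-parallel sharding splits the embedding rows evenly across GPUs, so the vocabulary size must be divisible by the shard count. The function below is an illustrative sketch of that constraint, not TGI's actual code:

```python
def is_shardable(vocab_size: int, num_shards: int) -> bool:
    # Tensor parallelism splits the embedding matrix row-wise across
    # shards, so each shard must receive the same number of rows.
    return vocab_size % num_shards == 0

print(is_shardable(32007, 4))   # False: triggers the assert on 4 shards
print(is_shardable(32128, 4))   # True: 32007 rounded up to a multiple of 128
```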

Ah, thanks for the update. I didn't get the assertion error you mentioned; the model ran for me when using the Docker image ghcr.io/huggingface/text-generation-inference:1.0.1. But the responses to questions like "Can you write a Python script to calculate the Fibonacci sequence?" were correct only in Python syntax and English grammar, not in the algorithm or the conversation flow (e.g. the model would answer its own questions). For reference, the Pythia 12B OASST model gets the Python Fibonacci sequence correct.
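For reference, a minimal correct version of the script that prompt asks for, as a baseline for judging model output (the function name and signature are my own choice, not what any model produced):

```python
def fibonacci(n: int) -> list[int]:
    # Return the first n Fibonacci numbers, starting from 0.
    seq = [0, 1]
    for _ in range(n - 2):
        seq.append(seq[-1] + seq[-2])
    return seq[:n]

print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```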

OpenAssistant org

You could try nf4 quantization with --sharded false...
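For anyone following along, a sketch of the corresponding launch command, assuming the TGI 1.0.x Docker image mentioned above (the model id and volume path are placeholders; TGI's bitsandbytes nf4 option landed in the 1.0 series):

```shell
# Illustrative only: substitute your actual model id and data directory.
docker run --gpus all -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:1.0.1 \
  --model-id <model-id> \
  --quantize bitsandbytes-nf4 \
  --sharded false
```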

OpenAssistant org

@mgunther I uploaded a new version with resized embedding & lm_head layers, which now have a size divisible by 128. Could you please try again and let me know whether it works for you now?
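Rounding the vocabulary up to the next multiple of 128 can be sketched as plain arithmetic (in transformers the actual resizing would typically go through model.resize_token_embeddings; the helper below only shows the size calculation):

```python
import math

def round_up(n: int, multiple: int) -> int:
    # Smallest value >= n that is divisible by `multiple`.
    return math.ceil(n / multiple) * multiple

new_vocab = round_up(32007, 128)
print(new_vocab)  # 32128, divisible by 2, 4, 8, ... up to 128 shards
```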

Yes, nf4 quantization with no sharding works better, but I haven't tested it extensively. I'm currently downloading the new model and will follow up once I've tested it.

The vLLM inference server mentioned in the model card (https://github.com/vllm-project/vllm) gets the Python Fibonacci sequence code correct.

It looks like the new OASST Llama 2 model is working well with the TGI Docker image 1.0.2 and sharding (it gets the Python Fibonacci script correct).

Thanks Andreas!

OpenAssistant org

Great, thanks for testing! I will update the README.
