Deployment to Inference Endpoints failed
This error is similar to the one reported for llama-3-1: endpoint creation on the Hub fails. I had previously deployed the model on a dedicated endpoint and it was working fine, but now it's not working.
Exit code: 1. Reason: /opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 610, in get_model\n raise NotImplementedError("sharded is not supported for AutoModel")\n\nNotImplementedError: sharded is not supported for AutoModel\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2024-08-07T08:25:52.910377Z","level":"ERROR","fields":{"message":"Shard 3 failed to start"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-07T08:25:52.910402Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
{"timestamp":"2024-08-07T08:25:52.912665Z","level":"INFO","fields":{"message":"Terminating shard"},"target":"text_generation_launcher","span":{"rank":2,"name":"shard-manager"},"spans":[{"rank":2,"name":"shard-manager"}]}
{"timestamp":"2024-08-07T08:25:52.912699Z","level":"INFO","fields":{"message":"Waiting for shard to gracefully shutdown"},"target":"text_generation_launcher","span":{"rank":2,"name":"shard-manager"},"spans":[{"rank":2,"name":"shard-manager"}]}
{"timestamp":"2024-08-07T08:25:52.912698Z","level":"INFO","fields":{"message":"Terminating shard"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
{"timestamp":"2024-08-07T08:25:52.912712Z","level":"INFO","fields":{"message":"Waiting for shard to gracefully shutdown"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
{"timestamp":"2024-08-07T08:25:53.012880Z","level":"INFO","fields":{"message":"shard terminated"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
{"timestamp":"2024-08-07T08:25:53.012883Z","level":"INFO","fields":{"message":"shard terminated"},"target":"text_generation_launcher","span":{"rank":2,"name":"shard-manager"},"spans":[{"rank":2,"name":"shard-manager"}]}
Error: ShardCannotStart
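For anyone triaging similar launcher output, here is a minimal sketch for pulling only the ERROR entries out of a log like the one above. It assumes each log line is a standalone JSON object with the `level`, `fields.message`, and optional `span.rank` keys seen in this log. The function name `error_entries` is hypothetical and not part of TGI.

```python
import json


def error_entries(log_text):
    """Return (timestamp, rank, message) for every ERROR-level JSON log line."""
    entries = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as "Error: ShardCannotStart"
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or partial lines are ignored
        if record.get("level") != "ERROR":
            continue
        rank = record.get("span", {}).get("rank")  # None when no span is attached
        message = record.get("fields", {}).get("message")
        entries.append((record.get("timestamp"), rank, message))
    return entries


# Two lines copied from the log above, plus the trailing plain-text error line.
sample = "\n".join([
    '{"timestamp":"2024-08-07T08:25:52.910377Z","level":"ERROR","fields":{"message":"Shard 3 failed to start"},"target":"text_generation_launcher"}',
    '{"timestamp":"2024-08-07T08:25:52.910402Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}',
    "Error: ShardCannotStart",
])

for entry in error_entries(sample):
    print(entry)
```

Filtering this way separates the shard that actually failed from the INFO messages about the other shards being shut down afterwards.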
I tried GPU · Nvidia L4 · 4x GPUs · 96 GB, with TGI, with all settings left on default. Is anyone else facing this issue too? Any help is appreciated.
Thanks
I still can't deploy this to Inference Endpoints. Does this require flash attention with an Ampere GPU? If that's the case, it can't fit within 24 GB, and the next tier up is 96 GB, which is too expensive.