Why?
It may be related to your hardware. You can also try an inference framework such as vLLM for acceleration.
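For reference, here is a minimal sketch of what trying vLLM could look like, with a small throughput helper so you can compare against your current setup. The function names `tokens_per_second` and `generate_with_vllm` are made up for illustration, and `model_name` is a placeholder for whatever model you are running; it assumes `pip install vllm` and a GPU that vLLM supports.

```python
import time
from typing import Sequence


def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Throughput in generated tokens per second (0.0 if no time elapsed)."""
    return num_tokens / seconds if seconds > 0 else 0.0


def generate_with_vllm(model_name: str, prompts: Sequence[str], max_tokens: int = 128):
    """Generate completions with vLLM and report rough throughput.

    Hypothetical helper: model_name is whatever model ID or local path you use.
    The vLLM import is local so tokens_per_second works without vLLM installed.
    """
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)  # loads the weights onto the GPU
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)

    # Time only the generation step, not model loading.
    start = time.perf_counter()
    outputs = llm.generate(list(prompts), params)
    elapsed = time.perf_counter() - start

    texts = [o.outputs[0].text for o in outputs]
    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    return texts, tokens_per_second(generated, elapsed)
```

Calling `generate_with_vllm("<your-model>", ["Hello"])` returns the generated texts plus a tokens-per-second figure. If that number is still low, the bottleneck is more likely the hardware than the framework.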