Using llama.cpp server.exe produces no output, while main.exe works fine
#4
by weiboboer - opened
Running main.exe -m qwen1_5-14b-chat-q4_0.gguf -n 512 --color -i -cml works fine.
Running server.exe -m qwen1_5-14b-chat-q4_0.gguf and then calling http://localhost:8080/completion never returns any output.
I have seen a similar issue on GitHub, but there is no solution:
https://github.com/ggerganov/llama.cpp/issues/4821
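For reference, the native /completion endpoint expects a JSON POST with a "prompt" field rather than chat messages. Below is a minimal sketch of such a request (illustrative only; the exact call from the post is not shown, and the field names follow the llama.cpp server README):

import requests

# Illustrative raw request against the native llama.cpp /completion endpoint.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Hello, how are you?",  # plain-text prompt, not a chat message list
        "n_predict": 128,                 # maximum number of tokens to generate
    },
    timeout=120,
)
print(resp.json()["content"])  # generated text is returned in the "content" field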
The URL to call should be
http://localhost:8080/v1
With some clients you may need to try
http://localhost:8080/v1/chat/completions/
Here is a simple example:
import openai

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",  # llama.cpp server's OpenAI-compatible base URL
    api_key="-",                          # any non-empty string works; the local server does not check it
)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the model name is ignored; the server uses whatever GGUF model it loaded
    temperature=0,
    messages=[
        {"role": "system", "content": "Always reply in Chinese"},
        {"role": "user", "content": "Tell a joke about summer"},
    ],
)
print(completion.choices[0].message.content)
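If you prefer not to use the openai package, the same endpoint can be called directly over HTTP. This is a sketch under the assumption that your server build exposes the OpenAI-compatible /v1/chat/completions route, which recent llama.cpp server versions do:

import requests

# Direct HTTP call to the OpenAI-compatible chat endpoint exposed by server.exe.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "gpt-3.5-turbo",  # ignored by the local server; the loaded GGUF model is used
        "temperature": 0,
        "messages": [
            {"role": "system", "content": "Always reply in Chinese"},
            {"role": "user", "content": "Tell a joke about summer"},
        ],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])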