Use the OpenAI client for function calling
vLLM can expose an OpenAI-like API, so can I just use the OpenAI client to make function calls? Right now I have tried something like this, and it failed.
import json

from openai import OpenAI

# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    weather_info = {
        "location": location,
        "temperature": "72",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)

# Define the function schema the model is allowed to call
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

# Conversation sent to the model
messages = [{"role": "user", "content": "What's the weather like in Boston?"}]

# Point the OpenAI client at the vLLM OpenAI-compatible endpoint
client = OpenAI(api_key='abc', base_url='<customized-endpoint>')
response = client.chat.completions.create(
    messages=messages,
    functions=functions,
    model='Llama-2-7b-chat-hf-function-calling-v3',
)
The result looks like this:
The weather in Boston can vary depending on the time of year. Boston has a humid continental climate, with cold winters and hot, humid summers. Here are some general weather patterns in Boston:
Winter (December to February):
* Temperatures can range from 20°F (-7°C) to 40°F (4°C)
* Snowfall is common, with an average of 43 inches (109 cm) per year
* The coldest months are January and February, with average temperatures around 28°F (-2°C)
Spring (March to May):
* Temperatures can range from 30°F (-1°C) to 60°F (16°C)
* Spring is a transition season, with temperatures gradually warming up
* April is usually the rainiest month, with an average of 4 inches (10 cm) of rain
Summer (June to August):
* Temperatures can range from 60°F (16°C) to 80°F (27°C)
* Summer is the warmest season, with an average temperature of 70°F (21°C) in July and August
* Humidity is usually high during the summer months
Fall (September to November):
* Temperatures can range from 50°F (10°C) to 60°F (16°C)
* Fall is a mild season, with temperatures gradually cooling down
* October is usually the driest month, with an average of 3 inches (7 cm) of rain
It's worth noting that weather patterns can vary from year to year, and it's not uncommon for Boston to experience extreme weather events such as heatwaves, cold snaps, or heavy snowfall.
Howdy, I see what you're doing and it's a good idea.
Yes, this model works with vLLM and an openai endpoint, but you have to feed in the chat template from the tokenizer (which is custom for this model).
The format does not follow the openai function calling format (I'd like to do that, but openai's function calling implementation hasn't been stable and has used a "functions" and also a "tools" approach). Ideally I would make all models fully compatible - even with openai function calling - but what we have now is quasi-compatibility, by feeding in the chat template. Check out my videos on YouTube.com/@TrelisResearch for the latest function calling video and also the latest inference vid.
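As a rough sketch (the model ID and file name below are just placeholders, not exact values from the repo), you can pull the chat template straight out of the tokenizer and save it as a jinja file that vLLM or TGI can then load:

from transformers import AutoTokenizer

# Placeholder model ID; use the actual function-calling checkpoint you deployed
tokenizer = AutoTokenizer.from_pretrained("Trelis/Llama-2-7b-chat-hf-function-calling-v3")

# The custom prompt format lives in the tokenizer's chat template (a Jinja string)
with open("function_calling_template.jinja", "w") as f:
    f.write(tokenizer.chat_template)

# The saved file can then be passed to vLLM via --chat-template (or to TGI)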
There is quite an interesting feature in together.ai: https://docs.together.ai/docs/function-calling
yeah I think full openai compatibility makes good sense.
The challenge is that TGI and vLLM don't fully support that syntax, so the best way (at least that I could think of) is to support it indirectly by setting up a chat_template and passing that into TGI or vLLM.
Do you see a better way?
(Together are offering endpoints (not inferencing your own model) - so they have control over the endpoint specification).
Yes, full support of the openai API would be quite good, because lots of clients use openai by default.
So you mean a chat template can solve this problem? If that is true, vLLM can already accept a customized chat template: https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server
python -m vllm.entrypoints.openai.api_server \
--model facebook/opt-125m \
--chat-template ./examples/template_chatml.jinja
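If I understand the quasi-compatible approach correctly, the client call would then look roughly like the sketch below - the model name, the way function definitions reach the model, and the JSON parsing are all my assumptions, not the model's actual prompt format:

import json
from openai import OpenAI

# Rough sketch, not the actual Trelis format: the server applies the custom
# chat template, and the function call comes back as text that the client
# parses itself.
client = OpenAI(api_key="abc", base_url="http://localhost:8000/v1")

# The function definitions would need to be included in the messages in
# whatever format the fine-tuned model was trained on.
messages = [{"role": "user", "content": "What's the weather like in Boston?"}]

response = client.chat.completions.create(
    model="Trelis/Llama-2-7b-chat-hf-function-calling-v3",  # placeholder name
    messages=messages,
)

content = response.choices[0].message.content
try:
    # Assumption: the model emits a JSON object such as
    # {"name": "get_current_weather", "arguments": {"location": "Boston, MA"}}
    print("Function call:", json.loads(content))
except json.JSONDecodeError:
    print("Plain text reply:", content)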
I don't think giving a special chat template makes the default Mistral model support function calling... So together.ai may have done some finetuning of the default model, like what you did for llama2? But in their example, it seems that they are USING the default models... which is quite interesting...
response = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # <----- looks like a default model
    messages=messages,
    tools=tools,
    tool_choice="auto",
)
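(For reference, the tools and messages that snippet expects would presumably look something like this in the newer "tools" style - this is just my reconstruction of the same weather schema, not copied from their docs:)

# OpenAI "tools" format: each entry wraps a function schema under "function"
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather like in Boston?"}]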
Yup, if you looked at the Trelis v3 models they have a custom chat template. That chat template is sent to vLLM when using the ADVANCED-inference repo. Same for TGI.
Re Together, yes, it would be interesting if they are able to get the model to do zero-shot function calling. I tried with Mixtral and the results were poor.