# OpenAI-Compatible RESTful APIs & SDK
FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs.
The FastChat server is compatible with both the [openai-python](https://github.com/openai/openai-python) library and cURL commands.
The following OpenAI APIs are supported:
- Chat Completions. (Reference: https://platform.openai.com/docs/api-reference/chat)
- Completions. (Reference: https://platform.openai.com/docs/api-reference/completions)
- Embeddings. (Reference: https://platform.openai.com/docs/api-reference/embeddings)
## RESTful API Server
First, launch the controller:
```bash
python3 -m fastchat.serve.controller
```
Then, launch the model worker(s):
```bash
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.3
```
Finally, launch the RESTful API server:
```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```
Now, let us test the API server.
### OpenAI Official SDK
The goal of `openai_api_server.py` is to implement a fully OpenAI-compatible API server, so the models can be used directly with the [openai-python](https://github.com/openai/openai-python) library.
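First, install openai-python. The example below uses the pre-1.0 interface (`openai.Completion`, `openai.ChatCompletion`), which openai-python 1.0 removed, so pin an older release:
```bash
pip install --upgrade "openai<1.0"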
```
Then, interact with the Vicuna model:
```python
import openai
openai.api_key = "EMPTY"  # Not supported yet
openai.api_base = "http://localhost:8000/v1"
model = "vicuna-7b-v1.3"
prompt = "Once upon a time"
# create a completion
completion = openai.Completion.create(model=model, prompt=prompt, max_tokens=64)
# print the completion
print(prompt + completion.choices[0].text)
# create a chat completion
completion = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name?"}]
)
# print the completion
print(completion.choices[0].message.content)
```
Streaming is also supported. See [test_openai_api.py](../tests/test_openai_api.py).
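
For readers who want to consume a stream without the SDK, the following is a minimal sketch of a parser for the streamed lines. It assumes the server emits OpenAI-style server-sent events, where each line has the form `data: {json}` and the stream ends with `data: [DONE]`; that format is taken from the OpenAI API, not confirmed here for FastChat, so treat the linked test file as authoritative. The helper name `iter_stream_text` and the sample events are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # sentinel that marks the end of the stream
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Chat chunks carry text in delta.content; completion chunks in text.
            delta = choice.get("delta") or {}
            piece = delta.get("content") or choice.get("text") or ""
            if piece:
                yield piece


# Hand-written sample events (assumed shape, not captured from a real server).
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": " there"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))  # Hello there
```

In practice the `lines` argument would be the decoded lines of an HTTP response opened with `"stream": true` in the request body.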
### cURL
cURL is another good tool for observing the output of the API.
List Models:
```bash
curl http://localhost:8000/v1/models
```
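
The same endpoint can be read from Python to confirm which model names the server accepts. This is a minimal sketch using only the standard library; it assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`), and the function names and sample body are hypothetical.

```python
import json
import urllib.request
from typing import List


def parse_model_ids(body: str) -> List[str]:
    """Extract model ids from a /v1/models JSON response body."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8000/v1") -> List[str]:
    """Query a running API server; requires the server launched above."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))


# Offline illustration with a made-up body in the assumed response shape.
sample = '{"object": "list", "data": [{"id": "vicuna-7b-v1.3", "object": "model"}]}'
print(parse_model_ids(sample))  # ['vicuna-7b-v1.3']
```

Calling `fetch_model_ids()` against a live server should return the names that can be passed as `"model"` in the requests below.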
Chat Completions:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.3",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
```
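
The cURL call above maps directly onto a plain HTTP POST, so it can be reproduced in Python without the OpenAI SDK. The sketch below uses only the standard library; `build_chat_request` and `send` are hypothetical helper names, and the reply is assumed to follow the OpenAI chat completion shape.

```python
import json
import urllib.request


def build_chat_request(model, messages, base_url="http://localhost:8000/v1"):
    """Build (but do not send) a POST request for /chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=100):
    """Send the request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_chat_request(
    "vicuna-7b-v1.3",
    [{"role": "user", "content": "Hello! What is your name?"}],
)
print(request.get_method(), request.full_url)

# With the server running, the reply could be read like this:
# reply = send(request)
# print(reply["choices"][0]["message"]["content"])
```

Separating request construction from sending keeps the payload easy to inspect before it goes over the wire.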
Text Completions:
```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.3",
    "prompt": "Once upon a time",
    "max_tokens": 41,
    "temperature": 0.5
  }'
```
Embeddings:
```bash
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.3",
    "input": "Hello world!"
  }'
```
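
Embedding vectors are usually compared rather than read directly. As an illustration, the sketch below pulls the vectors out of a response and computes their cosine similarity. It assumes the response follows the OpenAI embeddings shape (a `data` list whose items hold `embedding` and `index`); the helper names and the tiny two-dimensional sample vectors are made up for the example.

```python
import math
from typing import List


def extract_embeddings(response: dict) -> List[List[float]]:
    """Return embedding vectors ordered by their 'index' field."""
    items = sorted(response["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up response in the assumed shape; real vectors are much longer.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 2.0]},
        {"object": "embedding", "index": 0, "embedding": [3.0, 0.0]},
    ],
}
first, second = extract_embeddings(sample_response)
print(cosine_similarity(first, first))   # 1.0
print(cosine_similarity(first, second))  # 0.0
```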
## LangChain Support
This OpenAI-compatible API server supports LangChain. See [LangChain Integration](langchain_integration.md) for details.
## Adjusting Environment Variables
### Timeout
By default, a timeout error occurs if a model worker does not respond within 100 seconds. If your model or hardware is slower, you can raise this timeout through an environment variable:
```bash
export FASTCHAT_WORKER_API_TIMEOUT=<larger timeout in seconds>
```
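
To make the semantics concrete, here is a minimal sketch of how a process might read this variable and fall back to the 100-second default stated above. How FastChat itself parses the value is not shown in this document, so `worker_api_timeout` is a hypothetical helper, not FastChat's code.

```python
import os


def worker_api_timeout(default: int = 100) -> int:
    """Read FASTCHAT_WORKER_API_TIMEOUT, falling back to the documented default."""
    raw = os.environ.get("FASTCHAT_WORKER_API_TIMEOUT")
    if raw is None or not raw.strip():
        return default
    timeout = int(raw)  # raises ValueError on non-numeric input
    if timeout <= 0:
        raise ValueError("FASTCHAT_WORKER_API_TIMEOUT must be a positive number of seconds")
    return timeout


os.environ["FASTCHAT_WORKER_API_TIMEOUT"] = "300"
print(worker_api_timeout())  # 300
```

Because environment variables are inherited at process start, the `export` presumably has to happen in the shell that launches the server processes, before they start.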
### Batch size
If you encounter an out-of-memory (OOM) error while creating embeddings, you can use a smaller batch size by setting:
```bash
export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1
```
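
A smaller batch size lowers peak memory because fewer inputs are processed per forward pass, at the cost of more passes. The sketch below illustrates the general idea of splitting inputs into batches; it is not FastChat's batching code, and `iter_batches` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


inputs = ["a", "b", "c", "d", "e"]
print(list(iter_batches(inputs, 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
print(list(iter_batches(inputs, 1)))  # [['a'], ['b'], ['c'], ['d'], ['e']]
```

With a batch size of 1, each input is embedded on its own, which is the most memory-frugal setting.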
## Todos
Some features to be implemented:
- [ ] Support more parameters like `logprobs`, `logit_bias`, `user`, `presence_penalty` and `frequency_penalty`
- [ ] Model details (permissions, owner and create time)
- [ ] Edits API
- [ ] Authentication and API key
- [ ] Rate limit settings