# OpenAI-Compatible RESTful APIs & SDK
FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs.
The FastChat server is compatible with both the [openai-python](https://github.com/openai/openai-python) library and cURL commands.

The following OpenAI APIs are supported:
- Chat Completions. (Reference: https://platform.openai.com/docs/api-reference/chat)
- Completions. (Reference: https://platform.openai.com/docs/api-reference/completions)
- Embeddings. (Reference: https://platform.openai.com/docs/api-reference/embeddings)
## RESTful API Server
First, launch the controller:
```bash
python3 -m fastchat.serve.controller
```

Then, launch the model worker(s):
```bash
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.3
```

Finally, launch the RESTful API server:
```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```

Now, let us test the API server.
### OpenAI Official SDK
The goal of `openai_api_server.py` is to implement a fully OpenAI-compatible API server, so the models can be used directly with the [openai-python](https://github.com/openai/openai-python) library.

First, install openai-python:
```bash
pip install --upgrade openai
```

Then, interact with the Vicuna model:
```python
import openai

openai.api_key = "EMPTY"  # API keys are not supported yet
openai.api_base = "http://localhost:8000/v1"

model = "vicuna-7b-v1.3"
prompt = "Once upon a time"

# create a completion
completion = openai.Completion.create(model=model, prompt=prompt, max_tokens=64)
# print the completion
print(prompt + completion.choices[0].text)

# create a chat completion
completion = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
)
# print the completion
print(completion.choices[0].message.content)
```
Streaming is also supported. See [test_openai_api.py](../tests/test_openai_api.py).
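With `stream=True`, the server returns a sequence of chunks whose `choices[0].delta` fields each carry a small piece of the reply. The sketch below shows one way to accumulate those pieces into the full message; the chunk shape mirrors what openai-python (pre-v1.0) yields, but the stream here is simulated with plain dicts so the example runs without a live server:

```python
def accumulate(chunks):
    """Join the incremental `delta.content` pieces into the full reply."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # the first chunk typically carries only the role, no content
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# simulated stand-in for the chunks ChatCompletion.create(..., stream=True) yields
simulated_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world!"}}]},
]
print(accumulate(simulated_stream))  # Hello, world!
```

Against a running server, you would pass the iterator returned by `ChatCompletion.create(..., stream=True)` to the same loop.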
### cURL
cURL is another handy tool for inspecting the API's output.

List Models:
```bash
curl http://localhost:8000/v1/models
```

Chat Completions:
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.3",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}]
  }'
```
Text Completions:
```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.3",
    "prompt": "Once upon a time",
    "max_tokens": 41,
    "temperature": 0.5
  }'
```
Embeddings:
```bash
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vicuna-7b-v1.3",
    "input": "Hello world!"
  }'
```
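The embedding itself comes back as a list of floats in the response's `data[0].embedding` field. A common next step is comparing two such vectors with cosine similarity; the sketch below uses made-up three-dimensional stand-ins rather than real embedding output:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# made-up stand-ins for two responses' data[0].embedding fields
v1 = [0.1, 0.2, 0.3]
v2 = [0.2, 0.1, 0.4]
print(round(cosine_similarity(v1, v2), 4))  # 0.9331
```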
## LangChain Support
This OpenAI-compatible API server supports LangChain. See [LangChain Integration](langchain_integration.md) for details.
## Adjusting Environment Variables
### Timeout
By default, a timeout error occurs if a model worker does not respond within 100 seconds. If your model or hardware is slower, you can raise this timeout through an environment variable:
```bash
export FASTCHAT_WORKER_API_TIMEOUT=<larger timeout in seconds>
```
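As a rough illustration of how such an override typically takes effect (a hypothetical sketch, not FastChat's actual implementation), the server reads the variable at request time and falls back to the default when it is unset:

```python
import os

def get_worker_timeout(default=100):
    """Hypothetical sketch: read the timeout override, falling back to 100 s."""
    return int(os.getenv("FASTCHAT_WORKER_API_TIMEOUT", default))

os.environ["FASTCHAT_WORKER_API_TIMEOUT"] = "300"  # same effect as `export ...=300`
print(get_worker_timeout())  # 300
```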
### Batch size
If you encounter an out-of-memory (OOM) error while creating embeddings, you can reduce the batch size by setting:
```bash
export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1
```
## Todos
Some features to be implemented:
- [ ] Support more parameters like `logprobs`, `logit_bias`, `user`, `presence_penalty`, and `frequency_penalty`
- [ ] Model details (permissions, owner, and create time)
- [ ] Edits API
- [ ] Authentication and API keys
- [ ] Rate limit settings