Spaces:
Runtime error
Runtime error
# An OpenedAI API (openai like) | |
This extension creates an API that works kind of like openai (ie. api.openai.com). | |
It's incomplete so far but perhaps is functional enough for you. | |
## Setup & installation | |
Optional (for flask_cloudflared, embeddings): | |
``` | |
pip3 install -r requirements.txt | |
``` | |
It listens on tcp port 5001 by default. You can use the OPENEDAI_PORT environment variable to change this. | |
To enable the bare bones image generation (txt2img) set: SD_WEBUI_URL to point to your Stable Diffusion API ([Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui)). | |
Example: | |
``` | |
SD_WEBUI_URL=http://127.0.0.1:7861 | |
``` | |
### Embeddings (alpha) | |
Embeddings requires ```sentence-transformers``` installed, but chat and completions will function without it loaded. The embeddings endpoint is currently using the HuggingFace model: ```sentence-transformers/all-mpnet-base-v2``` for embeddings. This produces 768 dimensional embeddings (the same as the text-davinci-002 embeddings), which is different from OpenAI's current default ```text-embedding-ada-002``` model which produces 1536 dimensional embeddings. The model is small-ish and fast-ish. This model and embedding size may change in the future. | |
| model name | dimensions | input max tokens | speed | size | Avg. performance | | |
| --- | --- | --- | --- | --- | --- | | |
| text-embedding-ada-002 | 1536 | 8192| - | - | - | | |
| text-davinci-002 | 768 | 2046 | - | - | - | | |
| all-mpnet-base-v2 | 768 | 384 | 2800 | 420M | 63.3 | | |
| all-MiniLM-L6-v2 | 384 | 256 | 14200 | 80M | 58.8 | | |
In short, the all-MiniLM-L6-v2 model is 5x faster, 5x smaller ram, 2x smaller storage, and still offers good quality. Stats from (https://www.sbert.net/docs/pretrained_models.html). To change the model from the default you can set the environment variable OPENEDAI_EMBEDDING_MODEL, ex. "OPENEDAI_EMBEDDING_MODEL=all-MiniLM-L6-v2". | |
Warning: You cannot mix embeddings from different models even if they have the same dimensions. They are not comparable. | |
### Client Application Setup | |
Almost everything you use it with will require you to set a dummy OpenAI API key environment variable. | |
With the [official python openai client](https://github.com/openai/openai-python), you can set the OPENAI_API_BASE environment variable before you import the openai module, like so: | |
``` | |
OPENAI_API_KEY=dummy | |
OPENAI_API_BASE=http://127.0.0.1:5001/v1 | |
``` | |
If needed, replace 127.0.0.1 with the IP/port of your server. | |
If using .env files to save the OPENAI_API_BASE and OPENAI_API_KEY variables, you can ensure compatibility by loading the .env file before loading the openai module, like so in python: | |
``` | |
from dotenv import load_dotenv | |
load_dotenv() | |
import openai | |
``` | |
With the [official Node.js openai client](https://github.com/openai/openai-node) it is slightly more more complex because the environment variables are not used by default, so small source code changes may be required to use the environment variables, like so: | |
``` | |
const openai = OpenAI(Configuration({ | |
apiKey: process.env.OPENAI_API_KEY, | |
basePath: process.env.OPENAI_API_BASE, | |
})); | |
``` | |
For apps made with the [chatgpt-api Node.js client library](https://github.com/transitive-bullshit/chatgpt-api): | |
``` | |
const api = new ChatGPTAPI({ | |
apiKey: process.env.OPENAI_API_KEY, | |
apiBaseUrl: process.env.OPENAI_API_BASE, | |
}) | |
``` | |
## Compatibility & not so compatibility | |
| API endpoint | tested with | notes | | |
| --- | --- | --- | | |
| /v1/models | openai.Model.list() | returns the currently loaded model_name and some mock compatibility options | | |
| /v1/models/{id} | openai.Model.get() | returns whatever you ask for, model does nothing yet anyways | | |
| /v1/text_completion | openai.Completion.create() | the most tested, only supports single string input so far | | |
| /v1/chat/completions | openai.ChatCompletion.create() | depending on the model, this may add leading linefeeds | | |
| /v1/edits | openai.Edit.create() | Assumes an instruction following model, but may work with others | | |
| /v1/images/generations | openai.Image.create() | Bare bones, no model configuration, response_format='b64_json' only. | | |
| /v1/embeddings | openai.Embedding.create() | Using Sentence Transformer, dimensions are different and may never be directly comparable to openai embeddings. | | |
| /v1/moderations | openai.Moderation.create() | does nothing. successfully. | | |
| /v1/engines/\*/... completions, embeddings, generate | python-openai v0.25 and earlier | Legacy engines endpoints | | |
| /v1/images/edits | openai.Image.create_edit() | not supported | | |
| /v1/images/variations | openai.Image.create_variation() | not supported | | |
| /v1/audio/\* | openai.Audio.\* | not supported | | |
| /v1/files\* | openai.Files.\* | not supported | | |
| /v1/fine-tunes\* | openai.FineTune.\* | not supported | | |
The model name setting is ignored in completions, but you may need to adjust the maximum token length to fit the model (ie. set to <2048 tokens instead of 4096, 8k, etc). To mitigate some of this, the max_tokens value is halved until it is less than truncation_length for the model (typically 2k). | |
Streaming, temperature, top_p, max_tokens, stop, should all work as expected, but not all parameters are mapped correctly. | |
Some hacky mappings: | |
| OpenAI | text-generation-webui | note | | |
| --- | --- | --- | | |
| frequency_penalty | encoder_repetition_penalty | this seems to operate with a different scale and defaults, I tried to scale it based on range & defaults, but the results are terrible. hardcoded to 1.18 until there is a better way | | |
| presence_penalty | repetition_penalty | same issues as frequency_penalty, hardcoded to 1.0 | | |
| best_of | top_k | | | |
| stop | custom_stopping_strings | this is also stuffed with ['\nsystem:', '\nuser:', '\nhuman:', '\nassistant:', '\n###', ] for good measure. | | |
| n | 1 | hardcoded, it may be worth implementing this but I'm not sure how yet | | |
| 1.0 | typical_p | hardcoded | | |
| 1 | num_beams | hardcoded | | |
| max_tokens | max_new_tokens | max_tokens is scaled down by powers of 2 until it's smaller than truncation length. | | |
| logprobs | - | ignored | | |
defaults are mostly from openai, so are different. I use the openai defaults where I can and try to scale them to the webui defaults with the same intent. | |
### Models | |
This has been successfully tested with Koala, Alpaca, gpt4-x-alpaca, GPT4all-snoozy, wizard-vicuna, stable-vicuna and Vicuna 1.1 - ie. Instruction Following models. If you test with other models please let me know how it goes. Less than satisfying results (so far): RWKV-4-Raven, llama, mpt-7b-instruct/chat | |
### Applications | |
Everything needs OPENAI_API_KEY=dummy set. | |
| Compatibility | Application/Library | url | notes / setting | | |
| --- | --- | --- | --- | | |
| β β | openai-python | https://github.com/openai/openai-python | only the endpoints from above are working. OPENAI_API_BASE=http://127.0.0.1:5001/v1 | | |
| β β | openai-node | https://github.com/openai/openai-node | only the endpoints from above are working. environment variables don't work by default, but can be configured (see above) | | |
| β β | chatgpt-api | https://github.com/transitive-bullshit/chatgpt-api | only the endpoints from above are working. environment variables don't work by default, but can be configured (see above) | | |
| β | shell_gpt | https://github.com/TheR1D/shell_gpt | OPENAI_API_HOST=http://127.0.0.1:5001 | | |
| β | gpt-shell | https://github.com/jla/gpt-shell | OPENAI_API_BASE=http://127.0.0.1:5001/v1 | | |
| β | gpt-discord-bot | https://github.com/openai/gpt-discord-bot | OPENAI_API_BASE=http://127.0.0.1:5001/v1 | | |
| β β | langchain | https://github.com/hwchase17/langchain | OPENAI_API_BASE=http://127.0.0.1:5001/v1 even with a good 30B-4bit model the result is poor so far. It assumes zero shot python/json coding. Some model tailored prompt formatting improves results greatly. | | |
| β β | Auto-GPT | https://github.com/Significant-Gravitas/Auto-GPT | OPENAI_API_BASE=http://127.0.0.1:5001/v1 Same issues as langchain. Also assumes a 4k+ context | | |
| β β | babyagi | https://github.com/yoheinakajima/babyagi | OPENAI_API_BASE=http://127.0.0.1:5001/v1 | | |
## Future plans | |
* better error handling | |
* model changing, esp. something for swapping loras or embedding models | |
* consider switching to FastAPI + starlette for SSE (openai SSE seems non-standard) | |
* do something about rate limiting or locking requests for completions, most systems will only be able handle a single request at a time before OOM | |
## Bugs? Feedback? Comments? Pull requests? | |
Are all appreciated, please @matatonic and I'll try to get back to you as soon as possible. | |