:pencil: [Doc] Readme: New models, API key and no-stream mode, and models to support
README.md CHANGED
@@ -14,15 +14,21 @@ Huggingface LLM Inference API in OpenAI message format.
 
 ✅ Implemented:
 
-
-- `mixtral-8x7b`, `mistral-7b`
+- Available Models:
+  - `mixtral-8x7b`, `mistral-7b`, `openchat-3.5`
+- Adaptive prompt templates for different models
 - Support OpenAI API format
 - Can use API endpoint via official `openai-python` package
-
+- Support both stream and no-stream response
+- Support API Key via both HTTP auth header and env variable (https://github.com/Hansimov/hf-llm-api/issues/4)
 - Docker deployment
 
 🔨 In progress:
-- [
+- [ ] Support more models (https://github.com/Hansimov/hf-llm-api/issues/5)
+  - [ ] meta-llama/Llama-2-70b-chat-hf
+  - [ ] codellama/CodeLlama-34b-Instruct-hf
+  - [ ] tiiuae/falcon-180B-chat
+
 
 ## Run API service
 
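The "Adaptive prompt templates" bullet is not spelled out in the diff. As a rough sketch of the idea only, assuming nothing about the project's internals: OpenAI-style messages would be rendered into each model's own chat format before being sent to the Huggingface backend. The function below is hypothetical; the two templates are the publicly documented Mistral-instruct and OpenChat-3.5 formats.

```python
# Hypothetical sketch of adaptive prompt templating -- NOT the project's code.
def apply_prompt_template(model: str, messages: list[dict]) -> str:
    """Render OpenAI-style messages into a model-specific prompt string."""
    if model in ("mixtral-8x7b", "mistral-7b"):
        # Mistral-family instruct format: user turns wrapped in [INST] ... [/INST]
        prompt = "<s>"
        for msg in messages:
            if msg["role"] == "user":
                prompt += f"[INST] {msg['content']} [/INST]"
            elif msg["role"] == "assistant":
                prompt += f"{msg['content']}</s>"
        return prompt
    if model == "openchat-3.5":
        # OpenChat-3.5 format: "GPT4 Correct <Role>: ..." turns, each ending
        # with <|end_of_turn|>, then an open assistant turn for generation.
        prompt = ""
        for msg in messages:
            role = "User" if msg["role"] == "user" else "Assistant"
            prompt += f"GPT4 Correct {role}: {msg['content']}<|end_of_turn|>"
        return prompt + "GPT4 Correct Assistant:"
    raise ValueError(f"No prompt template for model: {model}")
```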
@@ -70,7 +76,8 @@ from openai import OpenAI
 
 # If running this service with proxy, you might need to unset `http(s)_proxy`.
 base_url = "http://127.0.0.1:23333"
-
+# Your own HF_TOKEN
+api_key = "hf_xxxxxxxxxxxxxxxx"
 
 client = OpenAI(base_url=base_url, api_key=api_key)
 response = client.chat.completions.create(
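On the client side, the new API Key support means the key can reach the service in two ways. A minimal sketch, assuming the service reads the standard HTTP auth header that `openai-python` sends and, as a fallback, an env variable named `HF_TOKEN` (the name follows the comment in the diff above; the exact server behavior is tracked in issue #4):

```python
import os

from openai import OpenAI

base_url = "http://127.0.0.1:23333"

# Way 1: pass the key explicitly; openai-python sends it on every request
# as an `Authorization: Bearer <key>` HTTP auth header.
client = OpenAI(base_url=base_url, api_key="hf_xxxxxxxxxxxxxxxx")

# Way 2 (assumed): keep the key out of the source by reading an env variable.
client = OpenAI(base_url=base_url, api_key=os.environ["HF_TOKEN"])
```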