Space: Pro100Sata/xche_ai


Yarik committed on May 25, 2024

Commit 348823d (1 parent: 083d21c)

Update space

Files changed (1)

README.md +155 -1

@@ -8,4 +8,158 @@ pinned: false
 license: apache-2.0
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+## HF-LLM-API
+Hugging Face LLM Inference API, exposed in the OpenAI message format.
+## Features
+- Available Models (2024/01/22):
+  - `mistral-7b`, `mixtral-8x7b`, `nous-mixtral-8x7b`
+  - Adaptive prompt templates for different models
+- Support OpenAI API format
+  - Use the API endpoint via the official `openai-python` package
+- Support both streaming and non-streaming responses
+- Support API key via both the HTTP auth header and an environment variable
+- Docker deployment
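The streaming and non-streaming modes listed above differ mainly in where the text sits in the returned JSON. The sketch below assumes the service mirrors OpenAI's response shapes (a full response keeps the reply under `message`, a streamed chunk keeps a fragment under `delta`); `extract_text` and the sample payloads are hypothetical names and data used only for illustration, not part of this repository.

```python
from typing import Any, Dict


def extract_text(payload: Dict[str, Any]) -> str:
    """Return the text carried by a chat completion payload.

    Assumption: OpenAI-style shapes. A full response stores text under
    choices[0]["message"], a streamed chunk under choices[0]["delta"].
    """
    choice = payload["choices"][0]
    # Non-streaming responses carry the whole reply in "message".
    if "message" in choice:
        return choice["message"].get("content") or ""
    # Streaming chunks carry one fragment in "delta"; it may be empty or absent.
    return choice.get("delta", {}).get("content") or ""


# Hand-written sample payloads (not real server output).
full = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello"}, "finish_reason": "stop"}
    ]
}
chunk = {"choices": [{"delta": {"content": "Hel"}, "finish_reason": None}]}

print(extract_text(full))
print(extract_text(chunk))
```

With `"stream": False` in the request, a caller would read the reply once from the full response; with `"stream": True`, it would call the helper on each chunk and concatenate the fragments.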
+## Run API service
+### Run in Command Line
+**Install dependencies:**
+```bash
+# pipreqs . --force --mode no-pin
+pip install -r requirements.txt
+```
+**Run API:**
+```bash
+python -m apis.chat_api
+```
+### Run via Docker
+**Docker build:**
+```bash
+sudo docker build -t hf-llm-api:1.0 . --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy
+```
+**Docker run:**
+```bash
+# no proxy
+sudo docker run -p 23333:23333 hf-llm-api:1.0
+# with proxy
+sudo docker run -p 23333:23333 --env http_proxy="http://<server>:<port>" hf-llm-api:1.0
+```
+## API Usage
+### Using `openai-python`
+See: [`examples/chat_with_openai.py`](https://github.com/ruslanmv/hf-llm-api-collection/blob/main/examples/chat_with_openai.py)
+```py
+from openai import OpenAI
+# If running this service with a proxy, you might need to unset `http(s)_proxy`.
+base_url = "http://127.0.0.1:23333"
+# Your own HF_TOKEN
+api_key = "hf_xxxxxxxxxxxxxxxx"
+# use below as non-auth user
+# api_key = "sk-xxx"
+client = OpenAI(base_url=base_url, api_key=api_key)
+response = client.chat.completions.create(
+    model="mixtral-8x7b",
+    messages=[
+        {
+            "role": "user",
+            "content": "what is your model",
+        }
+    ],
+    stream=True,
+)
+for chunk in response:
+    if chunk.choices[0].delta.content is not None:
+        print(chunk.choices[0].delta.content, end="", flush=True)
+    elif chunk.choices[0].finish_reason == "stop":
+        print()
+```
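When the full reply is needed as one string rather than printed piece by piece, the streaming loop can be wrapped in a small helper. This is a minimal sketch: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only imitate the `choices[0].delta.content` attribute path used in the example above, so the block runs without a server.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(chunks: Iterable) -> str:
    """Join the text fragments of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # Role-only and final chunks have no text; skip them.
        if delta.content is not None:
            parts.append(delta.content)
    return "".join(parts)


def fake_chunk(content: Optional[str], finish_reason: Optional[str] = None):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    choice = SimpleNamespace(
        delta=SimpleNamespace(content=content), finish_reason=finish_reason
    )
    return SimpleNamespace(choices=[choice])


fake_stream = [fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None, "stop")]
print(collect_stream(fake_stream))
```

With a running service, the `response` object from `client.chat.completions.create(..., stream=True)` could be passed in place of `fake_stream`.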
+### Using post requests
+See: [`examples/chat_with_post.py`](https://github.com/ruslanmv/hf-llm-api-collection/blob/main/examples/chat_with_post.py)
+```py
+import ast
+import httpx
+import json
+import re
+# If running this service with a proxy, you might need to unset `http(s)_proxy`.
+chat_api = "http://127.0.0.1:23333"
+# Your own HF_TOKEN
+api_key = "hf_xxxxxxxxxxxxxxxx"
+# use below as non-auth user
+# api_key = "sk-xxx"
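+# Send the key as a Bearer token, the same header `openai-python` builds from `api_key`.
+requests_headers = {"Authorization": f"Bearer {api_key}"}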
+requests_payload = {
+    "model": "mixtral-8x7b",
+    "messages": [
+        {
+            "role": "user",
+            "content": "what is your model",
+        }
+    ],
+    "stream": True,
+}
+with httpx.stream(
+    "POST",
+    chat_api + "/chat/completions",
+    headers=requests_headers,
+    json=requests_payload,
+    timeout=httpx.Timeout(connect=20, read=60, write=20, pool=None),
+) as response:
+    # https://docs.aiohttp.org/en/stable/streams.html
+    # https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb
+    response_content = ""
+    for line in response.iter_lines():
+        remove_patterns = [r"^\s*data:\s*", r"^\s*\[DONE\]\s*"]
+        for pattern in remove_patterns:
+            line = re.sub(pattern, "", line).strip()
+        if line:
+            try:
+                line_data = json.loads(line)
+            except Exception as e:
+                try:
+                    line_data = ast.literal_eval(line)
+                except Exception:
+                    print(f"Error: {line}")
+                    raise e
+            # print(f"line: {line_data}")
+            delta_data = line_data["choices"][0]["delta"]
+            finish_reason = line_data["choices"][0]["finish_reason"]
+            if "role" in delta_data:
+                role = delta_data["role"]
+            if "content" in delta_data:
+                delta_content = delta_data["content"]
+                response_content += delta_content
+                print(delta_content, end="", flush=True)
+            if finish_reason == "stop":
+                print()
+```
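The line handling in the POST example can be isolated into a pure function, which makes it easy to check without a running server. This is a minimal sketch, assuming the stream uses `data: <json>` lines and a final `data: [DONE]` marker, as the regex patterns above suggest; `parse_sse_line` and `sample_lines` are hypothetical names and hand-written data.

```python
import json
from typing import Any, Dict, Optional


def parse_sse_line(line: str) -> Optional[Dict[str, Any]]:
    """Parse one line of a streamed response.

    Returns the decoded JSON object, or None for blank lines and the
    terminating [DONE] marker.
    """
    line = line.strip()
    # Drop the server-sent-events field name if present.
    if line.startswith("data:"):
        line = line[len("data:"):].strip()
    if not line or line == "[DONE]":
        return None
    return json.loads(line)


# Hand-written sample lines (not captured from a real server).
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hi"}, "finish_reason": null}]}',
    "",
    "data: [DONE]",
]
events = [event for event in map(parse_sse_line, sample_lines) if event is not None]
print(events[0]["choices"][0]["delta"]["content"])
```

Unlike the example above, this sketch does not fall back to `ast.literal_eval`, so a line that is not valid JSON raises `json.JSONDecodeError`.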