limcheekin committed on
Commit
4ba83ab
0 Parent(s):

Duplicate from limcheekin/orca_mini_v3_7B-GGUF


Co-authored-by: Lim Chee Kin <limcheekin@users.noreply.huggingface.co>

Files changed (7)
  1. .gitattributes +35 -0
  2. Dockerfile +35 -0
  3. LICENSE +21 -0
  4. README.md +21 -0
  5. index.html +37 -0
  6. main.py +27 -0
  7. start_server.sh +6 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
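Each pattern above routes matching files through Git LFS instead of storing them directly in the repository. As a rough illustration (gitattributes matching is close to, but not identical to, shell-style globbing), here is how a subset of those patterns classifies file names:

```python
import fnmatch

# Subset of the .gitattributes patterns above. Note: this uses Python's
# fnmatch as an approximation of Git's pattern matching, which differs
# in some edge cases (e.g. `**` and path separators).
lfs_patterns = ["*.7z", "*.bin", "*.safetensors", "*.zip", "*tfevents*"]

def is_lfs_tracked(name):
    # A file goes through LFS if any pattern matches its name.
    return any(fnmatch.fnmatch(name, p) for p in lfs_patterns)
```

For example, the model file downloaded by the Dockerfile below (`gguf-model.bin`) matches `*.bin` and would be LFS-tracked if committed to this repository.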
Dockerfile ADDED
@@ -0,0 +1,35 @@
+ # Grab a fresh copy of the Python image
+ FROM python:3.10-slim
+
+ # Install build and runtime dependencies
+ RUN apt-get update && \
+     apt-get install -y \
+       libopenblas-dev \
+       ninja-build \
+       build-essential \
+       pkg-config \
+       curl
+
+ # Build llama-cpp-python against OpenBLAS; quote the extra so the
+ # shell does not glob the [server] brackets
+ RUN pip install -U pip setuptools wheel && \
+     CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" FORCE_CMAKE=1 pip install --verbose "llama-cpp-python[server]"
+
+ # Download the quantized model
+ RUN mkdir model && \
+     curl -L https://huggingface.co/TheBloke/orca_mini_v3_7B-GGUF/resolve/main/orca_mini_v3_7b.Q4_K_M.gguf -o model/gguf-model.bin
+
+ COPY ./start_server.sh ./
+ COPY ./main.py ./
+ COPY ./index.html ./
+
+ # Make the server start script executable
+ RUN chmod +x ./start_server.sh
+
+ # Set environment variables for the host and port
+ ENV HOST=0.0.0.0
+ ENV PORT=7860
+
+ # Expose a port for the server
+ EXPOSE ${PORT}
+
+ # Run the server start script
+ CMD ["/bin/sh", "./start_server.sh"]
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2023 Lim Chee Kin
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,21 @@
+ ---
+ title: orca_mini_v3_7B-GGUF (Q4_K_M)
+ colorFrom: purple
+ colorTo: blue
+ sdk: docker
+ models:
+   - psmathur/orca_mini_v3_7b
+   - TheBloke/orca_mini_v3_7B-GGUF
+ tags:
+   - inference api
+   - openai-api compatible
+   - llama-cpp-python
+   - orca_mini_v3_7B
+   - gguf
+ pinned: false
+ duplicated_from: limcheekin/orca_mini_v3_7B-GGUF
+ ---
+
+ # orca_mini_v3_7B-GGUF (Q4_K_M)
+
+ Please refer to the [index.html](index.html) for more information.
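The block between the `---` fences is YAML front matter that Hugging Face Spaces reads for configuration (`sdk: docker` tells the platform to build the Dockerfile). As a minimal stdlib sketch, the raw front matter block can be split out like this; real parsing should use a YAML library:

```python
# Sketch: extract the raw YAML front matter from a README like the one
# above. This only isolates the text between the `---` fences; it does
# not parse YAML.
readme = """---
title: orca_mini_v3_7B-GGUF (Q4_K_M)
sdk: docker
pinned: false
---

# orca_mini_v3_7B-GGUF (Q4_K_M)
"""

def front_matter(text):
    # The document must open with a `---` fence on its own line.
    if not text.startswith("---\n"):
        return None
    end = text.find("\n---", 4)
    return text[4:end] if end != -1 else None

block = front_matter(readme)
```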
index.html ADDED
@@ -0,0 +1,37 @@
+ <!DOCTYPE html>
+ <html>
+   <head>
+     <title>orca_mini_v3_7B-GGUF (Q4_K_M)</title>
+   </head>
+   <body>
+     <h1>orca_mini_v3_7B-GGUF (Q4_K_M)</h1>
+     <p>
+       Using the
+       <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a>
+       package, this Space hosts the GGUF model in Hugging Face Docker Spaces
+       and exposes it through an OpenAI-compatible API. The Space includes
+       full API documentation for easy integration.
+     </p>
+     <ul>
+       <li>
+         The API endpoint:
+         <a href="https://limcheekin-orca-mini-v3-7b-gguf.hf.space/v1">https://limcheekin-orca-mini-v3-7b-gguf.hf.space/v1</a>
+       </li>
+       <li>
+         The API doc:
+         <a href="https://limcheekin-orca-mini-v3-7b-gguf.hf.space/docs">https://limcheekin-orca-mini-v3-7b-gguf.hf.space/docs</a>
+       </li>
+     </ul>
+     <p>
+       If you find this resource valuable, please consider starring the Space.
+       Your support strengthens the application for a community GPU grant,
+       which would improve the capabilities and accessibility of this Space.
+     </p>
+   </body>
+ </html>
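Since the endpoint advertised in index.html is OpenAI-compatible, any OpenAI-style client can talk to it. As a sketch (constructed locally, not sent to the live Space here), a request body for the chat completions route would look like this; the field names follow the OpenAI chat-completions schema that llama-cpp-python's server implements:

```python
import json

# Hypothetical body for POST <space-url>/v1/chat/completions.
payload = {
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."}
    ],
    "max_tokens": 64,      # cap on generated tokens
    "temperature": 0.7,    # sampling temperature
}
body = json.dumps(payload)
```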
main.py ADDED
@@ -0,0 +1,27 @@
+ from llama_cpp.server.app import create_app, Settings
+ from fastapi.responses import HTMLResponse
+ import os
+
+ app = create_app(
+     Settings(
+         n_threads=2,  # set to number of cpu cores
+         model="model/gguf-model.bin",
+         embedding=False
+     )
+ )
+
+ # Read the content of index.html once and store it in memory
+ with open("index.html", "r") as f:
+     content = f.read()
+
+
+ @app.get("/", response_class=HTMLResponse)
+ async def read_items():
+     return content
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(
+         app,
+         host=os.environ["HOST"],
+         port=int(os.environ["PORT"])
+     )
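The `n_threads=2` comment suggests matching the CPU core count of the host. A hedged sketch of deriving that value automatically instead of hardcoding it (a possible tweak, not part of the committed code):

```python
import os

# Derive a thread count from the visible CPU cores, falling back to 2
# when os.cpu_count() cannot determine it.
n_threads = os.cpu_count() or 2
```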
start_server.sh ADDED
@@ -0,0 +1,6 @@
+ #!/bin/sh
+
+ # For mlock support
+ ulimit -l unlimited
+
+ python3 -B main.py
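`ulimit -l unlimited` raises the locked-memory limit so llama.cpp can `mlock` the model weights into RAM and keep them from being swapped out. A sketch of inspecting that limit from Python (POSIX only; the `resource` module is unavailable on Windows):

```python
import resource

# Read the current memlock (locked-in-memory) soft and hard limits.
# After `ulimit -l unlimited`, both would equal resource.RLIM_INFINITY.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
unlimited = soft == resource.RLIM_INFINITY
```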