toaster61 committed
Commit 7fd3f9f • Parent(s): 1c1f4f1
ggml -> gguf
Files changed:
- Dockerfile +3 -4
- README.md +2 -2
- app.py +3 -1
- app_gradio.py +45 -0
- requirements.txt +2 -1
- system.prompt +0 -1
- wget-log +6 -0
Dockerfile
CHANGED
@@ -6,7 +6,7 @@ USER root
 
 # Installing gcc compiler and main library.
 RUN apt update && apt install gcc cmake build-essential -y
-RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
 
 # Copying files into folder and making it working dir.
 RUN mkdir app
@@ -16,10 +16,9 @@ WORKDIR /app
 
 # Installing wget and downloading model.
 RUN apt install wget -y
-RUN wget -q -O model.bin https://huggingface.co/OpenBuddy
+RUN wget -q -O model.bin https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF/blob/main/openbuddy-llama2-13b-v11.1.Q5_K_M.gguf
 RUN ls
-# You can use other models!
-# Or u can comment this two RUNs and include in Space/repo/Docker image own model with name "model.bin".
+# You can use other models! Or u can comment this two RUNs and include in Space/repo/Docker image own model with name "model.bin".
 
 # Updating pip and installing everything from requirements
 RUN python3 -m pip install -U --no-cache-dir pip setuptools wheel
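
A note on the new download line: Hugging Face serves raw model files from the /resolve/ path, while a /blob/ URL returns the file's HTML page instead of the weights. A minimal sketch of the same RUN with /resolve/ (same repo and filename as the committed URL; that this exact file is downloadable there is an assumption):

# Sketch only -- same repo/filename as the committed URL, but using /resolve/
# instead of /blob/ so wget fetches the GGUF file rather than a web page.
RUN wget -q -O model.bin https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF/resolve/main/openbuddy-llama2-13b-v11.1.Q5_K_M.gguf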
README.md
CHANGED
@@ -4,9 +4,9 @@ emoji: 🏆
 colorFrom: red
 colorTo: indigo
 sdk: docker
-pinned:
+pinned: true
 ---
 
-This api built using
+This api built using Gradio with queue for openbuddy's models. Also includes Quart and uvicorn setup!
 
 For example I used https://huggingface.co/OpenBuddy/openbuddy-openllama-3b-v10-bf16
app.py
CHANGED
@@ -33,4 +33,6 @@ Change <code>`CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS`</code> i
 Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a>, <a href="https://quart.palletsprojects.com/">Quart</a> and <a href="https://www.uvicorn.org/">Uvicorn</a>.<br>
 <h1>How to test it on own machine?</h1>
 You can install Docker, build image and run it. I made <code>`run-docker.sh`</code> for ya. To stop container run <code>`docker ps`</code>, find name of container and run <code>`docker stop _dockerContainerName_`</code><br>
-Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br>
+Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br>
+<br>
+<h1>Also now it can run with Gradio! Check the repo!</h1>'''
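
The "How to GPU?" part of this description refers to swapping the build flags in the Dockerfile. As a rough sketch of that swap (it also assumes a CUDA-capable base image and GPU runtime, which this commit does not set up):

# Sketch: build llama-cpp-python against cuBLAS instead of OpenBLAS
# (assumes an nvidia/cuda base image; not part of this commit)
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python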
app_gradio.py
ADDED
@@ -0,0 +1,45 @@
+import gradio as gr
+from llama_cpp import Llama
+
+llm = Llama(model_path="./model.bin")
+
+with open('system.prompt', 'r', encoding='utf-8') as f:
+    prompt = f.read()
+
+title = "Openbuddy LLama Api"
+desc = '''<h1>Hello, world!</h1>
+This is showcase how to make own server with OpenBuddy's model.<br>
+I'm using here 3b model just for example. Also here's only CPU power.<br>
+But you can use GPU power as well!<br><br>
+<h1>How to GPU?</h1>
+Change <code>`CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS`</code> in Dockerfile on <code>`CMAKE_ARGS="-DLLAMA_CUBLAS=on"`</code>. Also you can try <code>`DLLAMA_CLBLAST`</code>, <code>`DLLAMA_METAL`</code> or <code>`DLLAMA_METAL`</code>.<br>
+Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a> and <a href="https://www.gradio.app/">Gradio</a>.<br><br>
+<h1>How to test it on own machine?</h1>
+You can install Docker, build image and run it. I made <code>`run-docker.sh`</code> for ya. To stop container run <code>`docker ps`</code>, find name of container and run <code>`docker stop _dockerContainerName_`</code><br>
+Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br><br>
+Also it can run with quart+uvicorn! Check the repo!'''
+
+def greet(request: str, max_tokens: int = 64, override_system_prompt: str = ""):
+    try:
+        system_prompt = override_system_prompt if override_system_prompt != "" else prompt
+        max_tokens = max_tokens if max_tokens > 0 and max_tokens < 256 else 64
+        userPrompt = system_prompt + "\n\nUser: " + request + "\nAssistant: "
+    except: return "ERROR 400: Not enough data"
+    try:
+        output = llm(userPrompt, max_tokens=max_tokens, stop=["User:", "\n"], echo=False)
+        print(output)
+        return output["choices"][0]["text"]
+    except Exception as e:
+        print(e)
+        return "ERROR 500: Server error. Check logs!!"
+
+demo = gr.Interface(
+    fn=greet,
+    inputs=[gr.Text("Hello, how are you?"), gr.Number(64), gr.Textbox()],
+    outputs=["text"],
+    description=desc,
+    title=title,
+    allow_flagging="never"
+).queue()
+if __name__ == "__main__":
+    demo.launch()
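
Once the Space is running, the new Gradio app can also be called programmatically. A minimal client sketch, assuming the server is reachable on Gradio's default local port and exposes the default "/predict" endpoint (neither URL nor endpoint name appears in this commit):

# Hypothetical client -- URL, port and api_name are assumptions, not from the commit.
from gradio_client import Client

client = Client("http://localhost:7860")
answer = client.predict(
    "Hello, how are you?",  # request
    64,                     # max_tokens (greet() resets values outside 1-255 to 64)
    "",                     # override_system_prompt; empty string falls back to system.prompt
    api_name="/predict",
)
print(answer)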
requirements.txt
CHANGED
@@ -1,2 +1,3 @@
 quart
-uvicorn
+uvicorn
+gradio
system.prompt
CHANGED
@@ -1 +0,0 @@
-Prompt: Отвечай максимально кратко и по делу.
wget-log
ADDED
@@ -0,0 +1,6 @@
+--2023-09-30 17:29:11--  https://cdn-lfs.huggingface.co/repos/c8/66/c866ea7f0aa48d9e6cd5d10064562a36f8b43f272e5508eceac84d411b157f32/557b305cd42ca3da588a0e2f16dc1aceedcc73232fc2174da311428f16f0ca9e?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27openbuddy-llama2-13b-v8.1-q3_K.bin%3B+filename%3D%22openbuddy-llama2-13b-v8.1-q3_K.bin%22%3B
+Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 108.156.22.122, 108.156.22.7, 108.156.22.58, ...
+Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|108.156.22.122|:443... connected.
+HTTP request sent, awaiting response... 403 Forbidden
+2023-09-30 17:29:11 ERROR 403: Forbidden.
+