toaster61 committed on
Commit
7fd3f9f
1 Parent(s): 1c1f4f1

ggml -> gguf

Files changed (7):
  1. Dockerfile +3 -4
  2. README.md +2 -2
  3. app.py +3 -1
  4. app_gradio.py +45 -0
  5. requirements.txt +2 -1
  6. system.prompt +0 -1
  7. wget-log +6 -0
Dockerfile CHANGED
@@ -6,7 +6,7 @@ USER root
  
  # Installing gcc compiler and main library.
  RUN apt update && apt install gcc cmake build-essential -y
- RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python==0.1.78
+ RUN CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
  
  # Copying files into folder and making it working dir.
  RUN mkdir app
@@ -16,10 +16,9 @@ WORKDIR /app
  
  # Installing wget and downloading model.
  RUN apt install wget -y
- RUN wget -q -O model.bin https://huggingface.co/OpenBuddy/openbuddy-ggml/resolve/main/openbuddy-openllama-3b-v10-q5_0.bin
+ RUN wget -q -O model.bin https://huggingface.co/TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF/resolve/main/openbuddy-llama2-13b-v11.1.Q5_K_M.gguf
  RUN ls
- # You can use other models! Visit https://huggingface.co/OpenBuddy/openbuddy-ggml and choose model that u like!
- # Or u can comment this two RUNs and include in Space/repo/Docker image own model with name "model.bin".
+ # You can use other models! Or you can comment out these two RUN steps and ship your own model named "model.bin" in the Space/repo/Docker image.
  
  # Updating pip and installing everything from requirements
  RUN python3 -m pip install -U --no-cache-dir pip setuptools wheel
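
Note that direct file downloads from the Hugging Face Hub need the `/resolve/` path; a `/blob/` URL returns the page HTML rather than the file (the stray wget-log at the bottom of this commit records one such failed download attempt). A less error-prone alternative, sketched here as a suggestion and not part of the commit, is to let huggingface_hub build the URL:

    from huggingface_hub import hf_hub_download

    # Fetches the same GGUF file the Dockerfile grabs with wget;
    # repo_id and filename are copied from the RUN line above.
    model_path = hf_hub_download(
        repo_id="TheBloke/OpenBuddy-Llama2-13B-v11.1-GGUF",
        filename="openbuddy-llama2-13b-v11.1.Q5_K_M.gguf",
    )
    print(model_path)  # local cached path; pass this to Llama(model_path=...)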
README.md CHANGED
@@ -4,9 +4,9 @@ emoji: 🏆
  colorFrom: red
  colorTo: indigo
  sdk: docker
- pinned: false
+ pinned: true
  ---
  
- This api built using Quart, uvicorn for openbuddy's models.
+ This API is built using Gradio (with a queue) for OpenBuddy's models. It also includes the Quart + uvicorn setup!
  
  For example I used https://huggingface.co/OpenBuddy/openbuddy-openllama-3b-v10-bf16
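
As a rough sketch of the Quart + uvicorn path the README mentions (the route name and JSON shape are my assumptions, not necessarily what app.py actually does):

    from llama_cpp import Llama
    from quart import Quart, request

    app = Quart(__name__)
    llm = Llama(model_path="./model.bin")  # the model the Dockerfile downloads

    @app.route("/request", methods=["POST"])  # hypothetical route name
    async def completion():
        data = await request.get_json()
        output = llm(data["request"], max_tokens=64, stop=["User:", "\n"], echo=False)
        return {"answer": output["choices"][0]["text"]}

    # Quart is an ASGI app, so it can be served with:
    #   uvicorn app:app --host 0.0.0.0 --port 7860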
app.py CHANGED
@@ -33,4 +33,6 @@ Change <code>`CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS`</code> i
  Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a>, <a href="https://quart.palletsprojects.com/">Quart</a> and <a href="https://www.uvicorn.org/">Uvicorn</a>.<br>
  <h1>How to test it on own machine?</h1>
  You can install Docker, build image and run it. I made <code>`run-docker.sh`</code> for ya. To stop container run <code>`docker ps`</code>, find name of container and run <code>`docker stop _dockerContainerName_`</code><br>
- Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br>'''
+ Or you can just follow the steps in the Dockerfile and try it on your machine, not in Docker.<br>
+ <br>
+ <h1>Also now it can run with Gradio! Check the repo!</h1>'''
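
The "How to GPU?" text in both apps covers only the build flags. For completeness, a minimal sketch of the matching runtime change, assuming llama-cpp-python was built with one of the GPU backends; the layer count is an illustrative value, not a tuned one:

    from llama_cpp import Llama

    # With a CUBLAS/CLBLAST/Metal build, layers must also be offloaded at load time.
    llm = Llama(model_path="./model.bin", n_gpu_layers=35)  # 35 is illustrative; tune per GPU
    output = llm("User: Hello!\nAssistant: ", max_tokens=64, stop=["User:", "\n"], echo=False)
    print(output["choices"][0]["text"])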
app_gradio.py ADDED
@@ -0,0 +1,45 @@
+ import gradio as gr
+ from llama_cpp import Llama
+ 
+ llm = Llama(model_path="./model.bin")  # GGUF model downloaded by the Dockerfile
+ 
+ with open('system.prompt', 'r', encoding='utf-8') as f:
+     prompt = f.read()  # default system prompt
+ 
+ title = "Openbuddy LLama Api"
+ desc = '''<h1>Hello, world!</h1>
+ This is a showcase of how to make your own server with OpenBuddy's models.<br>
+ I'm using a 13B model here just as an example, and CPU power only.<br>
+ But you can use GPU power as well!<br><br>
+ <h1>How to GPU?</h1>
+ Change <code>`CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"`</code> in the Dockerfile to <code>`CMAKE_ARGS="-DLLAMA_CUBLAS=on"`</code>. You can also try <code>`DLLAMA_CLBLAST`</code> or <code>`DLLAMA_METAL`</code>.<br>
+ Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a> and <a href="https://www.gradio.app/">Gradio</a>.<br><br>
+ <h1>How to test it on your own machine?</h1>
+ You can install Docker, build the image and run it. I made <code>`run-docker.sh`</code> for ya. To stop the container, run <code>`docker ps`</code>, find the container's name and run <code>`docker stop _dockerContainerName_`</code>.<br>
+ Or you can just follow the steps in the Dockerfile and try it on your machine, not in Docker.<br><br>
+ It can also run with quart+uvicorn! Check the repo!'''
+ 
+ def greet(request: str, max_tokens: int = 64, override_system_prompt: str = ""):
+     try:
+         system_prompt = override_system_prompt if override_system_prompt != "" else prompt  # use the override if one was given
+         max_tokens = max_tokens if max_tokens > 0 and max_tokens < 256 else 64  # clamp to (0, 256), default 64
+         userPrompt = system_prompt + "\n\nUser: " + request + "\nAssistant: "
+     except Exception: return "ERROR 400: Not enough data"
+     try:
+         output = llm(userPrompt, max_tokens=max_tokens, stop=["User:", "\n"], echo=False)
+         print(output)
+         return output["choices"][0]["text"]
+     except Exception as e:
+         print(e)
+         return "ERROR 500: Server error. Check logs!!"
+ 
+ demo = gr.Interface(
+     fn=greet,
+     inputs=[gr.Text("Hello, how are you?"), gr.Number(64), gr.Textbox()],
+     outputs=["text"],
+     description=desc,
+     title=title,
+     allow_flagging="never"
+ ).queue()
+ if __name__ == "__main__":
+     demo.launch()
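
Since the interface is queued, it also exposes a programmatic API. A hypothetical usage sketch with gradio_client; the local URL and the default `/predict` endpoint name are assumptions:

    from gradio_client import Client

    client = Client("http://127.0.0.1:7860/")  # wherever the Space/container is running
    answer = client.predict(
        "Hello, how are you?",  # request
        64,                     # max_tokens
        "",                     # override_system_prompt: empty -> use system.prompt
        api_name="/predict",
    )
    print(answer)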
requirements.txt CHANGED
@@ -1,2 +1,3 @@
  quart
- uvicorn
+ uvicorn
+ gradio
system.prompt CHANGED
@@ -1 +0,0 @@
- Prompt: Answer as concisely and to the point as possible.
wget-log ADDED
@@ -0,0 +1,6 @@
+ --2023-09-30 17:29:11--  https://cdn-lfs.huggingface.co/repos/c8/66/c866ea7f0aa48d9e6cd5d10064562a36f8b43f272e5508eceac84d411b157f32/557b305cd42ca3da588a0e2f16dc1aceedcc73232fc2174da311428f16f0ca9e?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27openbuddy-llama2-13b-v8.1-q3_K.bin%3B+filename%3D%22openbuddy-llama2-13b-v8.1-q3_K.bin%22%3B
+ Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 108.156.22.122, 108.156.22.7, 108.156.22.58, ...
+ Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|108.156.22.122|:443... connected.
+ HTTP request sent, awaiting response... 403 Forbidden
+ 2023-09-30 17:29:11 ERROR 403: Forbidden.
+ 