Spaces: imperialwool/llama-cpp-api

toaster61 committed on Oct 3, 2023

Commit d1343e4 (1 parent: 773e76d)

If this works, this is the last commit using Quart.

I want to move to Gradio: it can queue requests, and I don't want to set up a database just to build a queue; that would be pointless.

Files changed (4)

Dockerfile +4 -0
app.py +25 -8
requirements.txt +4 -1
system.prompt +1 -1

Dockerfile CHANGED

@@ -14,6 +14,10 @@ COPY . /app
 RUN chmod -R 777 /app
 WORKDIR /app
+# Making dir for translator model (facebook/m2m100_1.2B)
+RUN mkdir translator
+RUN chmod -R 777 /translator
 # Installing wget and downloading model.
 RUN apt install wget -y
 RUN wget -q -O model.bin https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q5_K_M.gguf
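
The landing page text in `app.py` says you can also follow the Dockerfile steps on your own machine instead of in Docker. A minimal Python sketch of the two filesystem steps, assuming a local directory stands in for `/app`; the helper names `prepare_workdir` and `ensure_model` are hypothetical. Because `RUN mkdir translator` runs after `WORKDIR /app`, that relative path resolves to `/app/translator`, so the sketch likewise creates `translator/` inside the working directory (the path `app.py` passes as `cache_dir="translator/"`, relative to wherever the app is started). Downloading into a temporary `.part` file and renaming it is a choice of this sketch, not of the Dockerfile, so an interrupted download does not leave a truncated `model.bin`.

```python
import os
import tempfile
import urllib.request

# Same GGUF file the Dockerfile fetches with wget.
MODEL_URL = (
    "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/"
    "llama-2-13b-chat.Q5_K_M.gguf"
)


def prepare_workdir(workdir: str) -> str:
    """Create <workdir>/translator and make it world-writable (like mkdir + chmod 777)."""
    translator_dir = os.path.join(workdir, "translator")
    os.makedirs(translator_dir, exist_ok=True)
    os.chmod(translator_dir, 0o777)
    return translator_dir


def ensure_model(workdir: str, url: str = MODEL_URL) -> bool:
    """Download the model to <workdir>/model.bin unless it exists; return True if downloaded."""
    dest = os.path.join(workdir, "model.bin")
    if os.path.exists(dest):
        return False
    # Download into a temporary file first, then move it into place in one step.
    fd, tmp_path = tempfile.mkstemp(dir=workdir, suffix=".part")
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, dest)
    finally:
        # The temp file is only still present if the download or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return True
```

For example, run `prepare_workdir(".")` and then `ensure_model(".")` in the project directory; once `model.bin` is present, later calls to `ensure_model` return `False` without touching the network.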

app.py CHANGED

@@ -1,25 +1,44 @@
+# Importing libraries
+from transformers import M2M100Tokenizer, M2M100ForConditionalGeneration
 from quart import Quart, request
 from llama_cpp import Llama
-app = Quart(__name__)
-llm = Llama(model_path="./model.bin")
+# Initing things
+app = Quart(__name__)                                   # Quart app
+llm = Llama(model_path="./model.bin")                   # LLaMa model
+tokenizer = M2M100Tokenizer.from_pretrained(            # tokenizer for translator
+    "facebook/m2m100_1.2B", cache_dir="translator/"
+)
+model = M2M100ForConditionalGeneration.from_pretrained( # translator model
+    "facebook/m2m100_1.2B", cache_dir="translator/"
+)
+model.eval()
+# Preparing things to work
+tokenizer.src_lang = "en"
+# Loading prompt
 with open('system.prompt', 'r', encoding='utf-8') as f:
     prompt = f.read()
+# Defining
 @app.post("/request")
 async def echo():
     try:
         data = await request.get_json()
         maxTokens = data.get("max_tokens", 64)
-        userPrompt = prompt + "\n\nUser: " + data['request'] + "\nAssistant: "
-    except: return {"error": "Not enough data"}, 400
+        if isinstance(data.get("system_prompt"), str):
+            userPrompt = data.get("system_prompt") + "\n\nUser: " + data['request'] + "\nAssistant: "
+        else:
+            userPrompt = prompt + "\n\nUser: " + data['request'] + "\nAssistant: "
+    except:
+        return {"error": "Not enough data", "output": "Oops! Error occured! If you're a developer, using this API, check 'error' key."}, 400
     try:
         output = llm(userPrompt, max_tokens=maxTokens, stop=["User:", "\n"], echo=False)
         return {"output": output["choices"][0]["text"]}
     except Exception as e:
         print(e)
-        return {"error": "Server error"}, 500
+        return {"error": str(e), "output": "Oops! Internal server error. Check the logs. If you're a developer, using this API, check 'error' key."}, 500
 @app.get("/")
 async def get():
@@ -35,6 +54,4 @@ Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-pytho
 You can install Docker, build image and run it. I made <code>`run-docker.sh`</code> for ya. To stop container run <code>`docker ps`</code>, find name of container and run <code>`docker stop _dockerContainerName_`</code><br>
 Or you can once follow steps in Dockerfile and try it on your machine, not in Docker.<br>
 <br>
-<h1>Also now it can run with Gradio! Check the repo!</h1>
-<br>
-<script>document.write("URL of space: "+window.location.href);</script>'''
+<script>document.write("<b>URL of space:</b> "+window.location.href);</script>'''
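
The reworked `/request` handler comes down to two rules: a string `system_prompt` in the JSON body replaces the text loaded from `system.prompt`, and failures map to 400 (malformed input) or 500 (generation error), each with a human-readable `output` plus an `error` key. A framework-free sketch of that contract, handy for unit-testing it without Quart or a model; `build_prompt`, `handle` and the `generate` callable (a stand-in for the `llm(...)` call) are hypothetical names, the `output` messages are shortened, and the sketch catches the specific exceptions a malformed body raises instead of using a bare `except:`.

```python
from typing import Any, Callable, Dict, Tuple


def build_prompt(data: Dict[str, Any], default_prompt: str) -> str:
    """Prefer a caller-supplied string system_prompt; otherwise use the file's prompt."""
    system = data.get("system_prompt")
    if not isinstance(system, str):
        system = default_prompt
    return system + "\n\nUser: " + data["request"] + "\nAssistant: "


def handle(
    data: Any, default_prompt: str, generate: Callable[[str, int], str]
) -> Tuple[Dict[str, str], int]:
    """Return (json_body, status) the way the /request endpoint does."""
    try:
        max_tokens = data.get("max_tokens", 64)
        prompt = build_prompt(data, default_prompt)
    except (AttributeError, KeyError, TypeError):
        # Body is not a JSON object, lacks "request", or "request" is not a string.
        return {"error": "Not enough data", "output": "Oops! Error occurred!"}, 400
    try:
        return {"output": generate(prompt, max_tokens)}, 200
    except Exception as e:  # any failure inside the model call
        return {"error": str(e), "output": "Oops! Internal server error."}, 500
```

Inside the Quart route, passing `lambda p, n: llm(p, max_tokens=n, stop=["User:", "\n"], echo=False)["choices"][0]["text"]` as `generate` would mirror the committed behavior.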

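The `llm(...)` call passes `stop=["User:", "\n"]`, so generation ends as soon as the model starts a new line or a new `User:` turn, and the stop string is not included in the returned text; in practice each reply is a single line. The sketch below is a simplified, post-hoc approximation of that trimming applied to an already generated string (llama-cpp-python checks stop strings while generating, which this does not model); `apply_stop` is a hypothetical helper.

```python
from typing import List


def apply_stop(text: str, stop: List[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop string; the stop string is dropped."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


STOP = ["User:", "\n"]  # the stop list used in app.py

print(repr(apply_stop(" Paris.\nUser: and Spain?", STOP)))  # -> ' Paris.'
```

Note that a reply whose first character is a newline comes back empty under this stop list.
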
requirements.txt CHANGED

@@ -1,3 +1,6 @@
 Werkzeug==2.3.7
 quart
-uvicorn
+uvicorn
+torch
+transformers
+transformers[sentencepiece]
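
The new requirements pull in PyTorch and Transformers for the M2M100 translator; the `transformers[sentencepiece]` line installs Transformers together with its `sentencepiece` extra, which the SentencePiece-based `M2M100Tokenizer` needs. Some import names differ from distribution names (`llama-cpp-python` imports as `llama_cpp`). A small standard-library sketch that reports missing modules before the large models are loaded; `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the module list is an assumption based on what this commit's app uses.

```python
import importlib.util
from typing import Iterable, List

# Runtime modules for this commit's app (assumed list, import names not package names).
REQUIRED_MODULES = ["quart", "uvicorn", "llama_cpp", "torch", "transformers", "sentencepiece"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
```

`find_spec` only locates a module without importing it, so the check stays cheap even for `torch`.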

system.prompt CHANGED

@@ -1,4 +1,4 @@
-You're an AI assistant named Alisa. You're friendly and respectful.
+You're an AI assistant named Alex. You're friendly and respectful.
 You speak as briefly, clearly and to the point as possible.
 You know many languages, for example: Russian, English.
 You don't have access to the internet, so rely on your knowledge.
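
The commit renames the persona in `system.prompt` from Alisa to Alex, and the updated `/request` handler lets a caller replace that prompt per request by sending a string `system_prompt`. A minimal standard-library client sketch; the base URL and the helpers `build_request` and `ask` are hypothetical, while the JSON keys (`request`, `max_tokens`, `system_prompt`, `output`, `error`) are the ones the handler in this commit reads and returns.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def build_request(base_url: str, text: str, max_tokens: int = 64,
                  system_prompt: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to <base_url>/request with the JSON body the endpoint reads."""
    payload = {"request": text, "max_tokens": max_tokens}
    if system_prompt is not None:
        # Overrides the server-side system.prompt (the "Alex" persona) for this call.
        payload["system_prompt"] = system_prompt
    return urllib.request.Request(
        base_url.rstrip("/") + "/request",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(req: urllib.request.Request) -> str:
    """Send the request and return "output"; on 400/500 raise with the server's "error"."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["output"]
    except urllib.error.HTTPError as e:
        try:
            message = json.loads(e.read()).get("error", "")
        except (ValueError, AttributeError):
            message = ""  # error body was not a JSON object
        raise RuntimeError(message or f"HTTP {e.code}") from e
```

For example, `ask(build_request("http://localhost:7860", "Hello!", system_prompt="Answer in Russian."))` against a locally running container; the host and port are assumptions, so use whatever your container exposes.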