toaster61 committed on
Commit d1343e4
1 Parent(s): 773e76d

If this works, this is the last commit using Quart


I want to move to Gradio: it has a built-in queue, so I don't want to set up a database just to implement queuing; that would be pointless.

Files changed (4)
  1. Dockerfile +4 -0
  2. app.py +25 -8
  3. requirements.txt +4 -1
  4. system.prompt +1 -1
Dockerfile CHANGED
@@ -14,6 +14,10 @@ COPY . /app
 RUN chmod -R 777 /app
 WORKDIR /app
 
+# Making dir for translator model (facebook/m2m100_1.2B)
+RUN mkdir translator
+RUN chmod -R 777 translator
+
 # Installing wget and downloading model.
 RUN apt install wget -y
 RUN wget -q -O model.bin https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q5_K_M.gguf
app.py CHANGED
@@ -1,25 +1,44 @@
+# Importing libraries
+from transformers import M2M100Tokenizer, M2M100ForConditionalGeneration
 from quart import Quart, request
 from llama_cpp import Llama
 
-app = Quart(__name__)
-llm = Llama(model_path="./model.bin")
+# Initializing things
+app = Quart(__name__)  # Quart app
+llm = Llama(model_path="./model.bin")  # LLaMA model
+tokenizer = M2M100Tokenizer.from_pretrained(  # tokenizer for translator
+    "facebook/m2m100_1.2B", cache_dir="translator/"
+)
+model = M2M100ForConditionalGeneration.from_pretrained(  # translator model
+    "facebook/m2m100_1.2B", cache_dir="translator/"
+)
+model.eval()
 
+# Preparing things to work
+tokenizer.src_lang = "en"
+
+# Loading the system prompt
 with open('system.prompt', 'r', encoding='utf-8') as f:
     prompt = f.read()
 
+# Defining routes
 @app.post("/request")
 async def echo():
     try:
         data = await request.get_json()
         maxTokens = data.get("max_tokens", 64)
-        userPrompt = prompt + "\n\nUser: " + data['request'] + "\nAssistant: "
-    except: return {"error": "Not enough data"}, 400
+        if isinstance(data.get("system_prompt"), str):
+            userPrompt = data.get("system_prompt") + "\n\nUser: " + data['request'] + "\nAssistant: "
+        else:
+            userPrompt = prompt + "\n\nUser: " + data['request'] + "\nAssistant: "
+    except:
+        return {"error": "Not enough data", "output": "Oops! An error occurred! If you're a developer using this API, check the 'error' key."}, 400
     try:
         output = llm(userPrompt, max_tokens=maxTokens, stop=["User:", "\n"], echo=False)
         return {"output": output["choices"][0]["text"]}
     except Exception as e:
         print(e)
-        return {"error": "Server error"}, 500
+        return {"error": str(e), "output": "Oops! Internal server error. Check the logs. If you're a developer using this API, check the 'error' key."}, 500
 
 @app.get("/")
 async def get():
@@ -35,6 +54,4 @@ Powered by <a href="https://github.com/abetlen/llama-cpp-python">llama-cpp-python</a>
 You can install Docker, build the image and run it. I made <code>run-docker.sh</code> for you. To stop the container, run <code>docker ps</code>, find the container's name and run <code>docker stop _dockerContainerName_</code><br>
 Or you can follow the steps in the Dockerfile once and try it on your machine, outside Docker.<br>
 <br>
-<h1>Also now it can run with Gradio! Check the repo!</h1>
-<br>
-<script>document.write("URL of space: "+window.location.href);</script>'''
+<script>document.write("<b>URL of space:</b> "+window.location.href);</script>'''
 
 
requirements.txt CHANGED
@@ -1,3 +1,6 @@
 Werkzeug==2.3.7
 quart
-uvicorn
+uvicorn
+torch
+transformers
+transformers[sentencepiece]
system.prompt CHANGED
@@ -1,4 +1,4 @@
-You're an AI assistant named Alisa. You're friendly and respectful.
+You're an AI assistant named Alex. You're friendly and respectful.
 You speak as briefly, clearly and to the point as possible.
 You know many languages, for example: Russian, English.
 You don't have access to the internet, so rely on your knowledge.