Spaces: akfung/phoenix-byte
root committed on Jan 7, 2024

Commit 2c2d40a

1 Parent(s): 7df77f3

switch to docker sdk

Files changed (7)

.gitignore CHANGED

@@ -37,6 +37,4 @@ tokenizer.model
 *$py.class
 .env
 /embedding_model/
-setup.py
-embedding_setup.sh
 src/db/*.json

Dockerfile CHANGED

@@ -15,7 +15,22 @@ RUN mkdir /code/embedding_model/
 # RUN chmod +x /code/embedding_setup.sh
 RUN python setup.py
-EXPOSE 7860
-# CMD ["python "app.py" "--port", "7860"]
-CMD python app.py --port 7860
+# Set up a new user named "user" with user ID 1000
+RUN useradd -m -u 1000 user
+# Switch to the "user" user
+USER user
+# Set home to the user's home directory
+ENV HOME=/home/user \
+	PATH=/home/user/.local/bin:$PATH
+# Set the working directory to the user's home directory
+WORKDIR $HOME/app
+RUN pip install --no-cache-dir --upgrade pip
+# Copy the current directory contents into the container at $HOME/app setting the owner to the user
+COPY --chown=user . $HOME/app
+EXPOSE 7860
+CMD python app.py

README.md CHANGED

@@ -2,10 +2,9 @@
 title: Phoenix-Byte
 colorFrom: green
 colorTo: indigo
-sdk: gradio
-sdk_version: 3.39.0
-app_file: app.py
+sdk: docker
 pinned: false
+app_port: 7860
 ---
 ## Introduction
@@ -19,5 +18,4 @@ Training data for this project was gathered from Justia using the basic requests
 ## Model
 The base model is Meta’s Llama2 7B, chosen because it can be trained on an 8GB consumer GPU with quantization. The model finetuning was performed on a laptop RTX 4060 using 4bit normal float quantization and Low-Rank adapters through the Hugging Face transformers and PEFT libraries. LoRA updates were merged with the model following training completion.
-## Deployment
-This app runs as a gradio app inside a docker container build on Google Cloudbuild and deployed to Compute Engine on a T4 instance. The model weights themselves are stored on Google Cloud storage.

app.py CHANGED

@@ -33,7 +33,7 @@ def run():
         btn2.click(lambda x: x, inputs=[txt], outputs=cache, queue=False).then(
             model.inference, inputs=[cache, dropdown], outputs=txt2)
-    demo.queue().launch(share=False)
+    demo.queue().launch(share=False, server_name="0.0.0.0")
 if __name__=='__main__':
     run()
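
The app.py change binds Gradio to all interfaces so that the port published by the container is reachable from outside it. A minimal sketch of one way to keep the bind address and port configurable follows. The helper name `launch_kwargs` is hypothetical and not part of this commit; it assumes the `GRADIO_SERVER_NAME` and `GRADIO_SERVER_PORT` environment variables as override sources and defaults to the values used in this diff.

```python
import os


def launch_kwargs(env=None):
    """Build keyword arguments for demo.queue().launch().

    Hypothetical helper, not part of the commit. The defaults mirror this
    diff: bind to 0.0.0.0 and serve on port 7860.
    """
    env = os.environ if env is None else env
    return {
        "share": False,
        "server_name": env.get("GRADIO_SERVER_NAME", "0.0.0.0"),
        "server_port": int(env.get("GRADIO_SERVER_PORT", "7860")),
    }


# Inside run(), the launch call could then read:
#     demo.queue().launch(**launch_kwargs())
```

Passing the port explicitly keeps it in step with `EXPOSE 7860` in the Dockerfile and `app_port: 7860` in the README front matter.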

setup.py ADDED

+from sentence_transformers import SentenceTransformer
+embedding_model = SentenceTransformer('multi-qa-mpnet-base-dot-v1')
+embedding_model.save('/embedding_model/')

src/config.py CHANGED

@@ -47,6 +47,7 @@ headers = {
     "Content-Type": "application/json"
 }
+embedding_path = os.environ.get('EMBEDDING_PATH')
 streaming_url = "https://api.runpod.ai/v2/o4tke61qpopsz0/stream/"
 job_url = "https://api.runpod.ai/v2/o4tke61qpopsz0/run"
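
Note that `os.environ.get('EMBEDDING_PATH')` returns `None` when the variable is unset. A minimal sketch of a lookup with a fallback follows, assuming the default should be the `/embedding_model/` directory that setup.py writes to in this commit. The names `resolve_embedding_path` and `DEFAULT_EMBEDDING_PATH` are hypothetical and do not appear in the repository.

```python
import os

# Assumed default: the directory setup.py saves the model to in this commit.
DEFAULT_EMBEDDING_PATH = "/embedding_model/"


def resolve_embedding_path(env=None):
    """Return EMBEDDING_PATH from the environment, or the default.

    Hypothetical helper. An unset and an empty variable are treated alike.
    """
    env = os.environ if env is None else env
    value = env.get("EMBEDDING_PATH")
    return value if value else DEFAULT_EMBEDDING_PATH


embedding_path = resolve_embedding_path()
```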

src/model.py CHANGED

@@ -15,7 +15,7 @@ class Model:
                  max_new_tokens:int=max_new_tokens):
         self.max_new_tokens = max_new_tokens
         # self.embedding_model = SentenceTransformer('multi-qa-mpnet-base-dot-v1')
-        self.embedding_model = SentenceTransformer('/embedding_model/')
+        self.embedding_model = SentenceTransformer("/embedding_model/")
     def inference(self, query:str, table:str):
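
In this hunk src/model.py still hard-codes the model directory, and only the quoting style changes. A minimal sketch of one way to choose between the saved local copy and the hub model named in the commented-out line follows. The helper `pick_embedding_source` is hypothetical and not part of this commit, and it assumes falling back to the hub name is acceptable when the local directory is missing.

```python
import os

# Hub model name taken from the commented-out line in src/model.py.
HUB_MODEL_NAME = "multi-qa-mpnet-base-dot-v1"


def pick_embedding_source(local_path, hub_name=HUB_MODEL_NAME):
    """Return local_path if it is an existing directory, else hub_name.

    Hypothetical helper: the result is meant to be passed to
    SentenceTransformer(...), which accepts a local directory or a hub name.
    """
    if local_path and os.path.isdir(local_path):
        return local_path
    return hub_name


# Possible use inside Model.__init__, with embedding_path from src/config.py:
#     self.embedding_model = SentenceTransformer(pick_embedding_source(embedding_path))
```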