Yang Ouyang committed on
Commit
2fc6f7f
1 Parent(s): c213803
Files changed (5)
  1. README copy.md +118 -0
  2. app.py +68 -0
  3. image-1.png +0 -0
  4. rag.py +65 -0
  5. requirements.txt +9 -0
README copy.md ADDED
@@ -0,0 +1,118 @@
+ # mini_project9_streamlit_llm
+
+ Build a ChatPDF Retrieval-Augmented Generation (RAG) application with an LLM from HuggingFace and a UI built with Streamlit.
+
+ > With the rise of Large Language Models and their impressive capabilities, many fancy applications are being built on top of giant LLM providers like OpenAI and Anthropic. The magic behind such applications is the RAG framework, which has been thoroughly explained in the references.
+
+ ## Prerequisites
+ Dependencies:
+ - langchain
+ - streamlit
+ - streamlit-chat
+ - pypdf
+ - chromadb
+ - fastembed
+
+ ```bash
+ pip install langchain streamlit streamlit-chat chromadb pypdf fastembed
+ ```
+
+ ## How to Build Your Own RAG: Langchain + HuggingFace + Streamlit
+
+ We will build an application similar to [ChatPDF](https://www.chatpdf.com/), but simpler: users can upload a PDF document and ask questions about it through a straightforward UI. Our tech stack is super easy: Langchain, HuggingFace, and Streamlit.
+
+ * LLM Server: The most critical component of this app is the LLM server. Thanks to [HuggingFace](https://huggingface.co/), we can easily access the latest models and serve them through its hosted Inference API. For this project, we'll be using the Mistral model from HuggingFace, a powerful instruction-tuned LLM that generates answers grounded in the context provided to it. It's a great choice for our application.
+
+ * RAG: Undoubtedly, the two leading libraries in the LLM domain are [Langchain](https://python.langchain.com/docs/get_started/introduction) and [LlamaIndex](https://www.llamaindex.ai/). For this project, I'll be using Langchain due to my familiarity with it from my professional experience. An essential component of any RAG framework is vector storage. We'll be using [Chroma](https://github.com/chroma-core/chroma) here, as it integrates well with Langchain.
+
+ * Chat UI: The user interface is also an important component. Although there are many technologies available, I prefer using [Streamlit](https://streamlit.io), a Python library, for peace of mind.
+
+ ## Setup HuggingFace Model
+ First things first, we need to set up the LLM server. Here's how you can do it:
+ 1. Create an access token on HuggingFace (Settings > Access Tokens).
+ 2. Use the following code to load the newest Mistral model from HuggingFace, which has much better performance than the v0.1 model:
+ ```python
+ repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
+ self.model = HuggingFaceHub(huggingfacehub_api_token='hf_xxxxxxxxxx',
+                             repo_id=repo_id,
+                             model_kwargs={"temperature": 0.8, "max_new_tokens": 100})
+ ```
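+
+ Hardcoding the token is fine for a quick test, but the app itself reads it from an environment variable (see `rag.py`). A minimal sketch of that pattern:
+
+ ```python
+ import os
+
+ from langchain_community.llms import HuggingFaceHub
+
+ # Read the token from the environment so it never lands in source control.
+ HUGGING_FACE_API_TOKEN = os.getenv("HUGGING_FACE_API_TOKEN")
+
+ model = HuggingFaceHub(huggingfacehub_api_token=HUGGING_FACE_API_TOKEN,
+                        repo_id="mistralai/Mistral-7B-Instruct-v0.2",
+                        model_kwargs={"temperature": 0.8, "max_new_tokens": 100})
+ ```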
+
+ ## Build the RAG Pipeline
+ The second step in our process is to build the RAG pipeline.
+ 1. Given the simplicity of our application, we primarily need two methods: `ingest` and `ask`.
+ ```python
+ def ingest(self, pdf_file_path: str):
+     docs = PyPDFLoader(file_path=pdf_file_path).load()
+     chunks = self.text_splitter.split_documents(docs)
+     chunks = filter_complex_metadata(chunks)
+
+     vector_store = Chroma.from_documents(documents=chunks, embedding=FastEmbedEmbeddings())
+     self.retriever = vector_store.as_retriever(
+         search_type="similarity_score_threshold",
+         search_kwargs={
+             "k": 3,
+             "score_threshold": 0.5,
+         },
+     )
+
+     self.chain = ({"context": self.retriever, "question": RunnablePassthrough()}
+                   | self.prompt
+                   | self.model
+                   | StrOutputParser())
+
+ def ask(self, query: str):
+     if not self.chain:
+         return "Please, add a PDF document first."
+
+     return self.chain.invoke(query)
+ ```
+
+ The `ingest` method accepts a file path and loads the document into vector storage in two steps: first, it splits the document into smaller chunks to accommodate the token limit of the LLM; second, it vectorizes these chunks with Qdrant's FastEmbed embeddings and stores them in Chroma.
+
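+ For intuition, here is a minimal, self-contained sketch of the splitting step in isolation (the sample text is illustrative; in the app the documents come from `PyPDFLoader`):
+
+ ```python
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+
+ # Same configuration as rag.py: ~1024-character chunks with 100 characters of
+ # overlap, so text cut at a boundary still appears intact in one chunk.
+ splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100)
+
+ sample_text = "A long document. " * 500  # stand-in for real PDF text
+ chunks = splitter.split_text(sample_text)
+ print(len(chunks), "chunks; first chunk has", len(chunks[0]), "characters")
+ ```
+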
+ The `ask` method handles user queries. Users can pose a question, and the retrieval chain then fetches the relevant contexts (document chunks) using vector similarity search.
+
+ 2. With the user's question and the retrieved contexts, we can compose a prompt and request a prediction from the LLM server.
+
+ ```python
+ prompt_text = """
+ <s> [INST] You are an assistant for question-answering tasks. Use the following pieces of retrieved context
+ to answer the question. If you don't know the answer, just say that you don't know. Use three sentences
+ maximum and keep the answer concise. [/INST] </s>
+ [INST] Question: {question}
+ Context: {context}
+ Answer: [/INST]
+ """
+ ```
+
+ The prompt is sourced from the Langchain hub: [Langchain RAG Prompt for Mistral](https://smith.langchain.com/hub/rlm/rag-prompt-mistral). This prompt has been tested and downloaded thousands of times, serving as a reliable resource for learning about LLM prompting techniques.
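+
+ If you'd rather pull the prompt from the hub at runtime than hardcode it, a small sketch (this assumes the `langchainhub` package from `requirements.txt` is installed):
+
+ ```python
+ from langchain import hub
+
+ # Fetch the same RAG prompt for Mistral from the Langchain hub.
+ prompt = hub.pull("rlm/rag-prompt-mistral")
+ ```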
+
+ You can learn more about LLM prompting techniques [here](https://www.promptingguide.ai/).
+
+ ## Draft A Simple UI
+
+ For a simple user interface, we will use [Streamlit](https://streamlit.io/), a UI framework designed for fast prototyping of AI/ML applications. The app provides:
+
+ 1. PDF Document Upload: Users can upload one or more PDF documents, which the application ingests for processing.
+ 2. Interactive Chat: The application offers a chat interface where users can ask questions or make queries. The system processes these inputs and responds based on the content of the uploaded PDF documents.
+ 3. Dynamic Interface Elements: It uses Streamlit's dynamic interface elements to manage the chat, display messages, and provide feedback during processing (e.g., spinners).
+
+ Run the app with the command `streamlit run app.py` to see what it looks like. A quick way to exercise the pipeline without the UI is sketched below.
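+
+ A minimal smoke test of the `ChatPDF` class from `rag.py`, bypassing Streamlit entirely ("sample.pdf" is a placeholder path):
+
+ ```python
+ from rag import ChatPDF
+
+ # Ingest a document, then ask a question against it.
+ assistant = ChatPDF()
+ assistant.ingest("sample.pdf")
+ print(assistant.ask("What is this document about?"))
+ ```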
+
+ Reference blog post: https://blog.duy-huynh.com/build-your-own-rag-and-run-them-locally/
+
+ ## Deployment
+ It is quite easy to deploy a Streamlit application on Streamlit Cloud by following the instructions [here](https://docs.streamlit.io/library/deploying/deploying-with-streamlit-sharing).
+
+ 1. Create a free account on Streamlit Cloud.
+ 2. Install the Streamlit CLI.
+ 3. Deploy the application using the 'Deploy' button on the local website.
+ 4. Prepare the `requirements.txt` file with the necessary dependencies using the command `pip freeze > requirements.txt`.
+ 5. Set up the environment variable for the HuggingFace token (`HUGGING_FACE_API_TOKEN`, which `rag.py` reads).
+ ![alt text](image-1.png)
app.py ADDED
@@ -0,0 +1,68 @@
+ #!/usr/bin/env python3
+ import os
+ import tempfile
+ import streamlit as st
+ from streamlit_chat import message
+ from rag import ChatPDF
+
+ st.set_page_config(page_title="ChatPDF")
+
+
+ def display_messages():
+     st.subheader("Chat")
+     for i, (msg, is_user) in enumerate(st.session_state["messages"]):
+         message(msg, is_user=is_user, key=str(i))
+     st.session_state["thinking_spinner"] = st.empty()
+
+
+ def process_input():
+     if st.session_state["user_input"] and len(st.session_state["user_input"].strip()) > 0:
+         user_text = st.session_state["user_input"].strip()
+         with st.session_state["thinking_spinner"], st.spinner("Thinking"):
+             agent_text = st.session_state["assistant"].ask(user_text)
+
+         # The completion echoes the prompt, so keep only the text after the answer marker.
+         if "Answer: [/INST]" in agent_text:
+             agent_text = agent_text.split("Answer: [/INST]")[1].strip()
+         st.session_state["messages"].append((user_text, True))
+         st.session_state["messages"].append((agent_text, False))
+
+
+ def read_and_save_file():
+     st.session_state["assistant"].clear()
+     st.session_state["messages"] = []
+     st.session_state["user_input"] = ""
+
+     for file in st.session_state["file_uploader"]:
+         # Persist the upload to a temporary file so PyPDFLoader can read it by path.
+         with tempfile.NamedTemporaryFile(delete=False) as tf:
+             tf.write(file.getbuffer())
+             file_path = tf.name
+
+         with st.session_state["ingestion_spinner"], st.spinner(f"Ingesting {file.name}"):
+             st.session_state["assistant"].ingest(file_path)
+         os.remove(file_path)
+
+
+ def page():
+     if len(st.session_state) == 0:
+         st.session_state["messages"] = []
+         st.session_state["assistant"] = ChatPDF()
+
+     st.header("ChatPDF")
+
+     st.subheader("Upload a document")
+     st.file_uploader(
+         "Upload document",
+         type=["pdf"],
+         key="file_uploader",
+         on_change=read_and_save_file,
+         label_visibility="collapsed",
+         accept_multiple_files=True,
+     )
+
+     st.session_state["ingestion_spinner"] = st.empty()
+
+     display_messages()
+     st.text_input("Message", key="user_input", on_change=process_input)
+
+
+ if __name__ == "__main__":
+     page()
image-1.png ADDED
rag.py ADDED
@@ -0,0 +1,65 @@
+ from langchain_community.vectorstores import Chroma
+ from langchain_community.embeddings import FastEmbedEmbeddings
+ from langchain.schema.output_parser import StrOutputParser
+ from langchain_community.document_loaders import PyPDFLoader
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+ from langchain.schema.runnable import RunnablePassthrough
+ from langchain.prompts import PromptTemplate
+ from langchain.vectorstores.utils import filter_complex_metadata
+ from langchain_community.llms import HuggingFaceHub
+ import os
+
+ prompt_text = """
+ <s> [INST] You are an assistant for question-answering tasks. Use the following pieces of retrieved context
+ to answer the question. If you don't know the answer, just say that you don't know. Use three sentences
+ maximum and keep the answer concise. [/INST] </s>
+ [INST] Question: {question}
+ Context: {context}
+ Answer: [/INST]
+ """
+
+ HUGGING_FACE_API_TOKEN = os.getenv("HUGGING_FACE_API_TOKEN")
+
+
+ class ChatPDF:
+     vector_store = None
+     retriever = None
+     chain = None
+
+     def __init__(self):
+         repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
+         self.model = HuggingFaceHub(huggingfacehub_api_token=HUGGING_FACE_API_TOKEN,
+                                     repo_id=repo_id,
+                                     model_kwargs={"temperature": 0.8, "max_new_tokens": 100})
+
+         self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100)
+         self.prompt = PromptTemplate.from_template(prompt_text)
+
+     def ingest(self, pdf_file_path: str):
+         # Load the PDF, split it into chunks, and drop metadata Chroma cannot store.
+         docs = PyPDFLoader(file_path=pdf_file_path).load()
+         chunks = self.text_splitter.split_documents(docs)
+         chunks = filter_complex_metadata(chunks)
+
+         vector_store = Chroma.from_documents(documents=chunks, embedding=FastEmbedEmbeddings())
+         self.retriever = vector_store.as_retriever(
+             search_type="similarity_score_threshold",
+             search_kwargs={
+                 "k": 3,
+                 "score_threshold": 0.5,
+             },
+         )
+
+         # Compose the chain: retrieved context + question -> prompt -> LLM -> plain string.
+         self.chain = ({"context": self.retriever, "question": RunnablePassthrough()}
+                       | self.prompt
+                       | self.model
+                       | StrOutputParser())
+
+     def ask(self, query: str):
+         if not self.chain:
+             return "Please, add a PDF document first."
+
+         return self.chain.invoke(query)
+
+     def clear(self):
+         self.vector_store = None
+         self.retriever = None
+         self.chain = None
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ langchain==0.1.14
+ langchain-community==0.0.31
+ streamlit-chat>=0.1.1,<0.2.0
+ pypdf>=3.17.1,<4.0.0
+ fastembed>=0.2.4,<0.3.0
+ openai>=1.3.6,<2.0.0
+ langchainhub>=0.1.14,<0.2.0
+ chromadb==0.3.29
+ streamlit>=1.29.0,<2.0.0