Harpreet Sahota committed on
Commit b86d555 · 0 parent(s)

Duplicate from harpreetsahota/RAQA-Application-Chainlit-Demo

Files changed (9)
  1. .env.example +1 -0
  2. .gitattributes +35 -0
  3. .gitignore +4 -0
  4. Dockerfile +11 -0
  5. README.md +12 -0
  6. app.py +121 -0
  7. chainlit.md +11 -0
  8. data/spiderverse.csv +0 -0
  9. requirements.txt +5 -0
.env.example ADDED
@@ -0,0 +1 @@
+ OPENAI_API_KEY=sk-...
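
For local runs (outside a Space, where the key comes from `.env` rather than a Space secret), the variable has to be in the process environment before LangChain's OpenAI classes are instantiated. A minimal sketch, assuming `python-dotenv`, which is not listed in `requirements.txt`:

```python
# Minimal sketch: load OPENAI_API_KEY from .env for local development.
# Assumes python-dotenv (`pip install python-dotenv`), which is not part of this
# repo's requirements.txt; on a Space the key is injected as a secret instead.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
```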
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,4 @@
+ .env
+ __pycache__
+ cache
+ .chainlit
Dockerfile ADDED
@@ -0,0 +1,11 @@
+ FROM python:3.9
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+ WORKDIR $HOME/app
+ COPY --chown=user . $HOME/app
+ COPY ./requirements.txt $HOME/app/requirements.txt
+ RUN pip install -r requirements.txt
+ COPY . .
+ CMD ["chainlit", "run", "app.py", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,12 @@
+ ---
+ title: Spidey-verse RAQA Application Chainlit Demo
+ emoji: 🔥
+ colorFrom: red
+ colorTo: red
+ sdk: docker
+ pinned: false
+ license: apache-2.0
+ duplicated_from: harpreetsahota/RAQA-Application-Chainlit-Demo
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,125 @@
+ import chainlit as cl
+ from langchain.embeddings.openai import OpenAIEmbeddings
+ from langchain.document_loaders.csv_loader import CSVLoader
+ from langchain.embeddings import CacheBackedEmbeddings
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+ from langchain.vectorstores import FAISS
+ from langchain.chains import RetrievalQA
+ from langchain.chat_models import ChatOpenAI
+ from langchain.storage import LocalFileStore
+ from langchain.prompts.chat import (
+     ChatPromptTemplate,
+     SystemMessagePromptTemplate,
+     HumanMessagePromptTemplate,
+ )
+
+ # Split the reviews into overlapping chunks before embedding them.
+ text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
+
+ system_template = """
+ Use the following pieces of context to answer the user's question.
+
+ Please respond as if you were Miles Morales from the Spider-Man comics and movies. General speech patterns: Uses contractions often, like "I'm," "can't," and "don't."
+ Might sprinkle in some Spanish, given his Puerto Rican heritage. References to modern pop culture, music, or tech. Miles is a brave young hero, grappling with his dual
+ heritage and urban life. He has a passion for music, especially hip-hop, and is also into art, being a graffiti artist himself. He speaks with an urban and youthful tone,
+ reflecting the voice of modern NYC youth. He might occasionally reference modern pop culture, his friends, or his school life.
+ If you don't know the answer, just say you're unsure. Don't try to make up an answer.
+
+ You can make inferences based on the context as long as they align with Miles' personality and experiences.
+
+ Example of your interaction:
+
+ User: "What did you think of the latest Spider-Man movie?"
+ MilesBot: "Haha, watching Spider-Man on screen is always surreal for me. But it's cool to see different takes on the web-slinger's story. Always reminds me of the Spider-Verse!"
+
+ Example of your response:
+
+ ```
+ The answer is foo
+ ```
+
+ Begin!
+ ----------------
+ {context}"""
+
+ messages = [
+     SystemMessagePromptTemplate.from_template(system_template),
+     HumanMessagePromptTemplate.from_template("{question}"),
+ ]
+ prompt = ChatPromptTemplate.from_messages(messages)
+ chain_type_kwargs = {"prompt": prompt}
+
+
+ @cl.author_rename
+ def rename(orig_author: str):
+     rename_dict = {"RetrievalQA": "Crawling the Spiderverse"}
+     return rename_dict.get(orig_author, orig_author)
+
+
+ @cl.on_chat_start
+ async def init():
+     msg = cl.Message(content="Building Index...")
+     await msg.send()
+
+     # Build a FAISS index from the CSV of reviews.
+     loader = CSVLoader(file_path="./data/spiderverse.csv", source_column="Review_Url")
+     data = loader.load()
+     documents = text_splitter.transform_documents(data)
+
+     # Cache embeddings on disk so repeated startups don't re-embed unchanged chunks.
+     store = LocalFileStore("./cache/")
+     core_embeddings_model = OpenAIEmbeddings()
+     embedder = CacheBackedEmbeddings.from_bytes_store(
+         core_embeddings_model, store, namespace=core_embeddings_model.model
+     )
+
+     # Build the vector store off the event loop so the UI stays responsive.
+     docsearch = await cl.make_async(FAISS.from_documents)(documents, embedder)
+
+     chain = RetrievalQA.from_chain_type(
+         ChatOpenAI(model="gpt-4", temperature=0, streaming=True),
+         chain_type="stuff",
+         return_source_documents=True,
+         retriever=docsearch.as_retriever(),
+         chain_type_kwargs=chain_type_kwargs,
+     )
+
+     msg.content = "Index built!"
+     await msg.send()
+
+     cl.user_session.set("chain", chain)
+
+
+ @cl.on_message
+ async def main(message):
+     chain = cl.user_session.get("chain")
+     cb = cl.AsyncLangchainCallbackHandler(
+         stream_final_answer=False, answer_prefix_tokens=["FINAL", "ANSWER"]
+     )
+     cb.answer_reached = True
+     res = await chain.acall(message, callbacks=[cb])
+
+     answer = res["result"]
+     source_elements = []
+     visited_sources = set()
+
+     # Gather the source documents the chain returned.
+     docs = res["source_documents"]
+     metadatas = [doc.metadata for doc in docs]
+     all_sources = [m["source"] for m in metadatas]
+
+     for source in all_sources:
+         if source in visited_sources:
+             continue
+         visited_sources.add(source)
+         # Create the text element referenced in the message.
+         source_elements.append(
+             cl.Text(content="https://www.imdb.com" + source, name="Review URL")
+         )
+
+     if source_elements:
+         answer += f"\nSources: {', '.join([e.content for e in source_elements])}"
+     else:
+         answer += "\nNo sources found"
+
+     await cl.Message(content=answer, elements=source_elements).send()
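
The indexing path in `init()` above can be smoke-tested outside Chainlit before deploying. A minimal sketch, assuming the same `./data/spiderverse.csv` and an `OPENAI_API_KEY` in the environment; the sample query is purely illustrative:

```python
# Standalone sketch of app.py's indexing path, for a quick local check outside
# Chainlit. Assumes ./data/spiderverse.csv exists and OPENAI_API_KEY is set.
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

loader = CSVLoader(file_path="./data/spiderverse.csv", source_column="Review_Url")
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents = splitter.transform_documents(loader.load())

docsearch = FAISS.from_documents(documents, OpenAIEmbeddings())

# Illustrative query: print the source URLs of the retrieved review chunks.
for doc in docsearch.as_retriever().get_relevant_documents("What do reviewers say about the animation?"):
    print(doc.metadata["source"])
```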
chainlit.md ADDED
@@ -0,0 +1,11 @@
+ # Assignment Part 2: Deploying Your Model to a Hugging Face Space
+
+ Now that you've done the hard work of setting up the RetrievalQA chain and sourcing your documents, let's tie it all together in a Chainlit application.
+
+ ### Duplicating the Space
+
+ Since this is our first assignment, all you'll need to do is duplicate this Space and add your own `OPENAI_API_KEY` as a secret in the Space.
+
+ ### Conclusion
+
+ Now that you've shipped an LLM-powered application, it's time to share! 🚀
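
The chainlit.md instructions above describe duplicating the Space and adding the secret through the web UI; both steps can also be done programmatically. A rough sketch, assuming a reasonably recent `huggingface_hub` and an authenticated session; the `your-username/...` repo id is a placeholder for your own Space:

```python
# Optional sketch: duplicate the Space and add the secret via huggingface_hub
# instead of the web UI. Assumes a recent huggingface_hub and that you are
# logged in (e.g. `huggingface-cli login`); "your-username/..." is a placeholder.
from huggingface_hub import HfApi

api = HfApi()
api.duplicate_space(
    from_id="harpreetsahota/RAQA-Application-Chainlit-Demo",
    to_id="your-username/RAQA-Application-Chainlit-Demo",
)
api.add_space_secret(
    repo_id="your-username/RAQA-Application-Chainlit-Demo",
    key="OPENAI_API_KEY",
    value="sk-...",  # your real key; keep it out of the repo itself
)
```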
data/spiderverse.csv ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ chainlit==0.6.2
+ langchain==0.0.265
+ tiktoken==0.4.0
+ openai==0.27.8
+ faiss-cpu==1.7.4