# 📱 Using RAG to perform analysis of Airbnb Quarterly Report! 🤖

[Airbnb quarterly report for the quarterly period ended March 31, 2024](https://airbnb2020ipo.q4web.com/files/doc_financials/2024/q1/fdb60f7d-e616-43dc-86ef-e33d3a9bdd05.pdf)

## Question 1
- question: What is Airbnb's 'Description of Business'?
- response: Airbnb's 'Description of Business' is operating a global platform for unique stays and experiences, connecting hosts and guests online or through mobile devices to book spaces and experiences around the world.
- LangSmith trace: https://smith.langchain.com/public/ebdf5473-64ac-4f85-81ab-bd3c3d624969/r

## Question 2
- question: What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
- response: The total value of 'Cash and cash equivalents' as of December 31, 2023, is $2,369.
- LangSmith trace: https://smith.langchain.com/public/b0f93487-c729-4ccf-93f9-0354078282d8/r

## Question 3
- question: What is the 'maximum number of shares to be sold under the 10b5-1 Trading plan' by Brian Chesky?
- response: The maximum number of shares to be sold under the 10b5-1 Trading plan by Brian Chesky is 1,146,000.
- LangSmith trace: https://smith.langchain.com/public/7fc4b549-2ea5-4b86-abf9-71d5e9a62738/r

In [1]:
%pip install -qU pymupdf 
%pip install -qU langchain langchain-core langchain-community langchain-text-splitters 
%pip install -qU langchain-openai
%pip install -qU langchain-groq
%pip install -qU langchain-qdrant

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
from langchain import hub
from langchain_groq import ChatGroq

# All credentials are read from the environment. Fail early with a clear message
# if any are missing (assigning None into os.environ would raise a TypeError).
required_env_vars = [
    "GROQ_API_KEY", "OPENAI_API_KEY", "QDRANT_API_KEY", "QDRANT_URL",
    "LANGCHAIN_ENDPOINT", "LANGCHAIN_API_KEY", "LANGCHAIN_TRACING_V2",
]
missing = [name for name in required_env_vars if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

llm = ChatGroq(model="llama3-70b-8192", temperature=0.3)

QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")
QDRANT_API_URL = os.getenv("QDRANT_URL")

# LangSmith tracing: the endpoint, API key, and tracing flag come from the
# environment; only the project name is set here
os.environ["LANGCHAIN_PROJECT"] = "AirBnB PDF Jun18"

# Leverage a prompt from the LangChain hub
prompt = hub.pull("rlm/rag-prompt-llama3")
# prompt = hub.pull("cracked-nut/securities-comm-llama3")
# prompt = hub.pull("cracked-nut/securities-comm-llama3-v2")

In [3]:
# Notebook parameters

LOAD_NEW_DATA = False # Set to True to load new data
FILE_PATH = "/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf" # Path to the source PDF file
collection = "airbnb_pdf_rec_1000_200_images" # qdrant collection name

# QUESTION = "What is Airbnb's 'Description of Business'?"
QUESTION = "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
# QUESTION = "What is the 'maximum number of shares to be sold under the 10b5-1 Trading plan' by Brian Chesky?"

In [4]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_qdrant import Qdrant

In [5]:
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model="text-embedding-3-small")

In [6]:
# run loader if LOAD_NEW_DATA is True
if LOAD_NEW_DATA:
    loader = PyMuPDFLoader(FILE_PATH, extract_images=True)
    docs = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
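
To make the `chunk_size=1000, chunk_overlap=200` setting concrete, here is a minimal standard-library sketch of overlapping fixed-width windows. It is a simplification, not the splitter's actual algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so real chunk boundaries will differ. The function name `sliding_chunks` is hypothetical and is not part of LangChain.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-width windows that share `chunk_overlap` characters.

    Simplified illustration only; not the recursive, separator-aware algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))  # 2,500 characters of dummy text
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
```

With 2,500 characters of input this yields three chunks, and the last 200 characters of each chunk reappear at the start of the next one. That overlap is what keeps a table row or sentence that straddles a boundary retrievable from at least one chunk.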

In [7]:
# Store the chunks in Qdrant
if LOAD_NEW_DATA:
    from_splits = Qdrant.from_documents(
        embedding=embedding,
        collection_name=collection,
        url=QDRANT_API_URL,
        api_key=QDRANT_API_KEY,
        prefer_grpc=True,   
        documents=splits,
    )

In [14]:
qdrant = Qdrant.from_existing_collection(
    embedding=embedding,
    collection_name=collection,
    url=QDRANT_API_URL,
    api_key=QDRANT_API_KEY,
    prefer_grpc=True,     
)

# retriever = qdrant.as_retriever(search_type="mmr", search_kwargs={"k": 8})

# retriever = qdrant.as_retriever(search_kwargs={"k": 5})

retriever = qdrant.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5, "k": 5}
)
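
The `similarity_score_threshold` search type can be pictured as "score every chunk, drop anything below the threshold, keep the best `k`". The sketch below illustrates that idea with plain cosine similarity over toy vectors. It is an assumption-laden illustration: the function names are hypothetical, and how the vector store actually computes and normalizes scores depends on the collection's distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, indexed, score_threshold=0.5, k=5):
    """Return up to k (score, text) pairs whose score meets the threshold, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:k]


# Toy 2-D "embeddings"; real embedding vectors have many more dimensions.
indexed = [
    ("cash and cash equivalents table", [1.0, 0.1]),
    ("short-term investments table", [0.8, 0.6]),
    ("description of business", [0.0, 1.0]),
]
results = retrieve([1.0, 0.0], indexed)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

Here the third chunk scores 0 against the query and is filtered out, so fewer than `k` results come back. The same can happen with the real retriever: a threshold that is too strict can return an empty context.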


In [15]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": prompt | llm, "context": itemgetter("context")}
)
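
The data flow through this chain can be traced with ordinary dictionaries. The sketch below mirrors the three stages using stand-in functions; `fake_retriever` and `fake_prompt_and_llm` are hypothetical placeholders for `retriever` and `prompt | llm`, so no model or vector store is called.

```python
def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for `retriever`: returns canned "documents".
    return [f"chunk relevant to: {question}"]


def fake_prompt_and_llm(inputs: dict) -> str:
    # Hypothetical stand-in for `prompt | llm`: formats inputs instead of calling a model.
    return f"answer to {inputs['question']!r} using {len(inputs['context'])} chunk(s)"


def run_chain(inputs: dict) -> dict:
    # Stage 1: the first dict runs both branches on the same input.
    state = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Stage 2: RunnablePassthrough.assign(context=...) keeps every key and
    # sets "context" to its own value, so the state is unchanged here.
    state = {**state, "context": state["context"]}
    # Stage 3: the final dict returns the model output plus the retrieved context.
    return {"response": fake_prompt_and_llm(state), "context": state["context"]}


result = run_chain({"question": "What is Airbnb's 'Description of Business'?"})
print(result["response"])
print(result["context"])
```

Returning `context` alongside `response` is what makes it possible to inspect the retrieved chunks after the call. In the real chain, `response` is an `AIMessage` rather than a plain string.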

In [16]:
print(rag_chain.get_graph().draw_ascii())

                      +---------------------------------+                        
                      | Parallel<context,question>Input |                        
                      +---------------------------------+                        
                           ****                   ****                           
                       ****                           ***                        
                     **                                  ****                    
+--------------------------------+                           **                  
| Lambda(itemgetter('question')) |                            *                  
+--------------------------------+                            *                  
                 *                                            *                  
                 *                                            *                  
                 *                                            *                  
     +----------

In [17]:
response = rag_chain.invoke({"question" : QUESTION})

In [18]:
# Print the answer text: the "response" key holds an AIMessage, and .content is the generated string
print(response["response"].content)


The total value of 'Cash and cash equivalents' as of December 31, 2023, is $2,369.
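
As a sanity check, the components listed under "Cash and cash equivalents" in the retrieved fair-value table (visible in the `response["context"]` output) can be summed and compared with the model's answer. The figures are copied from that chunk; the variable names are illustrative.

```python
# Fair-value components of "Cash and cash equivalents" at December 31, 2023,
# copied from the Total column of the retrieved chunk.
components = {
    "Money market funds": 2_018,
    "Certificates of deposit": 1,
    "Government bonds": 115,
    "Commercial paper": 223,
    "Corporate debt securities": 12,
}

level_1 = components["Money market funds"]  # the only Level 1 item
level_2 = sum(v for name, v in components.items() if name != "Money market funds")
total = sum(components.values())

print(level_1, level_2, total)
assert total == 2_369  # matches the figure in the model's response
```

The Level 1 and Level 2 subtotals (2,018 and 351) also match the subtotal row in the chunk. The chunk itself carries no unit label, which is why the response reads `$2,369` with no scale attached.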


In [19]:
response["context"]

[Document(page_content='December 31, 2023\nLevel\xa01\nLevel\xa02\nLevel\xa03\nTotal\nAssets\nCash and cash equivalents:\nMoney market funds\n$\n2,018\xa0 $\n—\xa0 $\n—\xa0 $\n2,018\xa0\nCertificates of deposit\n—\xa0\n1\xa0\n—\xa0\n1\xa0\nGovernment bonds\n—\xa0\n115\xa0\n—\xa0\n115\xa0\nCommercial paper\n—\xa0\n223\xa0\n—\xa0\n223\xa0\nCorporate debt securities\n—\xa0\n12\xa0\n—\xa0\n12\xa0\n2,018\xa0\n351\xa0\n—\xa0\n2,369\xa0\nShort-term investments:\nCertificates of deposit\n—\xa0\n172\xa0\n—\xa0\n172\xa0\nGovernment bonds\n—\xa0\n333\xa0\n—\xa0\n333\xa0\nCommercial paper\n—\xa0\n366\xa0\n—\xa0\n366\xa0\nCorporate debt securities\n—\xa0\n1,491\xa0\n—\xa0\n1,491\xa0\nMortgage-backed and asset-backed securities\n—\xa0\n145\xa0\n—\xa0\n145\xa0\n—\xa0\n2,507\xa0\n—\xa0\n2,507\xa0\nFunds receivable and amounts held on behalf of customers:\nMoney market funds\n1,360\xa0\n—\xa0\n—\xa0\n1,360\xa0\nPrepaids and other current assets:\nForeig