# Quickstart: Querying a PDF With Astra DB and LangChain

### A question-answering demo using Astra DB and LangChain, powered by Vector Search

#### Pre-requisites:

You need a **_Serverless Cassandra with Vector Search_** database on [Astra DB](https://astra.datastax.com) to run this demo. As described in more detail [here](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html#_prepare_for_using_your_vector_database), get a DB Token with the _Database Administrator_ role and copy your Database ID: you will need both connection parameters shortly.

You also need an [OpenAI API Key](https://cassio.org/start_here/#llm-access) for this demo to work.

#### What you will do:

- Setup: import dependencies, provide secrets, load a PDF, and create the LangChain vector store;
- Run a question-answering loop that retrieves the relevant text chunks and has an LLM construct the answer.
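
The second step follows a retrieve-then-generate pattern: the chunks retrieved for a question are placed into a prompt, and the LLM answers from that context. The sketch below only illustrates the idea. `build_prompt` and its prompt wording are hypothetical and are not the template LangChain uses internally.

```python
def build_prompt(question, chunks):
    """Combine retrieved text chunks and a question into one prompt string.

    Hypothetical helper for illustration; LangChain builds its own prompt.
    """
    # Separate chunks with blank lines so the model can tell them apart
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy example with two made-up chunks
print(build_prompt(
    "What is the difference between ethics and morality?",
    ["Ethics is a normative subject.", "Morality guides individual conduct."],
))
```

Because only the retrieved chunks reach the LLM, answers reflect those chunks rather than the whole PDF.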

Install the required dependencies:

In [1]:
!pip install -qU cassio datasets langchain openai tiktoken

Import the packages you'll need:

In [2]:
# LangChain components to use
from langchain.vectorstores.cassandra import Cassandra
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings

# Support for dataset retrieval with Hugging Face
from datasets import load_dataset

# With CassIO, the engine powering the Astra DB integration in LangChain,
# you will also initialize the DB connection:
import cassio

In [3]:
!pip install PyPDF2

Collecting PyPDF2
 Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [4]:
from PyPDF2 import PdfReader

### Setup

#### Provide your secrets:

Replace the following placeholders with your Astra DB connection details and your OpenAI API key. Never commit real credentials to a shared notebook.

In [5]:
ASTRA_DB_APPLICATION_TOKEN = "AstraCS:..." # enter your Database Administrator token
ASTRA_DB_ID = "your-database-id" # enter your Database ID
OPENAI_API_KEY = "sk-..." # enter your OpenAI key
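
Rather than pasting secrets into a cell, one option is to read them from environment variables. The sketch below assumes the variables are exported before the notebook starts. `load_secret` is a hypothetical helper, not part of CassIO or LangChain.

```python
import os


def load_secret(name, env=None):
    """Return the value of `name` from an environment mapping, or raise if unset."""
    # Default to the real process environment; tests can pass a plain dict
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the notebook.")
    return value


# Demonstration with a stand-in mapping; real code would omit `env`
demo_env = {"ASTRA_DB_ID": "00000000-0000-0000-0000-000000000000"}
print(load_secret("ASTRA_DB_ID", env=demo_env))
```

In the notebook, this could replace the literal strings, for example `ASTRA_DB_ID = load_secret("ASTRA_DB_ID")`.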

In [6]:
# Provide the path to your PDF file.
pdfreader = PdfReader('Ethics.pdf')

In [7]:
# Read the text from every page of the PDF
raw_text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    # Skip pages where no text could be extracted
    if content:
        raw_text += content

In [8]:
raw_text



Initialize the connection to your database:

_(do not worry if you see a few warnings, it's just that the drivers are chatty about negotiating protocol versions with the DB.)_

In [9]:
cassio.init(token=ASTRA_DB_APPLICATION_TOKEN, database_id=ASTRA_DB_ID)

Create the LangChain embedding and LLM objects for later usage:

In [10]:
llm = OpenAI(openai_api_key=OPENAI_API_KEY)
embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

Create your LangChain vector store ... backed by Astra DB!

In [11]:
astra_vector_store = Cassandra(
 embedding=embedding,
 table_name="qa_mini_demo",
 session=None,
 keyspace=None,
)

In [12]:
from langchain.text_splitter import CharacterTextSplitter
# Split the text into overlapping chunks so that each one stays within the model's token limit
text_splitter = CharacterTextSplitter(
 separator = "\n",
 chunk_size = 800,
 chunk_overlap = 200,
 length_function = len,
)
texts = text_splitter.split_text(raw_text)
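
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch. It is not `CharacterTextSplitter`'s actual algorithm, which first splits on the separator and then merges the pieces. `sliding_chunks` is a hypothetical helper that cuts at fixed character offsets.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` characters that share
    `chunk_overlap` characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap lets a sentence that straddles a boundary appear whole in at least one chunk, at the cost of storing some text twice.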

In [13]:
texts[:50]

['21 Pusa Road, Karol Bagh, New Delhi-11000521 Pusa Road, Karol Bagh, New Delhi-110005\nContact No.:Contact No.: 8010440440, 8750187501 8010440440, 8750187501\nWebsite:Website: www.drishtiIAS.com www.drishtiIAS.com\ne-mail:e-mail: englishsupport@groupdrishti.com englishsupport@groupdrishti.comETHICS, \nINTEGRITY \nAND APTITUDEETHICS, \nINTEGRITY \nAND APTITUDEIntroduct Ion\nDrishti The Vision Foundation ©641, Mukherjee Nagar, Opp. \nSignature View Apartment, \nNew Delhi21, Pusa Road, \nKarol Bagh,\nNew DelhiTashkent Marg, \nCivil Lines, Prayagraj, \nUttar PradeshTonk Road,\nVasundhra Colony, \nJaipur, Rajasthan\ne-mail: englishsupport@groupdrishti.com, Website: www.drishtiIAS.com, Contact: 8010440440, 8750187501Ethics is a normative subject.',
 'Uttar PradeshTonk Road,\nVasundhra Colony, \nJaipur, Rajasthan\ne-mail: englishsupport@groupdrishti.com, Website: www.drishtiIAS.com, Contact: 8010440440, 8750187501Ethics is a normative subject.\nConfusion/Dilemma: It is cognitive in nature an

### Load the text chunks into the vector store



In [14]:

astra_vector_store.add_texts(texts)

print("Inserted %i chunks." % len(texts))

astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)

Inserted 518 chunks.


### Run the QA cycle

Run the cell below and type a question, or `quit` to stop. (You can also stop execution with the "▪" button on the top toolbar.)

Here are some suggested questions:
- _What is the difference between ethics and morality?_
- _‘Women is not born, she is made so’, explain it in 2000 words_
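
For each question, the vector store compares an embedding of the query with the embeddings of the stored chunks and returns the closest matches with a score. The toy sketch below ranks hand-made two-dimensional vectors by cosine similarity. The vectors, `cosine_similarity`, and `top_k` are illustrative assumptions; the real demo uses OpenAI embeddings, and the metric Astra DB applies is not shown in this notebook.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up chunks with made-up embeddings
toy_store = [
    ("chunk about ethics", [1.0, 0.0]),
    ("chunk about contact details", [0.0, 1.0]),
    ("chunk about morality", [0.8, 0.6]),
]
for score, text in top_k([1.0, 0.1], toy_store, k=2):
    print(f"[{score:.4f}] {text}")
```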


In [15]:
first_question = True
while True:
    if first_question:
        query_text = input("\nEnter your question (or type 'quit' to exit): ").strip()
    else:
        query_text = input("\nWhat's your next question (or type 'quit' to exit): ").strip()

    if query_text.lower() == "quit":
        break

    if query_text == "":
        continue

    first_question = False

    print("\nQUESTION: \"%s\"" % query_text)
    answer = astra_vector_index.query(query_text, llm=llm).strip()
    print("ANSWER: \"%s\"\n" % answer)

    print("FIRST DOCUMENTS BY RELEVANCE:")
    for doc, score in astra_vector_store.similarity_search_with_score(query_text, k=4):
        print(" [%0.4f] \"%s ...\"" % (score, doc.page_content[:2000]))


Enter your question (or type 'quit' to exit): Women is not born, she is made so’,explain it in 2000 words



QUESTION: "Women is not born, she is made so’,explain it in 2000 words"
ANSWER: "The statement "Women is not born, she is made so" is a powerful and thought-provoking statement made by French existentialist philosopher, Simone de Beauvoir. This statement highlights the societal construct of gender and how it is not solely determined by biological factors, but rather by the social and cultural norms and expectations imposed on individuals based on their gender.

To fully understand this statement, it is important to first understand the concept of gender and how it differs from sex. Sex refers to the biological characteristics that distinguish males and females, such as reproductive organs and hormones. On the other hand, gender refers to the socially constructed roles, behaviors, and attributes that a particular society considers appropriate for men and women. In other words, gender is a product of societal norms and expectations, while sex is determined by biology.

The statement by 


What's your next question (or type 'quit' to exit): quit


In [None]:
first_question = True
while True:
    if first_question:
        query_text = input("\nEnter your question (or type 'quit' to exit): ").strip()
    else:
        query_text = input("\nWhat's your next question (or type 'quit' to exit): ").strip()

    if query_text.lower() == "quit":
        break

    if query_text == "":
        continue

    first_question = False

    print("\nQUESTION: \"%s\"" % query_text)
    answer = astra_vector_index.query(query_text, llm=llm).strip()
    print("ANSWER: \"%s\"\n" % answer)


Enter your question (or type 'quit' to exit): how morality affects persons mind and behaviur



QUESTION: "how morality affects persons mind and behaviur"
ANSWER: "Morality can have a significant impact on a person's mind and behavior. It refers to a set of principles or values that guide an individual's actions and decisions, and can shape their character and sense of right and wrong. When a person adheres to a moral code, they may feel a sense of inner peace and fulfillment, while violating moral principles can lead to guilt and inner turmoil. Additionally, moral standards set by society can influence a person's behavior, as they may conform to social norms and expectations in order to be accepted by others."




What's your next question (or type 'quit' to exit): can u summaries the whole pdf



QUESTION: "can u summaries the whole pdf"
ANSWER: "The PDF discusses the approach to ethics and the importance of examining one's conscience. It also mentions the need for an open, rational, and balanced mind when it comes to ethical decision making. The concept of "fake life" versus "authentic life" is also discussed, along with the ideas of philosophers such as René Descartes and Edmund Husserl. The overall message is that one should critically examine their beliefs and doubts before making any decisions."




What's your next question (or type 'quit' to exit): can be list the name of phisohers dicussed in it 



QUESTION: "can be list the name of phisohers dicussed in it"
ANSWER: "Mahatma Gandhi, Geeta, Jain Ethics, Budhist Ethics, Deen Dayal Upadhyay, Charvaka, Thiruvalluvar, Dr. Ambedkar, Gautama, Jaimini, Prabhakar Mishra, Kumarail Bhatt, Kapila Muni, Patanjali, Herbert Spencer"

