Issue with Keyword-Based Queries in RAG Chatbot Using Vector Database

#131
by rishik10 - opened

Hi everyone,

I'm working on a chatbot project using Retrieval-Augmented Generation (RAG). I've stored sample Q&A pairs in a vector database after embedding them. The chatbot performs well when I ask full questions, but when I query with specific keywords it returns whole passages of the document instead of a short answer. Here's a summary of my setup and the issue:

Setup:

Document Embedding and Storage:

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

# Load the PDF and split it into overlapping chunks.
loader = PyPDFLoader("abc.pdf")
pages = loader.load_and_split()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(pages)

# Note: HuggingFaceBgeEmbeddings is intended for BAAI/bge-* models;
# for sentence-transformers/all-mpnet-base-v2 use HuggingFaceEmbeddings.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

# persist_directory is required for persist() to write the index to disk.
vectordb = Chroma.from_documents(documents=splits, embedding=embeddings, persist_directory="chroma_db")
vectordb.persist()
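
For reference, the persisted store can then be reloaded in a later session instead of re-embedding the PDF each time (a minimal sketch, assuming the chroma_db directory used above):

from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

# Reload the persisted index from disk.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectordb = Chroma(persist_directory="chroma_db", embedding_function=embeddings)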

Retrieval and QA Chain:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, TextStreamer
from langchain import HuggingFacePipeline, PromptTemplate
from langchain.chains import RetrievalQA

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.float16, device_map="auto", use_cache=True)

# Stream tokens as they are generated, hiding the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
text_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=1000, do_sample=False, repetition_penalty=1.15, streamer=streamer)
# Note: with do_sample=False the temperature setting has no effect.
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0.1})

retriever = vectordb.as_retriever(search_type="similarity", search_kwargs={"k": 6})
template = """
[INST] <>
You are a friendly chatbot assistant that responds in a conversational manner to user's questions based on the document.
Return only the specific short answer which corresponds to the question in the document and nothing else.
Do not generate your own answer.
Just say "I do not know" if you cannot answer.
Provide the answer in one sentence or less.
Do not include phrases like "based on the context given" or "based on this line." or any explanation
<>

{context}

{question} [/INST]
Possible Answer :
"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, chain_type_kwargs={"prompt": prompt})

query = "activities"
result_ = qa_chain(query)
split_text = result_['result'].split('Possible Answer :')
possible_answer = split_text[1].strip()
print(possible_answer)

Issue:
When querying with specific keywords, the bot returns large sections of the document rather than a concise answer.

Questions:

How can I improve the handling of keyword-based queries to return more relevant, concise answers?
Are there adjustments needed in the chunk size or the retriever's search settings?

Any suggestions or insights would be greatly appreciated. Thanks!

I don't think it's expected to work when you provide a single word as the query.

A bit about the RAG process:
The query is embedded (using the embedding model), the most similar chunks are retrieved from the vector database, and the LLM reads that retrieved context to produce the output.
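
You can check the retrieval step in isolation. A minimal sketch (reusing the vectordb from the original post; the example queries are just assumptions):

# Compare what the store retrieves for a bare keyword vs. a full question.
# Chroma returns (document, distance) pairs; lower distance = more similar.
for q in ["activities", "What activities are mentioned in the document?"]:
    print(f"Query: {q!r}")
    for doc, score in vectordb.similarity_search_with_score(q, k=3):
        print(f"  distance={score:.3f}  text={doc.page_content[:80]!r}")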

If you provide just a specific keyword, the query embedding won't retrieve appropriate results from the database: the database holds embeddings of full sentences, which you are trying to match against a single-word embedding. By the way, what output are you expecting out of it?
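
If you do need to support keyword-style input, one common workaround is to expand bare keywords into full questions before retrieval, and to tighten the retriever so weak matches are dropped. A rough sketch (expand_query is a hypothetical helper, and the 0.5 threshold is an arbitrary starting point to tune):

# Hypothetical helper: turn a bare keyword into a full question so its
# embedding is closer to the stored Q&A sentences.
def expand_query(user_input: str) -> str:
    if len(user_input.split()) <= 2:  # looks like a keyword, not a question
        return f"What does the document say about {user_input}?"
    return user_input

# Drop weak matches instead of always stuffing k=6 chunks into the prompt.
retriever = vectordb.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5, "k": 3},  # threshold is an assumption; tune it
)
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, chain_type_kwargs={"prompt": prompt})

result_ = qa_chain(expand_query("activities"))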
