Spaces:
Running
A newer version of the Streamlit SDK is available:
1.41.1
title: PDFChatbot
emoji: 😻
colorFrom: green
colorTo: red
sdk: streamlit
sdk_version: 1.37.1
app_file: app.py
pinned: false
license: apache-2.0
PDFChatbot
Pdf Chatbot is a RAG-based system, a context-aware system trained to answer queries regarding the Document that the user gives. It can Trace the Chat from the encoding to generating the output using the powerful Langsmith.
The Document need to be fed to the model in order to train the model for the question and answer session.
HuggingFace Embeddings:
The application uses the HuggingFace BAAI/llm-embedder model to convert text data into embeddings, which are then used to facilitate efficient and accurate search within the PDF content.
FAISS Vector Store:
FAISS (Facebook AI Similarity Search) is employed to store and search text embeddings. This allows the application to quickly retrieve relevant sections of the PDF based on user queries.
Processing the PDF: Extracting and Embedding Text
The core functionality starts with processing the uploaded PDF. Using PyPDFLoader fromLangChain, the PDF content is extracted and split into manageable chunks. These chunks are then transformed into embeddings using the HuggingFace model. The embeddings are stored in a FAISS vector store, allowing for quick and efficient retrieval during user interactions.
Handling User Queries
The application is designed to handle user queries seamlessly and efficiently. When a user enters a question, the system retrieves relevant document sections using FAISS, applies a custom prompt template, and generates a response using the Groq LLM model(mixtral-8x7b-32768).
Storing the History of the Chat and displaying it using Interactive UI
The user interface is built using Streamlit, providing an easy-to-use platform for interacting with the PDF. Users can upload their PDFs, enter queries, and view previous interactions all within the same interface. The chat history is stored in the session state, ensuring that the conversation flows smoothly and previous questions and answers are easily accessible.
Detailed Tracing:
LangSmith provides detailed tracing of the entire workflow, helping you understand how data flows through the system. This is especially useful when you’re dealing with multiple components like document loaders, embedding models, and vector databases.
Debugging:
By tracing each step in the process, LangSmith makes it easier to identify where something might be going wrong. For instance, if the responses to user queries are not as expected, you can trace back through the chain to see if the issue lies in the retrieval process, the embedding model, or the prompt generation.
Performance Monitoring:
LangSmith allows you to monitor the performance of each component, giving insights into which parts of the process are taking the most time or resources. This is crucial for optimizing the application, especially as it scales.
API Reference
Get all items
GET /api/items
Parameter | Type | Description |
---|---|---|
api_key |
GroqAPI |
Required. Your API key |
api_key |
LangsmithAPI |
Your API key |
Required items For Feeding the Data
Parameter | Type | Description |
---|---|---|
PDF |
Document |
Required. path of item to fetch |
Tech Stack
LLM : GroqApi(mixtral-8x7b-32768)
Embedding : HuggingFaceBgeEmbeddings
Langchain
Langsmith
Environment Variables
To run this project, you will need to add the following environment variables to your .env file
GROQ_API
LANGSMITH_API
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference