---
title: QA-bot
app_file: app.py
sdk: gradio
sdk_version: 4.44.0
---
# PDF Question-Answering App using LangChain, Pinecone, and Mistral

This project is a Retrieval-Augmented Generation (RAG) app designed to perform question-answering (QA) on PDF documents. It uses the `LangChain` framework for document loading and embedding, `Pinecone` for vector storage, and the `mistral` language model for generating responses to user queries. A minimal sketch of the whole pipeline follows the feature list below.

## Features
- **PDF Handling**: Load and split PDF files into manageable chunks for processing.
- **Embeddings**: I use `SentenceTransformerEmbeddings` to create embeddings for the document chunks.
- **Vector Storage**: Pinecone is used to store document embeddings and efficiently retrieve relevant chunks based on user questions.
- **LLM Integration**: I first tried running LLMs locally with `Ollama`, but due to limited compute resources I switched to the hosted `mistral` API for faster and better responses.
- **Environment Variables**: Secrets like API keys are securely managed using `.env` files.
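
Tying these features together, here is a minimal sketch of the pipeline. It is not the exact code in `app.py`: the index name, embedding model, Mistral model, and chunk sizes are placeholder assumptions, and it expects `PINECONE_API_KEY` and `MISTRAL_API_KEY` to be set in the environment.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_mistralai import ChatMistralAI

# 1. Load the PDF and split it into overlapping chunks.
docs = PyPDFLoader("/path/to/data.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 2. Embed the chunks and upsert them into an existing Pinecone index
#    (the index name and embedding model here are assumptions).
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
store = PineconeVectorStore.from_documents(chunks, embeddings, index_name="qa-bot")

# 3. Retrieve the most relevant chunks and let Mistral answer from them.
question = "What is this document about?"
context = "\n\n".join(d.page_content for d in store.similarity_search(question, k=4))
llm = ChatMistralAI(model="mistral-small-latest")  # model name is an assumption
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```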

## Requirements
- Python 3.12
- Run `pip install -r requirements.txt`
- The following tech stack is used:
  - `langchain`
  - `pinecone` (make sure to sign up and create a Pinecone API key)
  - `Mistral API`
  

## Setup

### 1. Clone the Repository
```bash
git clone https://github.com/m-umar-j/RAG-APP
cd RAG-APP
```
### 2. Install the requirements
```bash
pip install -r requirements.txt
```
### 3. Create a `.env` file in your root directory and add your Pinecone API key

```makefile
PINECONE_API_KEY=your-pinecone-api-key
```
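
If the app reads the key with `python-dotenv` (an assumption; the README only says secrets live in `.env` files), loading it looks roughly like this:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads PINECONE_API_KEY from the .env file in the project root
pinecone_api_key = os.getenv("PINECONE_API_KEY")
assert pinecone_api_key, "PINECONE_API_KEY is missing from .env"
```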
### 4. Modify paths

Update `file_path` so it points to your own PDF:

```python
file_path = "/path/to/data.pdf"
```
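
To sanity-check the path, here is a quick snippet assuming the app loads PDFs with LangChain's `PyPDFLoader` (a common choice, not confirmed by this README):

```python
from langchain_community.document_loaders import PyPDFLoader

file_path = "/path/to/data.pdf"  # replace with the path to your PDF
pages = PyPDFLoader(file_path).load()
print(f"Loaded {len(pages)} pages from {file_path}")
```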