---
title: GnosisPages
emoji: 📝
colorFrom: red
colorTo: pink
sdk: streamlit
app_file: GnosisPages.py
pinned: false
license: mit
---

# GnosisPages
GnosisPages is a tool that helps you build your own knowledge base for information retrieval when interacting with an LLM. The app is built on the Streamlit and Langchain frameworks and uses a client-side ChromaDB.

## Features

GnosisPages offers you the following key features:

- **Upload PDF files**: Upload PDF files of up to 200 MB. PDF files should be programmatically created or processed by an OCR tool.
- **Extract and split text**: Extract the content of your PDF files and split it into chunks for better querying.
- **Store in a client-side VectorDB**: GnosisPages uses ChromaDB to store the content of your PDF files as vectors (by default, ChromaDB uses "all-MiniLM-L6-v2" for embeddings).
- **Consult your knowledge base**: Ask the Intelligent Assistant questions about the content of your knowledge base. The Langchain Agent will use ChromaDB query functions as a tool.
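To illustrate the "extract and split" step, here is a minimal sketch of splitting text into overlapping chunks before embedding. It uses only the standard library; the function name `split_text` and the `chunk_size` and `overlap` values are assumptions for illustration, not the splitter or settings GnosisPages actually uses.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text cut at a
    chunk boundary keeps some surrounding context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks tend to give more precise matches, while some overlap reduces the chance that a relevant passage is split across two chunks.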

## Demo 

[Try the GnosisPages demo](https://huggingface.co/spaces/maclenn77/pdf-explainer)!

[Watch a demo here](https://youtu.be/OEQTusJGHFQ)

## Architecture

![schematic-1](https://github.com/Maclenn77/pdf-explainer/assets/1808402/36dbacfa-43f3-4530-9d31-0e9b1127f992)

## Prerequisites

To use the demo, you only need an OpenAI API key.

If you prefer to clone the project and run it in a local environment, you will need:

- Python (developed with v3.11)
- OpenAI API Key
- Langchain
- ChromaDB
- Streamlit
- A code editor

## Setup

Follow these steps to set up GnosisPages in your local environment:

1. Clone this repository
```bash
git clone https://github.com/maclenn77/pdf-explainer.git
```
2. Navigate to the project directory
```bash
cd pdf-explainer
```
3. Create your .env file
```bash
touch .env
nano .env # or your preferred text editor
```
 And add your OpenAI API key.
```bash
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
```
4. Install dependencies
```bash
pip install -r requirements.txt
```
5. Run the app in your local environment
```bash
streamlit run GnosisPages.py
```
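If the app reports a missing key, you can check that the `.env` file is readable with a short script. This is a stdlib-only sketch; the helper names `load_env_file` and `get_openai_key` are hypothetical, and the project itself may load the file differently (for example, through a dotenv library).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path: str = ".env") -> str:
    """Return the key from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


if __name__ == "__main__":
    try:
        get_openai_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

The script only reports whether a key is present; it never prints the key itself.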

## Deployment

The GnosisPages repo includes workflows for deploying to HuggingFace:

1. **Check file size**: Prevents merging and deploying files over the size limit set by HuggingFace 🤗.
2. **Check lints**: Analyzes the code with pylint.
3. **Deploy to HuggingFace**: Once a branch is merged into main, the latest version is deployed to a HuggingFace Space.

To deploy, add `HF_TOKEN` as a secret in the settings of your fork and add your HuggingFace username as a variable named `HF_USERNAME`.
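Before pushing, you may want to run a similar file size check locally. The sketch below is not the repository's workflow; the function name `find_oversized_files` and the 10 MB limit in the usage example are assumptions, so use the limit that applies to your Space.

```python
import os


def find_oversized_files(root: str, limit_bytes: int) -> list[tuple[str, int]]:
    """Return (path, size) for every file under root larger than limit_bytes."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git internals; pruning dirnames in place stops os.walk descending.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                oversized.append((path, size))
    return sorted(oversized)


if __name__ == "__main__":
    limit = 10 * 1024 * 1024  # assumed limit of 10 MB, for illustration only
    for path, size in find_oversized_files(".", limit):
        print(f"{path}: {size} bytes")
```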

## Feedback and Contributions
If you have feedback or would like to contribute to the development of GnosisPages, feel free to open issues or submit pull requests in the GitHub repository.

## License
This project is licensed under the MIT License. See the LICENSE file for details.

---

Enjoy using GnosisPages to create and consult your knowledge base! If you have any questions or encounter issues during the setup process, please don't hesitate to reach out for assistance.