Spaces:
Runtime error
Runtime error
Update README.md
Browse files
README.md
CHANGED
@@ -1,59 +1,70 @@
|
|
1 |
-
|
2 |
-
|
3 |
-
|
4 |
-
|
5 |
-
|
6 |
-
|
7 |
-
|
8 |
-
|
9 |
-
|
10 |
-
|
11 |
-
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
-
|
21 |
-
|
22 |
-
|
23 |
-
|
24 |
-
|
25 |
-
|
26 |
-
|
27 |
-
|
28 |
-
|
29 |
-
|
30 |
-
|
31 |
-
|
32 |
-
|
33 |
-
|
34 |
-
|
35 |
-
|
36 |
-
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
|
42 |
-
|
43 |
-
|
44 |
-
|
45 |
-
|
46 |
-
|
47 |
-
|
48 |
-
4.
|
49 |
-
|
50 |
-
|
51 |
-
|
52 |
-
|
53 |
-
|
54 |
-
|
55 |
-
|
56 |
-
|
57 |
-
|
58 |
-
|
59 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
title: "DocBot: Smart Document ChatBot"
|
3 |
+
emoji: π€
|
4 |
+
colorFrom: indigo
|
5 |
+
colorTo: purple
|
6 |
+
sdk: streamlit
|
7 |
+
sdk_version: "0.87.0"
|
8 |
+
app_file: app.py
|
9 |
+
pinned: false
|
10 |
+
---
|
11 |
+
|
12 |
+
# π€ DocBot: Smart Document ChatBot
|
13 |
+
|
14 |
+
DocBot is an intelligent document processing application with a chatbot interface. It can process various types of documents, including PDFs and images, extract essential information, and enable user interaction through a chat interface.
|
15 |
+
|
16 |
+
## βοΈ Features
|
17 |
+
|
18 |
+
- **Document Upload**: Upload PDF, PNG, JPG, or JPEG files for processing.
|
19 |
+
- **Text Extraction**: Extract text content from uploaded documents.
|
20 |
+
- **Image Processing**: Convert PDF documents to images and extract text from images.
|
21 |
+
- **Chatbot Interface**: Interact with the document through a chatbot interface powered by Groq.
|
22 |
+
- **Natural Language Understanding**: Utilizes spaCy for natural language processing.
|
23 |
+
- **Dynamic Progress Bar**: Visual feedback on document processing progress.
|
24 |
+
- **Error Handling**: Provides error messages for any processing failures.
|
25 |
+
|
26 |
+
## βοΈ Installation
|
27 |
+
|
28 |
+
1. Clone the repository:
|
29 |
+
|
30 |
+
```bash
|
31 |
+
git clone https://github.com/yourusername/docbot.git
|
32 |
+
```
|
33 |
+
|
34 |
+
2. Install the required Python packages:
|
35 |
+
|
36 |
+
```bash
|
37 |
+
pip install -r requirements.txt
|
38 |
+
```
|
39 |
+
|
40 |
+
3. Set up the environment variables:
|
41 |
+
|
42 |
+
Create a `.env` file in the root directory and add the following:
|
43 |
+
|
44 |
+
```dotenv
|
45 |
+
GROQ_API_KEY='your_groq_api_key'
|
46 |
+
```
|
47 |
+
|
48 |
+
4. Run the Streamlit app:
|
49 |
+
|
50 |
+
```bash
|
51 |
+
streamlit run app.py
|
52 |
+
```
|
53 |
+
|
54 |
+
## π Usage
|
55 |
+
|
56 |
+
1. Run the Streamlit app using the provided installation instructions.
|
57 |
+
2. Upload your document using the file uploader.
|
58 |
+
3. Wait for the document to be processed.
|
59 |
+
4. Interact with the document by asking questions in the chatbot interface.
|
60 |
+
|
61 |
+
## π» Technologies Used
|
62 |
+
|
63 |
+
- [Streamlit](https://streamlit.io/) - For building the interactive web application.
|
64 |
+
- [PyPDF2](https://pythonhosted.org/PyPDF2/) - For PDF document processing.
|
65 |
+
- [pdf2image](https://github.com/Belval/pdf2image) - For converting PDFs to images.
|
66 |
+
- [PyMuPDF](https://pypi.org/project/PyMuPDF/) - For PDF document rendering.
|
67 |
+
- [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) - For extracting text from images.
|
68 |
+
- [spaCy](https://spacy.io/) - For natural language processing.
|
69 |
+
- [Groq](https://github.com/groq/groq-py) - For AI-powered chatbot interaction.
|
70 |
+
- [Pillow](https://python-pillow.org/) - For image processing.
|