Carlos Salgado commited on
Commit
e4c4b43
·
unverified ·
2 Parent(s): 2755970 8c7984b

Merge pull request #1 from salgadev/merge

Browse files
Files changed (1) hide show
  1. README.md +41 -68
README.md CHANGED
@@ -1,7 +1,6 @@
1
-
2
  ---
3
  title: DocVerifyRAG
4
- emoji: 🐠
5
  colorFrom: pink
6
  colorTo: green
7
  sdk: streamlit
@@ -11,89 +10,75 @@ pinned: false
11
  ---
12
 
13
  <!-- PROJECT TITLE -->
14
- <h1 align="center">DocVerifyRAG: Document Verification and Anomaly Detection</h1>
15
  <div id="header" align="center">
16
  </div>
17
  <h2 align="center">
18
  Description
19
  </h2>
20
- <p align="center"> DocVerifyRAG is a revolutionary tool designed to streamline document verification processes in hospitals. It utilizes AI to classify documents and identify mistakes in metadata, ensuring accurate and efficient document management. Inspired by the need for improved data accuracy in healthcare, DocVerifyRAG provides automated anomaly detection to identify misclassifications and errors in document metadata, enhancing data integrity and compliance with regulatory standards. </p>
21
 
22
  ## Table of Contents
23
 
24
  <details>
25
  <summary>DocVerifyRAG</summary>
26
 
27
- - [Application Description](#application-description)
28
  - [Table of Contents](#table-of-contents)
29
- - [Local installation](#install-locally)
30
- - [Install using Docker](#install-using-docker)
31
- - [Usage](#usage)
32
- - [Contributing](#contributing)
 
 
33
  - [Authors](#authors)
34
  - [License](#license)
35
 
36
  </details>
37
 
38
  ## TRY the prototype
39
- [DocVerifyRAG](https://docverify-rag.vercel.app)
40
 
41
  ## Screenshots
42
-
43
- [Add screenshots here]
44
 
45
  ## Technology Stack
 
 
 
 
 
 
 
 
 
 
 
46
 
47
- | Technology | Description |
48
- | ---------- | --------------------------- |
49
- | AI/ML | Artificial Intelligence and Machine Learning |
50
- | Python | Programming Language |
51
- | Flask | Web Framework |
52
- | Docker | Containerization |
53
- | Tech Name | Short description |
54
 
55
  ### Features
56
 
57
- 1. **Document Classification:**
58
- - Utilizes AI/ML algorithms to classify documents based on content and metadata.
59
- - Provides accurate and efficient document categorization for improved data management.
60
 
61
- 2. **Anomaly Detection:**
62
- - Identifies mistakes and misclassifications in document metadata through automated anomaly detection.
63
- - Enhances data integrity and accuracy by flagging discrepancies in document metadata.
64
 
65
- 3. **User-Friendly Interface:**
66
- - Offers a user-friendly web interface for easy document upload, classification, and verification.
67
- - Simplifies the document management process for hospital staff, reducing manual effort and errors.
 
68
 
69
- ### Install locally
 
 
70
 
71
- #### Step 1 - Frontend
72
 
73
  1. Clone the repository:
74
  ```bash
75
- $ git clone https://github.com/eliawaefler/DocVerifyRAG.git
76
- ```
77
-
78
- 2. Navigate to the frontend directory:
79
- ```bash
80
- $ cd DocVerifyRAG/frontend
81
- ```
82
-
83
- 3. Install dependencies:
84
- ```bash
85
- $ npm install
86
- ```
87
- 4. Run project:
88
- ```bash
89
- $ npm run dev
90
- ```
91
-
92
- #### Step 2 - Backend
93
-
94
- 1. Navigate to the backend directory:
95
- ```bash
96
- $ cd DocVerifyRAG/backend
97
  ```
98
 
99
  2. Install dependencies:
@@ -101,25 +86,14 @@ pinned: false
101
  $ pip install -r requirements.txt
102
  ```
103
 
104
- ### Install using Docker
105
-
106
- To deploy DocVerifyRAG using Docker, follow these steps:
107
-
108
- 1. Pull the Docker image from Docker Hub:
109
-
110
- ```bash
111
- $ docker pull sandra/docverifyrag:latest
112
- ```
113
-
114
- 2. Run the Docker container:
115
-
116
  ```bash
117
- $ docker run -d -p 5000:5000 sandramsc/docverifyrag:latest
118
  ```
119
 
120
  ### Usage
121
 
122
- Access the web interface and follow the prompts to upload documents, classify them, and verify metadata. The AI-powered anomaly detection system will automatically flag any discrepancies or errors in the document metadata, providing accurate and reliable document management solutions for hospitals.
123
  ## Authors
124
 
125
  | Name | Link |
@@ -127,9 +101,8 @@ Access the web interface and follow the prompts to upload documents, classify th
127
  | Sandra Ashipala | [GitHub](https://github.com/sandramsc) |
128
  | Elia Wäfler | [GitHub](https://github.com/eliawaefler) |
129
  | Carlos Salgado | [GitHub](https://github.com/salgadev) |
130
- | Abdul Qadeer | [GitHub](https://github.com/AbdulQadeer-55) |
131
 
132
 
133
  ## License
134
 
135
- [![GitLicense](https://img.shields.io/badge/License-MIT-lime.svg)](https://github.com/eliawaefler/DocVerifyRAG/blob/main/LICENSE)
 
 
1
  ---
2
  title: DocVerifyRAG
3
+ emoji: 🖺
4
  colorFrom: pink
5
  colorTo: green
6
  sdk: streamlit
 
10
  ---
11
 
12
  <!-- PROJECT TITLE -->
13
+ <h1 align="center">DocVerifyRAG: Anomaly detection for BIM document metadata</h1>
14
  <div id="header" align="center">
15
  </div>
16
  <h2 align="center">
17
  Description
18
  </h2>
19
+ <p align="center"> Introducing DocVerifyRAG, a cutting-edge solution revolutionizing document verification processes across various sectors. Our app goes beyond mere document classification; it focuses on ensuring metadata accuracy by cross-referencing against a vast vector database of exemplary cases. Inspired by the necessity for precise data management, DocVerifyRAG leverages AI to scrutinize document metadata, instantly flagging anomalies and offering suggested corrections. Powered by Vectara vector store technology and supported by the innovative capabilities of together.ai API, our app employs advanced anomaly detection algorithms to scrutinize metadata, ensuring compliance with regulatory standards and enhancing data integrity. With DocVerifyRAG, users can effortlessly verify document metadata accuracy, minimizing errors and streamlining operational efficiency.</p>
20
 
21
  ## Table of Contents
22
 
23
  <details>
24
  <summary>DocVerifyRAG</summary>
25
 
 
26
  - [Table of Contents](#table-of-contents)
27
+ - [TRY the prototype](#try-the-prototype)
28
+ - [Screenshots](#screenshots)
29
+ - [Technology Stack](#technology-stack)
30
+ - [Features](#features)
31
+ - [Install locally](#install-locally)
32
+ - [Usage](#usage)
33
  - [Authors](#authors)
34
  - [License](#license)
35
 
36
  </details>
37
 
38
  ## TRY the prototype
39
+ [DocVerifyRAG](https://docverifyrag.vercel.app/)
40
 
41
  ## Screenshots
42
+ ![ttthh](https://github.com/eliawaefler/DocVerifyRAG/assets/19821445/331845d7-a360-4315-92ef-d4bb50021eaa)
 
43
 
44
  ## Technology Stack
45
+ | Technology | Description |
46
+ | --- | --- |
47
+ | **Python** | Primary programming language used for development. |
48
+ | **LangChain** | Framework for developing applications powered by large language models (LLMs). |
49
+ | **Vectara** | Provides efficient vector search capabilities via the Boomerang model in a "RAG as a service" architecture. |
50
+ | **intfloat/multilingual-e5-large** | Generates efficient and performant multilingual language embeddings. |
51
+ | **Together AI** | Platform for training, fine-tuning, and deploying gen AI models. Its inference API was used with the model `mistralai/Mixtral-8x7B-Instruct-v0.1`. |
52
+ | **Streamlit** | Open-source Python library for creating custom web apps, used as the frontend. |
53
+ | **Hugging Face Spaces** | Service for developer-friendly deployments of data applications. |
54
+
55
+ The backend is built using Python, LangChain, Vectara, and Together AI's inference API with the `mistralai/Mixtral-8x7B-Instruct-v0.1` model for processing and understanding large amounts of data. Streamlit is used for the frontend, providing an intuitive interface for users. Hugging Face Spaces simplifies the deployment process, making the application easily accessible.
56
 
 
 
 
 
 
 
 
57
 
58
  ### Features
59
 
60
+ 1. **Metadata Verification:**
61
+ - Cross-references document metadata against a comprehensive vector database of exemplary cases.
62
+ - Instantly identifies anomalies and discrepancies, ensuring metadata accuracy and compliance.
63
 
64
+ 2. **Automated Metadata Correction:**
65
+ - Offers suggested metadata corrections based on processed PDF files, facilitating swift and accurate adjustments.
66
+ - Potential for automated inspection of numerous metadata rows for seamless large-scale data verification.
67
 
68
+ 3. **Question Answering Retriever:**
69
+ - Utilizes Vectara vector store technology for efficient retrieval of relevant information.
70
+ - Employs Hugging Face embeddings E5 multilingual model for precise analysis of multilingual data.
71
+ - Identifies anomalies in names, descriptions, and disciplines, providing actionable insights for data accuracy.
72
 
73
+ 4. **User-Friendly Interface:**
74
+ - Intuitive web interface for effortless document upload, metadata verification, and correction.
75
+ - Simplifies document management processes, reducing manual effort and enhancing operational efficiency.
76
 
77
+ ### Install locally
78
 
79
  1. Clone the repository:
80
  ```bash
81
+ $ git clone https://github.com/salgadev/DocVerifyRAG.git
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
82
  ```
83
 
84
  2. Install dependencies:
 
86
  $ pip install -r requirements.txt
87
  ```
88
 
89
+ 3. Run using Streamlit:
 
 
 
 
 
 
 
 
 
 
 
90
  ```bash
91
+ $ streamlit run app.py
92
  ```
93
 
94
  ### Usage
95
 
96
+ Access the web interface and follow the prompts to upload documents, classify them, and verify metadata. The AI-powered anomaly detection system will automatically flag any discrepancies or errors in the document metadata, providing accurate and reliable document management solutions.
97
  ## Authors
98
 
99
  | Name | Link |
 
101
  | Sandra Ashipala | [GitHub](https://github.com/sandramsc) |
102
  | Elia Wäfler | [GitHub](https://github.com/eliawaefler) |
103
  | Carlos Salgado | [GitHub](https://github.com/salgadev) |
 
104
 
105
 
106
  ## License
107
 
108
+ [![GitLicense](https://img.shields.io/badge/License-MIT-lime.svg)](https://github.com/eliawaefler/DocVerifyRAG/blob/main/LICENSE)