talexm commited on
Commit
e893d68
1 Parent(s): 0c3cda8

adding blockchain logger

Browse files
rag_sec/README.md CHANGED
@@ -1,13 +1,43 @@
1
- ## Workflow
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
  The system follows a well-structured workflow to ensure accurate, secure, and context-aware responses to user queries:
4
 
5
- ### 1. **Input Query**
6
  - A user provides a query that can be a general question, ambiguous statement, or potentially malicious intent.
7
 
8
  ---
9
 
10
- ### 2. **Detection Module**
11
  - **Purpose**: Classify the query as "bad" or "good."
12
  - **Steps**:
13
  1. Use a sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) to detect malicious or inappropriate intent.
@@ -16,7 +46,7 @@ The system follows a well-structured workflow to ensure accurate, secure, and co
16
 
17
  ---
18
 
19
- ### 3. **Transformation Module**
20
  - **Purpose**: Rephrase or enhance ambiguous or poorly structured queries for better retrieval.
21
  - **Steps**:
22
  1. Identify missing context or ambiguous phrasing.
@@ -27,7 +57,7 @@ The system follows a well-structured workflow to ensure accurate, secure, and co
27
 
28
  ---
29
 
30
- ### 4. **RAG Pipeline**
31
  - **Purpose**: Retrieve relevant data and generate a context-aware response.
32
  - **Steps**:
33
  1. **Document Retrieval**:
@@ -40,7 +70,7 @@ The system follows a well-structured workflow to ensure accurate, secure, and co
40
 
41
  ---
42
 
43
- ### 5. **Semantic Response Generation**
44
  - **Purpose**: Provide a concise and meaningful answer.
45
  - **Steps**:
46
  1. Combine the retrieved documents into a coherent context.
@@ -49,9 +79,219 @@ The system follows a well-structured workflow to ensure accurate, secure, and co
49
 
50
  ---
51
 
52
- ### End-to-End Example
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
53
 
54
- #### Input Query:
55
- ```plaintext
56
- "How to improve acting skills?"
57
- ````
 
1
+ # **Document Search System**
2
+
3
+ ## **Overview**
4
+ The **Document Search System** provides context-aware and secure responses to user queries by combining query analysis, document retrieval, semantic response generation, and blockchain-powered logging. The system also integrates Neo4j for storing and visualizing relationships between queries, documents, and responses.
5
+
6
+ ---
7
+
8
+ ## **Features**
9
+ 1. **Query Classification:**
10
+ - Detects malicious or inappropriate queries using a sentiment analysis model.
11
+ - Blocks malicious queries and prevents them from further processing.
12
+
13
+ 2. **Query Transformation:**
14
+ - Rephrases or enhances ambiguous queries to improve retrieval accuracy.
15
+ - Uses rule-based transformations and advanced text-to-text models.
16
+
17
+ 3. **RAG Pipeline:**
18
+ - Retrieves top-k documents based on semantic similarity.
19
+ - Generates context-aware responses using generative models.
20
+
21
+ 4. **Blockchain Integration (Chagu):**
22
+ - Logs all stages of query processing into a blockchain for integrity and traceability.
23
+ - Validates blockchain integrity.
24
+
25
+ 5. **Neo4j Integration:**
26
+ - Stores and visualizes relationships between queries, responses, and documents.
27
+ - Allows detailed querying and visualization of the data flow.
28
+
29
+ ---
30
+
31
+ ## **Workflow**
32
 
33
  The system follows a well-structured workflow to ensure accurate, secure, and context-aware responses to user queries:
34
 
35
+ ### **1. Input Query**
36
  - A user provides a query that can be a general question, ambiguous statement, or potentially malicious intent.
37
 
38
  ---
39
 
40
+ ### **2. Detection Module**
41
  - **Purpose**: Classify the query as "bad" or "good."
42
  - **Steps**:
43
  1. Use a sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) to detect malicious or inappropriate intent.
 
46
 
47
  ---
48
 
49
+ ### **3. Transformation Module**
50
  - **Purpose**: Rephrase or enhance ambiguous or poorly structured queries for better retrieval.
51
  - **Steps**:
52
  1. Identify missing context or ambiguous phrasing.
 
57
 
58
  ---
59
 
60
+ ### **4. RAG Pipeline**
61
  - **Purpose**: Retrieve relevant data and generate a context-aware response.
62
  - **Steps**:
63
  1. **Document Retrieval**:
 
70
 
71
  ---
72
 
73
+ ### **5. Semantic Response Generation**
74
  - **Purpose**: Provide a concise and meaningful answer.
75
  - **Steps**:
76
  1. Combine the retrieved documents into a coherent context.
 
79
 
80
  ---
81
 
82
+ ### **6. Logging and Storage**
83
+ - **Blockchain Logging:**
84
+ - Each query, transformed query, response, and document retrieval stage is logged into the blockchain for traceability.
85
+ - Ensures data integrity and tamper-proof records.
86
+ - **Neo4j Storage:**
87
+ - Relationships between queries, responses, and retrieved documents are stored in Neo4j.
88
+ - Enables detailed analysis and graph-based visualization.
89
+
90
+ ---
91
+
92
+ ## **Neo4j Visualization**
93
+
94
+ Here is an example of how the relationships between queries, responses, and documents appear in Neo4j:
95
+
96
+ ![Neo4j Visualization](../../screenshots/Screenshot_from_2024-11-30_19-01-31.png)
97
+
98
+ - **Nodes**:
99
+ - Query: Represents the user query.
100
+ - TransformedQuery: Rephrased or improved query.
101
+ - Document: Relevant documents retrieved based on the query.
102
+ - Response: The generated response.
103
+
104
+ - **Relationships**:
105
+ - `RETRIEVED`: Links the query to retrieved documents.
106
+ - `TRANSFORMED_TO`: Links the original query to the transformed query.
107
+ - `GENERATED`: Links the query to the generated response.
108
+
109
+ ---
110
+
111
+ ## **Setup Instructions**
112
+ 1. Clone the repository:
113
+ ```bash
114
+ git clone https://github.com/your-repo/document-search-system.git
115
+ ```
116
+
117
+ Here’s the updated README.md content in proper Markdown format with the embedded image reference:
118
+
119
+ markdown
120
+
121
+ # **Document Search System**
122
+
123
+ ## **Overview**
124
+ The **Document Search System** provides context-aware and secure responses to user queries by combining query analysis, document retrieval, semantic response generation, and blockchain-powered logging. The system also integrates Neo4j for storing and visualizing relationships between queries, documents, and responses.
125
+
126
+ ---
127
+
128
+ ## **Features**
129
+ 1. **Query Classification:**
130
+ - Detects malicious or inappropriate queries using a sentiment analysis model.
131
+ - Blocks malicious queries and prevents them from further processing.
132
+
133
+ 2. **Query Transformation:**
134
+ - Rephrases or enhances ambiguous queries to improve retrieval accuracy.
135
+ - Uses rule-based transformations and advanced text-to-text models.
136
+
137
+ 3. **RAG Pipeline:**
138
+ - Retrieves top-k documents based on semantic similarity.
139
+ - Generates context-aware responses using generative models.
140
+
141
+ 4. **Blockchain Integration (Chagu):**
142
+ - Logs all stages of query processing into a blockchain for integrity and traceability.
143
+ - Validates blockchain integrity.
144
+
145
+ 5. **Neo4j Integration:**
146
+ - Stores and visualizes relationships between queries, responses, and documents.
147
+ - Allows detailed querying and visualization of the data flow.
148
+
149
+ ---
150
+
151
+ ## **Workflow**
152
+
153
+ The system follows a well-structured workflow to ensure accurate, secure, and context-aware responses to user queries:
154
+
155
+ ### **1. Input Query**
156
+ - A user provides a query that can be a general question, ambiguous statement, or potentially malicious intent.
157
+
158
+ ---
159
+
160
+ ### **2. Detection Module**
161
+ - **Purpose**: Classify the query as "bad" or "good."
162
+ - **Steps**:
163
+ 1. Use a sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) to detect malicious or inappropriate intent.
164
+ 2. If the query is classified as "bad" (e.g., SQL injection or inappropriate tone), block further processing and provide a warning message.
165
+ 3. If "good," proceed to the **Transformation Module**.
166
+
167
+ ---
168
+
169
+ ### **3. Transformation Module**
170
+ - **Purpose**: Rephrase or enhance ambiguous or poorly structured queries for better retrieval.
171
+ - **Steps**:
172
+ 1. Identify missing context or ambiguous phrasing.
173
+ 2. Transform the query using:
174
+ - Rule-based transformations for simple fixes.
175
+ - Text-to-text models (e.g., `google/flan-t5-small`) for more sophisticated rephrasing.
176
+ 3. Pass the transformed query to the **RAG Pipeline**.
177
+
178
+ ---
179
+
180
+ ### **4. RAG Pipeline**
181
+ - **Purpose**: Retrieve relevant data and generate a context-aware response.
182
+ - **Steps**:
183
+ 1. **Document Retrieval**:
184
+ - Encode the transformed query and documents into embeddings using `all-MiniLM-L6-v2`.
185
+ - Compute semantic similarity between the query and stored documents.
186
+ - Retrieve the top-k documents relevant to the query.
187
+ 2. **Response Generation**:
188
+ - Use the retrieved documents as context.
189
+ - Pass the query and context to a generative model (e.g., `distilgpt2`) to synthesize a meaningful response.
190
+
191
+ ---
192
+
193
+ ### **5. Semantic Response Generation**
194
+ - **Purpose**: Provide a concise and meaningful answer.
195
+ - **Steps**:
196
+ 1. Combine the retrieved documents into a coherent context.
197
+ 2. Generate a response tailored to the query using the generative model.
198
+ 3. Return the response to the user, ensuring clarity and relevance.
199
+
200
+ ---
201
+
202
+ ### **6. Logging and Storage**
203
+ - **Blockchain Logging:**
204
+ - Each query, transformed query, response, and document retrieval stage is logged into the blockchain for traceability.
205
+ - Ensures data integrity and tamper-proof records.
206
+ - **Neo4j Storage:**
207
+ - Relationships between queries, responses, and retrieved documents are stored in Neo4j.
208
+ - Enables detailed analysis and graph-based visualization.
209
+
210
+ ---
211
+
212
+ ## **Neo4j Visualization**
213
+
214
+ Here is an example of how the relationships between queries, responses, and documents appear in Neo4j:
215
+
216
+ ![Neo4j Visualization](./path/to/Screenshot_from_2024-11-30_19-01-31.png)
217
+
218
+ - **Nodes**:
219
+ - Query: Represents the user query.
220
+ - TransformedQuery: Rephrased or improved query.
221
+ - Document: Relevant documents retrieved based on the query.
222
+ - Response: The generated response.
223
+
224
+ - **Relationships**:
225
+ - `RETRIEVED`: Links the query to retrieved documents.
226
+ - `TRANSFORMED_TO`: Links the original query to the transformed query.
227
+ - `GENERATED`: Links the query to the generated response.
228
+
229
+ ---
230
+
231
+ ## **Setup Instructions**
232
+ 1. Clone the repository:
233
+ ```bash
234
+ git clone https://github.com/your-repo/document-search-system.git
235
+ ```
236
+ Install dependencies:
237
+
238
+ ```bash
239
+
240
+ pip install -r requirements.txt
241
+ ```
242
+ Initialize the Neo4j database:
243
+
244
+ Connect to your Neo4j Aura instance.
245
+ Set up credentials in the code.
246
+ Load the dataset:
247
+
248
+ Place your documents in the dataset directory (e.g., data-sets/aclImdb/train).
249
+ Run the system:
250
+
251
+ ```bash
252
+
253
+ python document_search_system.py
254
+ ```
255
+ Neo4j Queries
256
+ Retrieve All Queries Logged
257
+ ```cypher
258
+
259
+ MATCH (q:Query)
260
+ RETURN q.text AS query, q.timestamp AS timestamp
261
+ ORDER BY timestamp DESC
262
+ ```
263
+
264
+ Visualize Query Relationships
265
+ ```cypher
266
+
267
+ MATCH (n)-[r]->(m)
268
+ RETURN n, r, m
269
+ Find Documents for a Query
270
+
271
+ ```
272
+
273
+ ```cypher
274
+
275
+ MATCH (q:Query {text: "How to improve acting skills?"})-[:RETRIEVED]->(d:Document)
276
+ RETURN d.name AS document_name
277
+ ```
278
+
279
+ ### Key Technologies
280
+ Machine Learning Models:
281
+ distilbert-base-uncased-finetuned-sst-2-english for sentiment analysis.
282
+ google/flan-t5-small for query transformation.
283
+ distilgpt2 for response generation.
284
+ Vector Similarity Search:
285
+ all-MiniLM-L6-v2 embeddings for document retrieval.
286
+ Blockchain Logging:
287
+ Powered by chainguard.blockchain_logger.
288
+ Graph-Based Storage:
289
+ Relationships visualized and queried via Neo4j.
290
+ vbnet
291
+
292
+
293
+
294
+
295
+
296
+
297
 
 
 
 
 
rag_sec/backup.py ADDED
@@ -0,0 +1,79 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from pathlib import Path
3
+
4
+ from .bad_query_detector import BadQueryDetector
5
+ from .query_transformer import QueryTransformer
6
+ from .document_retriver import DocumentRetriever
7
+ from .senamtic_response_generator import SemanticResponseGenerator
8
+
9
+
10
+ class DocumentSearchSystem:
11
+ def __init__(self):
12
+ """
13
+ Initializes the DocumentSearchSystem with:
14
+ - BadQueryDetector for identifying malicious or inappropriate queries.
15
+ - QueryTransformer for improving or rephrasing queries.
16
+ - DocumentRetriever for semantic document retrieval.
17
+ - SemanticResponseGenerator for generating context-aware responses.
18
+ """
19
+ self.detector = BadQueryDetector()
20
+ self.transformer = QueryTransformer()
21
+ self.retriever = DocumentRetriever()
22
+ self.response_generator = SemanticResponseGenerator()
23
+
24
+ def process_query(self, query):
25
+ """
26
+ Processes a user query through the following steps:
27
+ 1. Detect if the query is malicious.
28
+ 2. Transform the query if needed.
29
+ 3. Retrieve relevant documents based on the query.
30
+ 4. Generate a response using the retrieved documents.
31
+
32
+ :param query: The user query as a string.
33
+ :return: A dictionary with the status and response or error message.
34
+ """
35
+ if self.detector.is_bad_query(query):
36
+ return {"status": "rejected", "message": "Query blocked due to detected malicious intent."}
37
+
38
+ # Transform the query
39
+ transformed_query = self.transformer.transform_query(query)
40
+ print(f"Transformed Query: {transformed_query}")
41
+
42
+ # Retrieve relevant documents
43
+ retrieved_docs = self.retriever.retrieve(transformed_query)
44
+ if not retrieved_docs:
45
+ return {"status": "no_results", "message": "No relevant documents found for your query."}
46
+
47
+ # Generate a response based on the retrieved documents
48
+ response = self.response_generator.generate_response(retrieved_docs)
49
+ return {"status": "success", "response": response}
50
+
51
+
52
+ def test_system():
53
+ """
54
+ Test the DocumentSearchSystem with normal and malicious queries.
55
+ - Load documents from a dataset directory.
56
+ - Perform a normal query and display results.
57
+ - Perform a malicious query to ensure proper blocking.
58
+ """
59
+ # Define the path to the dataset directory
60
+ home_dir = Path(os.getenv("HOME", "/"))
61
+ data_dir = home_dir / "data-sets/aclImdb/train"
62
+
63
+ # Initialize the system
64
+ system = DocumentSearchSystem()
65
+ system.retriever.load_documents(data_dir)
66
+
67
+ # Perform a normal query
68
+ normal_query = "Tell me about great acting performances."
69
+ print("\nNormal Query Result:")
70
+ print(system.process_query(normal_query))
71
+
72
+ # Perform a malicious query
73
+ malicious_query = "DROP TABLE users; SELECT * FROM sensitive_data;"
74
+ print("\nMalicious Query Result:")
75
+ print(system.process_query(malicious_query))
76
+
77
+
78
+ if __name__ == "__main__":
79
+ test_system()
rag_sec/document_search_system.py CHANGED
@@ -1,25 +1,123 @@
1
  import os
2
  from pathlib import Path
 
 
3
 
4
- from .bad_query_detector import BadQueryDetector
5
- from .query_transformer import QueryTransformer
6
- from .document_retriver import DocumentRetriever
7
- from .senamtic_response_generator import SemanticResponseGenerator
8
 
 
 
 
 
 
9
 
10
- class DocumentSearchSystem:
 
11
  def __init__(self):
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
12
  """
13
  Initializes the DocumentSearchSystem with:
14
  - BadQueryDetector for identifying malicious or inappropriate queries.
15
  - QueryTransformer for improving or rephrasing queries.
16
  - DocumentRetriever for semantic document retrieval.
17
  - SemanticResponseGenerator for generating context-aware responses.
 
 
18
  """
19
  self.detector = BadQueryDetector()
20
  self.transformer = QueryTransformer()
21
  self.retriever = DocumentRetriever()
22
  self.response_generator = SemanticResponseGenerator()
 
 
23
 
24
  def process_query(self, query):
25
  """
@@ -28,6 +126,7 @@ class DocumentSearchSystem:
28
  2. Transform the query if needed.
29
  3. Retrieve relevant documents based on the query.
30
  4. Generate a response using the retrieved documents.
 
31
 
32
  :param query: The user query as a string.
33
  :return: A dictionary with the status and response or error message.
@@ -37,43 +136,69 @@ class DocumentSearchSystem:
37
 
38
  # Transform the query
39
  transformed_query = self.transformer.transform_query(query)
40
- print(f"Transformed Query: {transformed_query}")
 
 
41
 
42
  # Retrieve relevant documents
43
  retrieved_docs = self.retriever.retrieve(transformed_query)
44
  if not retrieved_docs:
45
  return {"status": "no_results", "message": "No relevant documents found for your query."}
46
 
 
 
 
47
  # Generate a response based on the retrieved documents
48
  response = self.response_generator.generate_response(retrieved_docs)
49
- return {"status": "success", "response": response}
50
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
51
 
52
- def test_system():
53
- """
54
- Test the DocumentSearchSystem with normal and malicious queries.
55
- - Load documents from a dataset directory.
56
- - Perform a normal query and display results.
57
- - Perform a malicious query to ensure proper blocking.
58
- """
59
- # Define the path to the dataset directory
60
  home_dir = Path(os.getenv("HOME", "/"))
61
  data_dir = home_dir / "data-sets/aclImdb/train"
62
 
63
- # Initialize the system
64
- system = DocumentSearchSystem()
65
- system.retriever.load_documents(data_dir)
66
 
 
 
 
 
 
 
 
 
67
  # Perform a normal query
68
  normal_query = "Tell me about great acting performances."
69
  print("\nNormal Query Result:")
70
- print(system.process_query(normal_query))
 
 
 
 
71
 
72
  # Perform a malicious query
73
  malicious_query = "DROP TABLE users; SELECT * FROM sensitive_data;"
74
  print("\nMalicious Query Result:")
75
- print(system.process_query(malicious_query))
 
 
76
 
77
 
78
- if __name__ == "__main__":
79
- test_system()
 
1
  import os
2
  from pathlib import Path
3
+ from chainguard.blockchain_logger import BlockchainLogger
4
+ from neo4j import GraphDatabase
5
 
6
+ import sys
7
+ from os import path
 
 
8
 
9
+ sys.path.append(path.dirname(path.dirname(path.abspath(__file__))))
10
+ from bad_query_detector import BadQueryDetector
11
+ from query_transformer import QueryTransformer
12
+ from document_retriver import DocumentRetriever
13
+ from senamtic_response_generator import SemanticResponseGenerator
14
 
15
+
16
+ class DataTransformer:
17
  def __init__(self):
18
+ """
19
+ Initializes a DataTransformer with a blockchain logger instance.
20
+ """
21
+ self.blockchain_logger = BlockchainLogger()
22
+
23
+ def secure_transform(self, data):
24
+ """
25
+ Securely transforms the input data by logging it into the blockchain.
26
+
27
+ Args:
28
+ data (dict): The log data or any data to be securely transformed.
29
+
30
+ Returns:
31
+ dict: A dictionary containing the original data, block hash, and blockchain length.
32
+ """
33
+ # Log the data into the blockchain
34
+ block_details = self.blockchain_logger.log_data(data)
35
+
36
+ # Return the block details and blockchain status
37
+ return {
38
+ "data": data,
39
+ **block_details
40
+ }
41
+
42
+ def validate_blockchain(self):
43
+ """
44
+ Validates the integrity of the blockchain.
45
+
46
+ Returns:
47
+ bool: True if the blockchain is valid, False otherwise.
48
+ """
49
+ return self.blockchain_logger.is_blockchain_valid()
50
+
51
+
52
+ class Neo4jHandler:
53
+ def __init__(self, uri, user, password):
54
+ """
55
+ Initializes a Neo4j handler for storing and querying relationships.
56
+ """
57
+ self.driver = GraphDatabase.driver(uri, auth=(user, password))
58
+
59
+ def close(self):
60
+ self.driver.close()
61
+
62
+ def log_relationships(self, query, transformed_query, response, documents):
63
+ """
64
+ Logs the relationships between queries, responses, and documents into Neo4j.
65
+ """
66
+ with self.driver.session() as session:
67
+ session.write_transaction(self._create_and_link_nodes, query, transformed_query, response, documents)
68
+
69
+ @staticmethod
70
+ def _create_and_link_nodes(tx, query, transformed_query, response, documents):
71
+ # Create Query node
72
+ tx.run("MERGE (q:Query {text: $query}) RETURN q", parameters={"query": query})
73
+ # Create TransformedQuery node
74
+ tx.run("MERGE (t:TransformedQuery {text: $transformed_query}) RETURN t",
75
+ parameters={"transformed_query": transformed_query})
76
+ # Create Response node
77
+ tx.run("MERGE (r:Response {text: $response}) RETURN r", parameters={"response": response})
78
+
79
+ # Link Query to TransformedQuery and Response
80
+ tx.run(
81
+ """
82
+ MATCH (q:Query {text: $query}), (t:TransformedQuery {text: $transformed_query})
83
+ MERGE (q)-[:TRANSFORMED_TO]->(t)
84
+ """, parameters={"query": query, "transformed_query": transformed_query}
85
+ )
86
+ tx.run(
87
+ """
88
+ MATCH (q:Query {text: $query}), (r:Response {text: $response})
89
+ MERGE (q)-[:GENERATED]->(r)
90
+ """, parameters={"query": query, "response": response}
91
+ )
92
+
93
+ # Create and link Document nodes
94
+ for doc in documents:
95
+ tx.run("MERGE (d:Document {name: $doc}) RETURN d", parameters={"doc": doc})
96
+ tx.run(
97
+ """
98
+ MATCH (q:Query {text: $query}), (d:Document {name: $doc})
99
+ MERGE (q)-[:RETRIEVED]->(d)
100
+ """, parameters={"query": query, "doc": doc}
101
+ )
102
+
103
+
104
+ class DocumentSearchSystem:
105
+ def __init__(self, neo4j_uri, neo4j_user, neo4j_password):
106
  """
107
  Initializes the DocumentSearchSystem with:
108
  - BadQueryDetector for identifying malicious or inappropriate queries.
109
  - QueryTransformer for improving or rephrasing queries.
110
  - DocumentRetriever for semantic document retrieval.
111
  - SemanticResponseGenerator for generating context-aware responses.
112
+ - DataTransformer for blockchain logging of queries and responses.
113
+ - Neo4jHandler for relationship logging and visualization.
114
  """
115
  self.detector = BadQueryDetector()
116
  self.transformer = QueryTransformer()
117
  self.retriever = DocumentRetriever()
118
  self.response_generator = SemanticResponseGenerator()
119
+ self.data_transformer = DataTransformer()
120
+ self.neo4j_handler = Neo4jHandler(neo4j_uri, neo4j_user, neo4j_password)
121
 
122
  def process_query(self, query):
123
  """
 
126
  2. Transform the query if needed.
127
  3. Retrieve relevant documents based on the query.
128
  4. Generate a response using the retrieved documents.
129
+ 5. Log all stages to the blockchain and Neo4j.
130
 
131
  :param query: The user query as a string.
132
  :return: A dictionary with the status and response or error message.
 
136
 
137
  # Transform the query
138
  transformed_query = self.transformer.transform_query(query)
139
+
140
+ # Log the original query to the blockchain
141
+ self.data_transformer.secure_transform({"type": "query", "content": query})
142
 
143
  # Retrieve relevant documents
144
  retrieved_docs = self.retriever.retrieve(transformed_query)
145
  if not retrieved_docs:
146
  return {"status": "no_results", "message": "No relevant documents found for your query."}
147
 
148
+ # Log the retrieved documents to the blockchain
149
+ self.data_transformer.secure_transform({"type": "documents", "content": retrieved_docs})
150
+
151
  # Generate a response based on the retrieved documents
152
  response = self.response_generator.generate_response(retrieved_docs)
 
153
 
154
+ # Log the response to the blockchain
155
+ blockchain_details = self.data_transformer.secure_transform({"type": "response", "content": response})
156
+
157
+ # Log relationships to Neo4j
158
+ self.neo4j_handler.log_relationships(query, transformed_query, response, retrieved_docs)
159
+
160
+ return {
161
+ "status": "success",
162
+ "response": response,
163
+ "retrieved_documents": retrieved_docs,
164
+ "blockchain_details": blockchain_details
165
+ }
166
+
167
+ def validate_system_integrity(self):
168
+ """
169
+ Validates the integrity of the blockchain.
170
+ """
171
+ return self.data_transformer.validate_blockchain()
172
+
173
+
174
+ if __name__ == "__main__":
175
 
 
 
 
 
 
 
 
 
176
  home_dir = Path(os.getenv("HOME", "/"))
177
  data_dir = home_dir / "data-sets/aclImdb/train"
178
 
 
 
 
179
 
180
+ # Initialize system with Neo4j credentials
181
+ system = DocumentSearchSystem(
182
+ neo4j_uri="neo4j+s://0ca71b10.databases.neo4j.io",
183
+ neo4j_user="neo4j",
184
+ neo4j_password="<PINGME ill provide>"
185
+ )
186
+
187
+ system.retriever.load_documents(data_dir)
188
  # Perform a normal query
189
  normal_query = "Tell me about great acting performances."
190
  print("\nNormal Query Result:")
191
+ result = system.process_query(normal_query)
192
+ print("Status:", result["status"])
193
+ print("Response:", result["response"])
194
+ print("Retrieved Documents:", result["retrieved_documents"])
195
+ print("Blockchain Details:", result["blockchain_details"])
196
 
197
  # Perform a malicious query
198
  malicious_query = "DROP TABLE users; SELECT * FROM sensitive_data;"
199
  print("\nMalicious Query Result:")
200
+ result = system.process_query(malicious_query)
201
+ print("Status:", result["status"])
202
+ print("Message:", result.get("message"))
203
 
204
 
 
 
screenshots/Screenshot from 2024-11-30 19-01-31.png ADDED