Commit 41b582a by JarvisChan630: "readme"
1 parent: 67b3290

Files changed:
- README.md (+24, -28)
- tools/legacy/offline_graph_rag_tool copy.py (+12, -2)
- tools/offline_graph_rag_tool.py (+10, -3)
README.md
CHANGED
@@ -1,46 +1,47 @@
-# 
+# Super Expert

 A project for versatile AI agents that can run with proprietary models or completely open-source. The meta expert has two agents: a basic [Meta Agent](Docs/Meta-Prompting%20Overview.MD), and [Jar3d](Docs/Introduction%20to%20Jar3d.MD), a more sophisticated and versatile agent.

-Act as an
+Act as an open-source Perplexity.

 Thanks to John Adeojo, who brought this wonderful project to the open-source community!

-## 
-## 
-What is the logics?
+## Tech Stack
+- LLM (OpenAI, Claude, Llama)
+- Frontend (Chainlit, with chain-of-thought reasoning)
+- Backend
+  - Python
+  - Docker
+  - Hugging Face deployment
+
+## TODO
+- [ ] Long-term memory.
+- [ ] Full Ollama and vLLM integration.
+- [ ] Integrations to RAG platforms for more intelligent document processing and faster RAG.
+
+## PMF - What problem does this project solve?
+
+## Business Logic
+### LLM Application Workflow
 1. User Query: The user initiates the interaction by submitting a query or request for information.
 2. Agent Accesses the Internet: The agent retrieves relevant information from various online sources, such as web pages, articles, and databases.
-3. Document Chunking: The retrieved URLs are processed to break down the content into smaller, manageable documents or chunks. This step ensures that the information is more digestible and can be analyzed effectively.
+3. Document Chunking: The retrieved URLs are processed to break down the content into smaller, manageable documents or chunks. This step ensures that the information is more digestible and can be analyzed effectively. (See run_rag in tools/legacy/offline_graph_rag_tool copy.py; a minimal sketch of steps 3-6 follows this diff.)
 4. Vectorization: Each document chunk is then transformed into a multi-dimensional embedding using vectorization techniques. This process captures the semantic meaning of the text, allowing for nuanced comparisons between different pieces of information.
 5. Similarity Search: A similarity search is performed using cosine similarity (or another appropriate metric) to identify and rank the most relevant document chunks in relation to the original user query. This step helps in finding the closest matches based on the embeddings generated earlier.
 6. Response Generation: Finally, the most relevant chunks are selected, and the LLM synthesizes them into a coherent response that directly addresses the user's query.

 ## Bullet points
-- By implemented RAG, Chain-of-Reasoning, and Meta-Prompting to complete long-running research tasks.
-- Neo4j Knowledge Graphs
-Why use this?
-naive RAG:
-![naive](image.png)
-Complex:
-![why need graph](assets/image.png)
-
-- Docker for backend
-- NLM-Ingestor - llmsherpa API - Chunk data

-## FAQ
-1. Is it necessary for a recursion more than 30 rounds? Is it spending money too much?
+## FAQ
+1. How does this system work?
+
+2. How does hybrid retrieval work?
+   In the `offline_graph_rag` tools, we combine similarity search with a Neo4j knowledge-graph query (see the sketch after this diff).

 ## Table of Contents

@@ -115,6 +116,7 @@ This project leverages four core concepts:
 nano config/config.yaml
 ```

+If you want to use hybrid search, open the settings and choose "Graph and Dense".
 ### API Key Configuration

 Enter API Keys for your choice of LLM provider:

@@ -222,9 +224,3 @@ Refer to the project's GitHub issues for common problems and solutions.

 Once you're set up, Jar3d will proceed to introduce itself and ask some questions. The questions are designed to help you refine your requirements. When you feel you have provided all the relevant information to Jar3d, you can end the questioning part of the workflow by typing `/end`.

-## Roadmap for Jar3d
-
-- Feedback to Jar3d so that final responses can be iterated on and amended.
-- Long-term memory.
-- Full Ollama and vLLM integration.
-- Integrations to RAG platforms for more intelligent document processing and faster RAG.
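The README's workflow (steps 3-6 above) can be sketched end to end. The snippet below is an illustration only: it uses scikit-learn's TF-IDF vectors as a stand-in for the project's FastEmbed embeddings and FAISS index, and the helper names (`chunk_text`, `rank_chunks`, `build_prompt`) are hypothetical, not functions from this repo.

```python
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def chunk_text(text: str, chunk_size: int = 300) -> List[str]:
    """Step 3: split fetched page text into smaller, manageable chunks."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def rank_chunks(chunks: List[str], query: str, top_k: int = 3) -> List[str]:
    """Steps 4-5: vectorize the chunks and the query, then rank chunks by cosine similarity."""
    matrix = TfidfVectorizer().fit_transform(chunks + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return [chunks[i] for i in scores.argsort()[::-1][:top_k]]


def build_prompt(query: str, context: List[str]) -> str:
    """Step 6: the top-ranked chunks become the context the LLM answers from."""
    joined = "\n\n".join(context)
    return f"Answer the query using only this context:\n\n{joined}\n\nQuery: {query}"
```

In the actual tools, chunking is layout-aware (LLM Sherpa) and ranking uses FastEmbed embeddings in a FAISS index rather than TF-IDF.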
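A hedged sketch of the hybrid retrieval mentioned in the FAQ: dense results are concatenated with relationships pulled from Neo4j. It assumes the `langchain_community` Neo4jGraph client with the usual NEO4J_* environment variables; the `RETURN p` clause is an assumption, since the diffs below elide that part of the Cypher query, and `dense_context` stands in for the output of `index_and_rank`.

```python
from langchain_community.graphs import Neo4jGraph


def hybrid_retrieve(dense_context: str, graph: Neo4jGraph, limit: int = 85) -> str:
    """Combine similarity-search context with well-connected graph relationships."""
    cypher = f"""
    MATCH p = (n)-[r]->(m)
    WHERE COUNT {{(n)--()}} > 30
    RETURN p
    LIMIT {limit}
    """
    response = graph.query(cypher)  # relationships among well-connected nodes
    # Mirrors run_hybrid_graph_retrieval: both signals go into one context string for the LLM.
    return f"Important Relationships:{response}\n\n Additional Context:{dense_context}"
```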
tools/legacy/offline_graph_rag_tool copy.py
CHANGED
@@ -1,3 +1,5 @@
+# Hybrid RAG, combining similarity search with a knowledge graph
+
 import sys
 import os
 root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

@@ -224,6 +226,7 @@ def run_hybrid_graph_retrrieval(graph: Neo4jGraph = None, corpus: List[Document]
     print(colored("Running Hybrid Retrieval...", "yellow"))
     unstructured_data = index_and_rank(corpus, query)

+    # Only nodes with more than 30 relationships are fed to Jar3d (a well-connected subset).
     query = f"""
     MATCH p = (n)-[r]->(m)
     WHERE COUNT {{(n)--()}} > 30

@@ -241,6 +244,7 @@ def run_hybrid_graph_retrrieval(graph: Neo4jGraph = None, corpus: List[Document]
     return retrieved_context


+# The chunking process begins with intelligent_chunking, which takes a URL and a query as input parameters (a sketch of this flow follows this diff).
 @timeout(20) # Change: Takes url and query as input
 def intelligent_chunking(url: str, query: str) -> List[Document]:
     try:

@@ -251,7 +255,9 @@ def intelligent_chunking(url: str, query: str) -> List[Document]:
         raise ValueError("LLM_SHERPA_SERVER environment variable is not set")

     corpus = []
-
+    # The function uses LayoutPDFReader to read and extract text from the document at the given URL.
+    # This is done by calling the LLM Sherpa API, which handles the PDF reading and layout analysis.
     try:
         print(colored("Starting LLM Sherpa LayoutPDFReader...\n\n", "yellow"))
         reader = LayoutPDFReader(llmsherpa_api_url)

@@ -261,7 +267,7 @@ def intelligent_chunking(url: str, query: str) -> List[Document]:
         print(colored(f"Error in LLM Sherpa LayoutPDFReader: {str(e)}", "red"))
         traceback.print_exc()
         doc = None
-
+    # Once the document is retrieved, it is processed into smaller, manageable chunks. Each chunk is a segment of the document that retains semantic meaning and context.
     if doc:
         for chunk in doc.chunks():
             document = Document(

@@ -321,6 +327,8 @@ def create_graph_index(
 ) -> Neo4jGraph:

     if os.environ.get('LLM_SERVER') == "openai":
+        # Building the graph index can require hundreds of API calls,
+        # since an index entry is created for every small chunk.
         llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")

     else:

@@ -370,6 +378,7 @@ def create_graph_index(

 def run_rag(urls: List[str], allowed_nodes: list[str] = None, allowed_relationships: list[str] = None, query: List[str] = None, hybrid: bool = False) -> List[Dict[str, str]]:
     # Change: adapted to take query and url as input.
+    # Intelligent document chunking
     with concurrent.futures.ThreadPoolExecutor(max_workers=min(len(urls), 5)) as executor:
         futures = [executor.submit(intelligent_chunking, url, query) for url, query in zip(urls, query)]
         chunks_list = [future.result() for future in concurrent.futures.as_completed(futures)]

@@ -382,6 +391,7 @@ def run_rag(urls: List[str], allowed_nodes: list[str] = None, allowed_relationsh

     print(colored(f"\n\n DEBUG HYBRID VALUE: {hybrid}\n\n", "yellow"))

+    # Combine the dense results with the knowledge graph.
     if hybrid:
         print(colored(f"\n\n Creating Graph Index...\n\n", "green"))
         graph = Neo4jGraph()
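The comments added above describe the chunking flow inside `intelligent_chunking`. A stripped-down sketch of that flow, assuming the `llmsherpa` client and a LangChain `Document` (the exact function name and metadata fields here are illustrative, not the repo's):

```python
import os
from typing import List

from langchain_core.documents import Document
from llmsherpa.readers import LayoutPDFReader


def chunk_url(url: str) -> List[Document]:
    """Fetch a document through the LLM Sherpa API and split it into layout-aware chunks."""
    llmsherpa_api_url = os.environ["LLM_SHERPA_SERVER"]
    reader = LayoutPDFReader(llmsherpa_api_url)
    doc = reader.read_pdf(url)  # LLM Sherpa handles the reading and layout analysis
    return [
        Document(page_content=chunk.to_context_text(), metadata={"source": url})
        for chunk in doc.chunks()
    ]
```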
tools/offline_graph_rag_tool.py
CHANGED
@@ -67,7 +67,7 @@ def deduplicate_results(results, rerank=True):
         unique_results.append(result)
     return unique_results

-
+# Similarity search (a sketch follows this diff)
 def index_and_rank(corpus: List[Document], query: str, top_percent: float = 20, batch_size: int = 25) -> List[Dict[str, str]]:
     print(colored(f"\n\nStarting indexing and ranking with FastEmbeddings and FAISS for {len(corpus)} documents\n\n", "green"))
     CACHE_DIR = "/app/fastembed_cache"

@@ -78,12 +78,13 @@ def index_and_rank(corpus: List[Document], query: str, top_percent: float = 20,
     try:
         # Initialize an empty FAISS index
         index = None
-        docstore = InMemoryDocstore({})
+        docstore = InMemoryDocstore({})  # stores document metadata
         index_to_docstore_id = {}

         # Process documents in batches
         for i in range(0, len(corpus), batch_size):
             batch = corpus[i:i+batch_size]
+            # Extract content and metadata
             texts = [doc.page_content for doc in batch]
             metadatas = [doc.metadata for doc in batch]

@@ -215,13 +216,18 @@ def index_and_rank(corpus: List[Document], query: str, top_percent: float = 20,

     return final_results

+# TODO: optimize the retrieval
 def run_hybrid_graph_retrieval(graph: Neo4jGraph = None, corpus: List[Document] = None, query: str = None, hybrid: bool = False):
     print(colored(f"\n\nInitiating Retrieval...\n\n", "green"))

     if hybrid:
         print(colored("Running Hybrid Retrieval...", "yellow"))
-        unstructured_data = index_and_rank(corpus, query)
+        # Similarity search
+        unstructured_data = index_and_rank(corpus, query)  # similarity ranking

+        # Cypher query: MATCH p = (n)-[r]->(m) matches a directed relationship from node n to node m through relationship r.
+        # The WHERE clause keeps only nodes n with more than 30 connections; the {(n)--()} pattern counts all relationships attached to n, so only well-connected nodes are considered.
         query = f"""
         MATCH p = (n)-[r]->(m)
         WHERE COUNT {{(n)--()}} > 30

@@ -229,6 +235,7 @@ def run_hybrid_graph_retrieval(graph: Neo4jGraph = None, corpus: List[Document]
         LIMIT 85
         """
         response = graph.query(query)
+        # This combined context is then passed to the LLM to generate the response.
         retrieved_context = f"Important Relationships:{response}\n\n Additional Context:{unstructured_data}"

     else:
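For the similarity-search side (`index_and_rank`), a condensed sketch: the real function builds the FAISS index incrementally in batches with an `InMemoryDocstore`, while this illustration takes the simpler `FAISS.from_documents` path; the function name `rank_corpus` is hypothetical, and only the FastEmbed cache path is taken from the diff above.

```python
from typing import List, Tuple

from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document


def rank_corpus(corpus: List[Document], query: str, top_k: int = 10) -> List[Tuple[Document, float]]:
    """Embed chunks with FastEmbed, index them in FAISS, and rank them against the query."""
    embeddings = FastEmbedEmbeddings(cache_dir="/app/fastembed_cache")
    vector_store = FAISS.from_documents(corpus, embeddings)
    # With the default L2 index, lower scores mean closer matches.
    return vector_store.similarity_search_with_score(query, k=top_k)
```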