Spaces:
Running
on
Zero
Running
on
Zero
Adding LightRAG KG
Browse files- CLAUDE.md +79 -0
- app.py +30 -15
- app_old.py +0 -280
- llm_graph.py +125 -16
- main.py +0 -392
- requirements.txt +4 -2
CLAUDE.md
ADDED
@@ -0,0 +1,79 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# CLAUDE.md
|
2 |
+
|
3 |
+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
4 |
+
|
5 |
+
## Application Overview
|
6 |
+
|
7 |
+
This is a Text2Graph application that extracts knowledge graphs from natural language text. It's a Gradio web app that uses either OpenAI GPT-4.1-mini via Azure or Phi-3-mini-128k-instruct-graph via Hugging Face to extract entities and relationships from text, then visualizes them as interactive graphs.
|
8 |
+
|
9 |
+
## Architecture
|
10 |
+
|
11 |
+
- **app.py**: Main Gradio application with UI components, visualization logic, and caching
|
12 |
+
- **llm_graph.py**: Core LLMGraph class that handles model selection and knowledge graph extraction
|
13 |
+
- **cache/**: Directory for caching visualization data (first example is pre-cached for performance)
|
14 |
+
|
15 |
+
## Key Components
|
16 |
+
|
17 |
+
### LLMGraph Class (llm_graph.py)
|
18 |
+
- Supports two model backends: Azure OpenAI (GPT-4.1-mini) and Hugging Face (Phi-3-mini-128k-instruct-graph)
|
19 |
+
- Uses LightRAG for Azure OpenAI integration
|
20 |
+
- Direct inference API calls for Hugging Face models
|
21 |
+
- Extracts structured JSON with nodes (entities) and edges (relationships)
|
22 |
+
|
23 |
+
### Visualization Pipeline (app.py)
|
24 |
+
- Entity recognition visualization using spaCy's displacy
|
25 |
+
- Interactive knowledge graph using pyvis and NetworkX
|
26 |
+
- Caching system for performance optimization
|
27 |
+
- Color-coded entity types with random light colors
|
28 |
+
|
29 |
+
## Environment Setup
|
30 |
+
|
31 |
+
Required environment variables:
|
32 |
+
```
|
33 |
+
HF_TOKEN=<huggingface_token>
|
34 |
+
HF_API_ENDPOINT=<huggingface_inference_endpoint>
|
35 |
+
AZURE_OPENAI_API_KEY=<azure_openai_key>
|
36 |
+
AZURE_OPENAI_ENDPOINT=<azure_endpoint>
|
37 |
+
AZURE_OPENAI_API_VERSION=<api_version>
|
38 |
+
AZURE_OPENAI_DEPLOYMENT=<deployment_name>
|
39 |
+
AZURE_EMBEDDING_DEPLOYMENT=<embedding_deployment>
|
40 |
+
AZURE_EMBEDDING_API_VERSION=<embedding_api_version>
|
41 |
+
```
|
42 |
+
|
43 |
+
## Running the Application
|
44 |
+
|
45 |
+
```bash
|
46 |
+
# Install dependencies
|
47 |
+
pip install -r requirements.txt
|
48 |
+
|
49 |
+
# Run the Gradio app
|
50 |
+
python app.py
|
51 |
+
```
|
52 |
+
|
53 |
+
## Key Dependencies
|
54 |
+
|
55 |
+
- **gradio**: Web interface framework
|
56 |
+
- **lightrag-hku**: RAG framework for Azure OpenAI integration
|
57 |
+
- **transformers**: Hugging Face model integration
|
58 |
+
- **pyvis**: Interactive network visualization
|
59 |
+
- **networkx**: Graph data structure and algorithms
|
60 |
+
- **spacy**: Natural language processing and entity visualization
|
61 |
+
- **openai**: Azure OpenAI client
|
62 |
+
|
63 |
+
## Data Flow
|
64 |
+
|
65 |
+
1. User inputs text and selects model
|
66 |
+
2. LLMGraph.extract() processes text using selected model backend
|
67 |
+
3. JSON response contains nodes (entities) and edges (relationships)
|
68 |
+
4. Visualization functions create entity highlighting and interactive graph
|
69 |
+
5. Results cached for performance (first example only)
|
70 |
+
|
71 |
+
## Model Behavior
|
72 |
+
|
73 |
+
The application expects JSON output with this schema:
|
74 |
+
```json
|
75 |
+
{
|
76 |
+
"nodes": [{"id": "entity", "type": "broad_type", "detailed_type": "specific_type"}],
|
77 |
+
"edges": [{"from": "entity1", "to": "entity2", "label": "relationship"}]
|
78 |
+
}
|
79 |
+
```
|
app.py
CHANGED
@@ -3,7 +3,10 @@ import os
|
|
3 |
import spacy
|
4 |
import pickle
|
5 |
import random
|
|
|
6 |
import rapidjson
|
|
|
|
|
7 |
import gradio as gr
|
8 |
import networkx as nx
|
9 |
|
@@ -12,6 +15,8 @@ from pyvis.network import Network
|
|
12 |
from spacy import displacy
|
13 |
from spacy.tokens import Span
|
14 |
|
|
|
|
|
15 |
# Constants
|
16 |
TITLE = "🌐 Text2Graph: Extract Knowledge Graphs from Natural Language"
|
17 |
SUBTITLE = "✨ Extract and visualize knowledge graphs from texts in any language!"
|
@@ -53,7 +58,7 @@ def handle_text(text=""):
|
|
53 |
return " ".join(text.split())
|
54 |
|
55 |
# @spaces.GPU
|
56 |
-
def extract_kg(text="", model=None):
|
57 |
"""
|
58 |
Extract knowledge graph from text
|
59 |
"""
|
@@ -62,8 +67,9 @@ def extract_kg(text="", model=None):
|
|
62 |
if not text or not model:
|
63 |
raise gr.Error("⚠️ Both text and model must be provided!")
|
64 |
try:
|
65 |
-
|
66 |
-
result =
|
|
|
67 |
return rapidjson.loads(result)
|
68 |
except Exception as e:
|
69 |
raise gr.Error(f"❌ Extraction error: {str(e)}")
|
@@ -217,7 +223,7 @@ def create_graph(json_data):
|
|
217 |
allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
|
218 |
allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""
|
219 |
|
220 |
-
def process_and_visualize(text, model, progress=gr.Progress()):
|
221 |
"""
|
222 |
Process text and visualize knowledge graph and entities
|
223 |
"""
|
@@ -238,11 +244,12 @@ def process_and_visualize(text, model, progress=gr.Progress()):
|
|
238 |
progress(1.0, desc="Loaded from cache!")
|
239 |
return cache_data["graph_html"], cache_data["entities_viz"], cache_data["json_data"], cache_data["stats"]
|
240 |
except Exception as e:
|
241 |
-
print(f"Cache loading error: {str(e)}")
|
|
|
242 |
|
243 |
# Continue with normal processing if cache fails
|
244 |
progress(0, desc="Starting extraction...")
|
245 |
-
json_data = extract_kg(text, model)
|
246 |
|
247 |
progress(0.5, desc="Creating entity visualization...")
|
248 |
entities_viz = create_custom_entity_viz(json_data, text)
|
@@ -266,7 +273,8 @@ def process_and_visualize(text, model, progress=gr.Progress()):
|
|
266 |
with open(EXAMPLE_CACHE_FILE, 'wb') as f:
|
267 |
pickle.dump(cache_data, f)
|
268 |
except Exception as e:
|
269 |
-
print(f"Cache saving error: {str(e)}")
|
|
|
270 |
|
271 |
progress(1.0, desc="Complete!")
|
272 |
return graph_html, entities_viz, json_data, stats
|
@@ -293,20 +301,21 @@ EXAMPLES = [
|
|
293 |
les buis et à arroser les rosiers, perpétuant ainsi une tradition d'excellence horticole qui fait la fierté de la capitale française.""")],
|
294 |
]
|
295 |
|
296 |
-
def generate_first_example_cache():
|
297 |
"""
|
298 |
Generate cache for the first example if it doesn't exist when the app starts
|
299 |
"""
|
300 |
|
301 |
if not os.path.exists(EXAMPLE_CACHE_FILE):
|
302 |
-
print("Generating cache for first example...")
|
|
|
303 |
|
304 |
try:
|
305 |
text = EXAMPLES[0][0]
|
306 |
model = MODEL_LIST[0] if MODEL_LIST else None
|
307 |
|
308 |
# Extract data
|
309 |
-
json_data = extract_kg(text, model)
|
310 |
entities_viz = create_custom_entity_viz(json_data, text)
|
311 |
graph_html = create_graph(json_data)
|
312 |
|
@@ -324,18 +333,24 @@ def generate_first_example_cache():
|
|
324 |
|
325 |
with open(EXAMPLE_CACHE_FILE, 'wb') as f:
|
326 |
pickle.dump(cached_data, f)
|
327 |
-
print("First example cache generated successfully")
|
|
|
328 |
|
329 |
return cached_data
|
330 |
except Exception as e:
|
331 |
-
print(f"Error generating first example cache: {str(e)}")
|
|
|
332 |
else:
|
333 |
-
print("First example cache already exists")
|
|
|
|
|
|
|
334 |
try:
|
335 |
with open(EXAMPLE_CACHE_FILE, 'rb') as f:
|
336 |
return pickle.load(f)
|
337 |
except Exception as e:
|
338 |
-
print(f"Error loading existing cache: {str(e)}")
|
|
|
339 |
|
340 |
return None
|
341 |
|
@@ -345,7 +360,7 @@ def create_ui():
|
|
345 |
"""
|
346 |
|
347 |
# Try to generate/load the first example cache
|
348 |
-
first_example_cache = generate_first_example_cache()
|
349 |
|
350 |
with gr.Blocks(css=CUSTOM_CSS, title=TITLE) as demo:
|
351 |
# Header
|
|
|
3 |
import spacy
|
4 |
import pickle
|
5 |
import random
|
6 |
+
import logging
|
7 |
import rapidjson
|
8 |
+
import asyncio
|
9 |
+
|
10 |
import gradio as gr
|
11 |
import networkx as nx
|
12 |
|
|
|
15 |
from spacy import displacy
|
16 |
from spacy.tokens import Span
|
17 |
|
18 |
+
logging.basicConfig(level=logging.INFO)
|
19 |
+
|
20 |
# Constants
|
21 |
TITLE = "🌐 Text2Graph: Extract Knowledge Graphs from Natural Language"
|
22 |
SUBTITLE = "✨ Extract and visualize knowledge graphs from texts in any language!"
|
|
|
58 |
return " ".join(text.split())
|
59 |
|
60 |
# @spaces.GPU
|
61 |
+
async def extract_kg(text="", model=None):
|
62 |
"""
|
63 |
Extract knowledge graph from text
|
64 |
"""
|
|
|
67 |
if not text or not model:
|
68 |
raise gr.Error("⚠️ Both text and model must be provided!")
|
69 |
try:
|
70 |
+
model_instance = LLMGraph(model=model)
|
71 |
+
result = await model_instance.extract(text)
|
72 |
+
|
73 |
return rapidjson.loads(result)
|
74 |
except Exception as e:
|
75 |
raise gr.Error(f"❌ Extraction error: {str(e)}")
|
|
|
223 |
allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
|
224 |
allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""
|
225 |
|
226 |
+
async def process_and_visualize(text, model, progress=gr.Progress()):
|
227 |
"""
|
228 |
Process text and visualize knowledge graph and entities
|
229 |
"""
|
|
|
244 |
progress(1.0, desc="Loaded from cache!")
|
245 |
return cache_data["graph_html"], cache_data["entities_viz"], cache_data["json_data"], cache_data["stats"]
|
246 |
except Exception as e:
|
247 |
+
# print(f"Cache loading error: {str(e)}")
|
248 |
+
logging.error(f"Cache loading error: {str(e)}")
|
249 |
|
250 |
# Continue with normal processing if cache fails
|
251 |
progress(0, desc="Starting extraction...")
|
252 |
+
json_data = await extract_kg(text, model)
|
253 |
|
254 |
progress(0.5, desc="Creating entity visualization...")
|
255 |
entities_viz = create_custom_entity_viz(json_data, text)
|
|
|
273 |
with open(EXAMPLE_CACHE_FILE, 'wb') as f:
|
274 |
pickle.dump(cache_data, f)
|
275 |
except Exception as e:
|
276 |
+
# print(f"Cache saving error: {str(e)}")
|
277 |
+
logging.error(f"Cache saving error: {str(e)}")
|
278 |
|
279 |
progress(1.0, desc="Complete!")
|
280 |
return graph_html, entities_viz, json_data, stats
|
|
|
301 |
les buis et à arroser les rosiers, perpétuant ainsi une tradition d'excellence horticole qui fait la fierté de la capitale française.""")],
|
302 |
]
|
303 |
|
304 |
+
async def generate_first_example_cache():
|
305 |
"""
|
306 |
Generate cache for the first example if it doesn't exist when the app starts
|
307 |
"""
|
308 |
|
309 |
if not os.path.exists(EXAMPLE_CACHE_FILE):
|
310 |
+
# print("Generating cache for first example...")
|
311 |
+
logging.info("Generating cache for first example...")
|
312 |
|
313 |
try:
|
314 |
text = EXAMPLES[0][0]
|
315 |
model = MODEL_LIST[0] if MODEL_LIST else None
|
316 |
|
317 |
# Extract data
|
318 |
+
json_data = await extract_kg(text, model)
|
319 |
entities_viz = create_custom_entity_viz(json_data, text)
|
320 |
graph_html = create_graph(json_data)
|
321 |
|
|
|
333 |
|
334 |
with open(EXAMPLE_CACHE_FILE, 'wb') as f:
|
335 |
pickle.dump(cached_data, f)
|
336 |
+
# print("First example cache generated successfully")
|
337 |
+
logging.info("First example cache generated successfully")
|
338 |
|
339 |
return cached_data
|
340 |
except Exception as e:
|
341 |
+
# print(f"Error generating first example cache: {str(e)}")
|
342 |
+
logging.error(f"Error generating first example cache: {str(e)}")
|
343 |
else:
|
344 |
+
# print("First example cache already exists")
|
345 |
+
logging.info("First example cache already exists")
|
346 |
+
|
347 |
+
# Load existing cache
|
348 |
try:
|
349 |
with open(EXAMPLE_CACHE_FILE, 'rb') as f:
|
350 |
return pickle.load(f)
|
351 |
except Exception as e:
|
352 |
+
# print(f"Error loading existing cache: {str(e)}")
|
353 |
+
logging.error(f"Error loading existing cache: {str(e)}")
|
354 |
|
355 |
return None
|
356 |
|
|
|
360 |
"""
|
361 |
|
362 |
# Try to generate/load the first example cache
|
363 |
+
first_example_cache = asyncio.run(generate_first_example_cache())
|
364 |
|
365 |
with gr.Blocks(css=CUSTOM_CSS, title=TITLE) as demo:
|
366 |
# Header
|
app_old.py
DELETED
@@ -1,280 +0,0 @@
|
|
1 |
-
# import spaces
|
2 |
-
import gradio as gr
|
3 |
-
from llm_graph import MODEL_LIST, LLMGraph
|
4 |
-
import rapidjson
|
5 |
-
from pyvis.network import Network
|
6 |
-
import networkx as nx
|
7 |
-
import spacy
|
8 |
-
from spacy import displacy
|
9 |
-
from spacy.tokens import Span
|
10 |
-
import random
|
11 |
-
from tqdm import tqdm
|
12 |
-
|
13 |
-
# Constants
|
14 |
-
TITLE = "🌐 GraphMind: Phi-3 Instruct Graph Explorer"
|
15 |
-
SUBTITLE = "✨ Extract and visualize knowledge graphs from any text in multiple languages"
|
16 |
-
|
17 |
-
# Custom CSS for styling
|
18 |
-
CUSTOM_CSS = """
|
19 |
-
.gradio-container {
|
20 |
-
font-family: 'Inter', 'Segoe UI', Roboto, sans-serif;
|
21 |
-
}
|
22 |
-
.gr-button-primary {
|
23 |
-
background-color: #6366f1 !important;
|
24 |
-
}
|
25 |
-
.gr-button-secondary {
|
26 |
-
border-color: #6366f1 !important;
|
27 |
-
color: #6366f1 !important;
|
28 |
-
}
|
29 |
-
"""
|
30 |
-
|
31 |
-
# Color utilities
|
32 |
-
def get_random_light_color():
|
33 |
-
r = random.randint(140, 255)
|
34 |
-
g = random.randint(140, 255)
|
35 |
-
b = random.randint(140, 255)
|
36 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
37 |
-
|
38 |
-
# Text preprocessing
|
39 |
-
def handle_text(text):
|
40 |
-
return " ".join(text.split())
|
41 |
-
|
42 |
-
# Main processing functions
|
43 |
-
# @spaces.GPU
|
44 |
-
def extract(text, model):
|
45 |
-
try:
|
46 |
-
model = LLMGraph(model=model)
|
47 |
-
result = model.extract(text)
|
48 |
-
return rapidjson.loads(result)
|
49 |
-
except Exception as e:
|
50 |
-
raise gr.Error(f"Extraction error: {str(e)}")
|
51 |
-
|
52 |
-
def find_token_indices(doc, substring, text):
|
53 |
-
result = []
|
54 |
-
start_index = text.find(substring)
|
55 |
-
|
56 |
-
while start_index != -1:
|
57 |
-
end_index = start_index + len(substring)
|
58 |
-
start_token = None
|
59 |
-
end_token = None
|
60 |
-
|
61 |
-
for token in doc:
|
62 |
-
if token.idx == start_index:
|
63 |
-
start_token = token.i
|
64 |
-
if token.idx + len(token) == end_index:
|
65 |
-
end_token = token.i + 1
|
66 |
-
|
67 |
-
if start_token is not None and end_token is not None:
|
68 |
-
result.append({
|
69 |
-
"start": start_token,
|
70 |
-
"end": end_token
|
71 |
-
})
|
72 |
-
|
73 |
-
# Search for next occurrence
|
74 |
-
start_index = text.find(substring, end_index)
|
75 |
-
|
76 |
-
return result
|
77 |
-
|
78 |
-
def create_custom_entity_viz(data, full_text):
|
79 |
-
nlp = spacy.blank("xx")
|
80 |
-
doc = nlp(full_text)
|
81 |
-
|
82 |
-
spans = []
|
83 |
-
colors = {}
|
84 |
-
for node in data["nodes"]:
|
85 |
-
entity_spans = find_token_indices(doc, node["id"], full_text)
|
86 |
-
for dataentity in entity_spans:
|
87 |
-
start = dataentity["start"]
|
88 |
-
end = dataentity["end"]
|
89 |
-
|
90 |
-
if start < len(doc) and end <= len(doc):
|
91 |
-
# Check for overlapping spans
|
92 |
-
overlapping = any(s.start < end and start < s.end for s in spans)
|
93 |
-
if not overlapping:
|
94 |
-
span = Span(doc, start, end, label=node["type"])
|
95 |
-
spans.append(span)
|
96 |
-
if node["type"] not in colors:
|
97 |
-
colors[node["type"]] = get_random_light_color()
|
98 |
-
|
99 |
-
doc.set_ents(spans, default="unmodified")
|
100 |
-
doc.spans["sc"] = spans
|
101 |
-
|
102 |
-
options = {
|
103 |
-
"colors": colors,
|
104 |
-
"ents": list(colors.keys()),
|
105 |
-
"style": "ent",
|
106 |
-
"manual": True
|
107 |
-
}
|
108 |
-
|
109 |
-
html = displacy.render(doc, style="span", options=options)
|
110 |
-
return html
|
111 |
-
|
112 |
-
def create_graph(json_data):
|
113 |
-
G = nx.Graph()
|
114 |
-
|
115 |
-
# Add nodes with tooltips
|
116 |
-
for node in json_data['nodes']:
|
117 |
-
G.add_node(node['id'], title=f"{node['type']}: {node['detailed_type']}")
|
118 |
-
|
119 |
-
# Add edges with labels
|
120 |
-
for edge in json_data['edges']:
|
121 |
-
G.add_edge(edge['from'], edge['to'], title=edge['label'], label=edge['label'])
|
122 |
-
|
123 |
-
# Create network visualization
|
124 |
-
nt = Network(
|
125 |
-
width="720px",
|
126 |
-
height="600px",
|
127 |
-
directed=True,
|
128 |
-
notebook=False,
|
129 |
-
bgcolor="#f8fafc",
|
130 |
-
font_color="#1e293b"
|
131 |
-
)
|
132 |
-
|
133 |
-
# Configure network display
|
134 |
-
nt.from_nx(G)
|
135 |
-
nt.barnes_hut(
|
136 |
-
gravity=-3000,
|
137 |
-
central_gravity=0.3,
|
138 |
-
spring_length=50,
|
139 |
-
spring_strength=0.001,
|
140 |
-
damping=0.09,
|
141 |
-
overlap=0,
|
142 |
-
)
|
143 |
-
|
144 |
-
# Customize edge appearance
|
145 |
-
for edge in nt.edges:
|
146 |
-
edge['width'] = 2
|
147 |
-
edge['arrows'] = {'to': {'enabled': True, 'type': 'arrow'}}
|
148 |
-
edge['color'] = {'color': '#6366f1', 'highlight': '#4f46e5'}
|
149 |
-
edge['font'] = {'size': 12, 'color': '#4b5563', 'face': 'Arial'}
|
150 |
-
|
151 |
-
# Customize node appearance
|
152 |
-
for node in nt.nodes:
|
153 |
-
node['color'] = {'background': '#e0e7ff', 'border': '#6366f1', 'highlight': {'background': '#c7d2fe', 'border': '#4f46e5'}}
|
154 |
-
node['font'] = {'size': 14, 'color': '#1e293b'}
|
155 |
-
node['shape'] = 'dot'
|
156 |
-
node['size'] = 25
|
157 |
-
|
158 |
-
# Generate HTML with iframe to isolate styles
|
159 |
-
html = nt.generate_html()
|
160 |
-
html = html.replace("'", '"')
|
161 |
-
|
162 |
-
return f"""<iframe style="width: 100%; height: 620px; margin: 0 auto; border-radius: 8px; box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);"
|
163 |
-
name="result" allow="midi; geolocation; microphone; camera; display-capture; encrypted-media;"
|
164 |
-
sandbox="allow-modals allow-forms allow-scripts allow-same-origin allow-popups
|
165 |
-
allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
|
166 |
-
allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""
|
167 |
-
|
168 |
-
def process_and_visualize(text, model, progress=gr.Progress()):
|
169 |
-
if not text or not model:
|
170 |
-
raise gr.Error("⚠️ Both text and model must be provided.")
|
171 |
-
|
172 |
-
progress(0, desc="Starting extraction...")
|
173 |
-
json_data = extract(text, model)
|
174 |
-
|
175 |
-
progress(0.5, desc="Creating entity visualization...")
|
176 |
-
entities_viz = create_custom_entity_viz(json_data, text)
|
177 |
-
|
178 |
-
progress(0.8, desc="Building knowledge graph...")
|
179 |
-
graph_html = create_graph(json_data)
|
180 |
-
|
181 |
-
node_count = len(json_data["nodes"])
|
182 |
-
edge_count = len(json_data["edges"])
|
183 |
-
stats = f"📊 Extracted {node_count} entities and {edge_count} relationships"
|
184 |
-
|
185 |
-
progress(1.0, desc="Complete!")
|
186 |
-
return graph_html, entities_viz, json_data, stats
|
187 |
-
|
188 |
-
# Example texts in different languages
|
189 |
-
EXAMPLES = [
|
190 |
-
[handle_text("""Legendary rock band Aerosmith has officially announced their retirement from touring after 54 years, citing
|
191 |
-
lead singer Steven Tyler's unrecoverable vocal cord injury.
|
192 |
-
The decision comes after months of unsuccessful treatment for Tyler's fractured larynx,
|
193 |
-
which he suffered in September 2023.""")],
|
194 |
-
|
195 |
-
[handle_text("""Pop star Justin Timberlake, 43, had his driver's license suspended by a New York judge during a virtual
|
196 |
-
court hearing on August 2, 2024. The suspension follows Timberlake's arrest for driving while intoxicated (DWI)
|
197 |
-
in Sag Harbor on June 18. Timberlake, who is currently on tour in Europe,
|
198 |
-
pleaded not guilty to the charges.""")],
|
199 |
-
|
200 |
-
[handle_text("""세계적인 기술 기업 삼성전자는 새로운 인공지능 기반 스마트폰을 올해 하반기에 출시할 예정이라고 발표했다.
|
201 |
-
이 스마트폰은 현재 개발 중인 갤럭시 시리즈의 최신작으로, 강력한 AI 기능과 혁신적인 카메라 시스템을 탑재할 것으로 알려졌다.
|
202 |
-
삼성전자의 CEO는 이번 신제품이 스마트폰 시장에 새로운 혁신을 가져올 것이라고 전망했다.""")],
|
203 |
-
|
204 |
-
[handle_text("""한국 영화 '기생충'은 2020년 아카데미 시상식에서 작품상, 감독상, 각본상, 국제영화상 등 4개 부문을 수상하며 역사를 새로 썼다.
|
205 |
-
봉준호 감독이 연출한 이 영화는 한국 영화 최초로 칸 영화제 황금종려상도 수상했으며, 전 세계적으로 엄청난 흥행과
|
206 |
-
평단의 호평을 받았다.""")]
|
207 |
-
]
|
208 |
-
|
209 |
-
def create_ui():
|
210 |
-
with gr.Blocks(css=CUSTOM_CSS, title=TITLE) as demo:
|
211 |
-
# Header
|
212 |
-
gr.Markdown(f"# {TITLE}")
|
213 |
-
gr.Markdown(f"{SUBTITLE}")
|
214 |
-
|
215 |
-
with gr.Row():
|
216 |
-
gr.Markdown("🌍 **Multilingual Support Available** 🔤")
|
217 |
-
|
218 |
-
# Main interface
|
219 |
-
with gr.Row():
|
220 |
-
# Input column
|
221 |
-
with gr.Column(scale=1):
|
222 |
-
input_model = gr.Dropdown(
|
223 |
-
MODEL_LIST,
|
224 |
-
label="🤖 Select Model",
|
225 |
-
info="Choose a model to process your text",
|
226 |
-
value=MODEL_LIST[0] if MODEL_LIST else None
|
227 |
-
)
|
228 |
-
|
229 |
-
input_text = gr.TextArea(
|
230 |
-
label="📝 Input Text",
|
231 |
-
info="Enter text in any language to extract a knowledge graph",
|
232 |
-
placeholder="Enter text here...",
|
233 |
-
lines=10
|
234 |
-
)
|
235 |
-
|
236 |
-
with gr.Row():
|
237 |
-
submit_button = gr.Button("🚀 Extract & Visualize", variant="primary", scale=2)
|
238 |
-
clear_button = gr.Button("🔄 Clear", variant="secondary", scale=1)
|
239 |
-
|
240 |
-
gr.Examples(
|
241 |
-
examples=EXAMPLES,
|
242 |
-
inputs=input_text,
|
243 |
-
label="📚 Example Texts (English & Korean)"
|
244 |
-
)
|
245 |
-
|
246 |
-
stats_output = gr.Markdown("", label="🔍 Analysis Results")
|
247 |
-
|
248 |
-
# Output column
|
249 |
-
with gr.Column(scale=1):
|
250 |
-
with gr.Tab("🧩 Knowledge Graph"):
|
251 |
-
output_graph = gr.HTML(label="")
|
252 |
-
|
253 |
-
with gr.Tab("🏷️ Entities"):
|
254 |
-
output_entity_viz = gr.HTML(label="")
|
255 |
-
|
256 |
-
with gr.Tab("📊 JSON Data"):
|
257 |
-
output_json = gr.JSON(label="")
|
258 |
-
|
259 |
-
# Functionality
|
260 |
-
submit_button.click(
|
261 |
-
fn=process_and_visualize,
|
262 |
-
inputs=[input_text, input_model],
|
263 |
-
outputs=[output_graph, output_entity_viz, output_json, stats_output]
|
264 |
-
)
|
265 |
-
|
266 |
-
clear_button.click(
|
267 |
-
fn=lambda: [None, None, None, ""],
|
268 |
-
inputs=[],
|
269 |
-
outputs=[output_graph, output_entity_viz, output_json, stats_output]
|
270 |
-
)
|
271 |
-
|
272 |
-
# Footer
|
273 |
-
gr.Markdown("---")
|
274 |
-
gr.Markdown("📋 **Instructions:** Enter text in any language, select a model, and click 'Extract & Visualize' to generate a knowledge graph.")
|
275 |
-
gr.Markdown("🛠️ Powered by Phi-3 Instruct Graph | Emergent Methods")
|
276 |
-
|
277 |
-
return demo
|
278 |
-
|
279 |
-
demo = create_ui()
|
280 |
-
demo.launch(share=False)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
llm_graph.py
CHANGED
@@ -1,18 +1,31 @@
|
|
1 |
import os
|
2 |
-
|
|
|
3 |
|
4 |
-
from
|
5 |
from dotenv import load_dotenv
|
|
|
|
|
|
|
|
|
|
|
|
|
6 |
|
7 |
load_dotenv()
|
|
|
|
|
8 |
api_token = os.environ["HF_TOKEN"]
|
9 |
endpoint_url = os.environ["HF_API_ENDPOINT"]
|
10 |
|
11 |
-
|
12 |
-
|
13 |
-
|
14 |
-
|
15 |
-
|
|
|
|
|
|
|
|
|
16 |
|
17 |
MODEL_LIST = [
|
18 |
"OpenAI/GPT-4.1-mini",
|
@@ -20,15 +33,71 @@ MODEL_LIST = [
|
|
20 |
]
|
21 |
|
22 |
class LLMGraph:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
23 |
def __init__(self, model="OpenAI/GPT-4.1-mini"):
|
24 |
"""
|
25 |
Initialize the Phi3InstructGraph with a specified model.
|
26 |
"""
|
27 |
-
|
28 |
if model not in MODEL_LIST:
|
29 |
raise ValueError(f"Model must be one of {MODEL_LIST}")
|
30 |
|
31 |
-
self.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
32 |
|
33 |
def _generate(self, messages):
|
34 |
"""
|
@@ -36,7 +105,7 @@ class LLMGraph:
|
|
36 |
"""
|
37 |
|
38 |
# Use the chat_completion method
|
39 |
-
response =
|
40 |
messages=messages,
|
41 |
max_tokens=1024,
|
42 |
)
|
@@ -85,7 +154,6 @@ class LLMGraph:
|
|
85 |
-------Text end-------
|
86 |
""")
|
87 |
|
88 |
-
# if self.model_path == "EmergentMethods/Phi-3-medium-128k-instruct-graph":
|
89 |
messages = [
|
90 |
{
|
91 |
"role": "system",
|
@@ -96,17 +164,58 @@ class LLMGraph:
|
|
96 |
"content": user_message
|
97 |
}
|
98 |
]
|
99 |
-
# else:
|
100 |
-
# # TODO: update for other models
|
101 |
|
102 |
return messages
|
103 |
|
104 |
-
def extract(self, text):
|
105 |
"""
|
106 |
Extract knowledge graph from text
|
107 |
"""
|
108 |
|
109 |
-
|
110 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
111 |
|
112 |
return generated_text
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
import os
|
2 |
+
import asyncio
|
3 |
+
import numpy as np
|
4 |
|
5 |
+
from textwrap import dedent
|
6 |
from dotenv import load_dotenv
|
7 |
+
from openai import AzureOpenAI
|
8 |
+
from huggingface_hub import InferenceClient
|
9 |
+
|
10 |
+
from lightrag import LightRAG
|
11 |
+
from lightrag.utils import EmbeddingFunc
|
12 |
+
from lightrag.kg.shared_storage import initialize_pipeline_status
|
13 |
|
14 |
load_dotenv()
|
15 |
+
|
16 |
+
# Load the environment variables
|
17 |
api_token = os.environ["HF_TOKEN"]
|
18 |
endpoint_url = os.environ["HF_API_ENDPOINT"]
|
19 |
|
20 |
+
AZURE_OPENAI_API_VERSION = os.environ["AZURE_OPENAI_API_VERSION"]
|
21 |
+
AZURE_OPENAI_DEPLOYMENT = os.environ["AZURE_OPENAI_DEPLOYMENT"]
|
22 |
+
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]
|
23 |
+
AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
|
24 |
+
|
25 |
+
AZURE_EMBEDDING_DEPLOYMENT = os.environ["AZURE_EMBEDDING_DEPLOYMENT"]
|
26 |
+
AZURE_EMBEDDING_API_VERSION = os.environ["AZURE_EMBEDDING_API_VERSION"]
|
27 |
+
|
28 |
+
WORKING_DIR = "./cache"
|
29 |
|
30 |
MODEL_LIST = [
|
31 |
"OpenAI/GPT-4.1-mini",
|
|
|
33 |
]
|
34 |
|
35 |
class LLMGraph:
|
36 |
+
"""
|
37 |
+
A class to interact with LLMs for knowledge graph extraction.
|
38 |
+
"""
|
39 |
+
|
40 |
+
async def _initialize_rag(self, embedding_dimension=3072):
|
41 |
+
"""
|
42 |
+
Initialize the LightRAG instance with the specified embedding dimension.
|
43 |
+
"""
|
44 |
+
|
45 |
+
rag = LightRAG(
|
46 |
+
working_dir=WORKING_DIR,
|
47 |
+
llm_model_func=self._llm_model_func,
|
48 |
+
embedding_func=EmbeddingFunc(
|
49 |
+
embedding_dim=embedding_dimension,
|
50 |
+
max_token_size=8192,
|
51 |
+
func=self._embedding_func,
|
52 |
+
),
|
53 |
+
)
|
54 |
+
|
55 |
+
await rag.initialize_storages()
|
56 |
+
await initialize_pipeline_status()
|
57 |
+
|
58 |
+
return rag
|
59 |
+
|
60 |
+
async def _get_rag(self):
|
61 |
+
"""
|
62 |
+
Get or initialize the RAG instance (lazy loading).
|
63 |
+
"""
|
64 |
+
|
65 |
+
if self.rag is None:
|
66 |
+
self.rag = await self._initialize_rag()
|
67 |
+
|
68 |
+
return self.rag
|
69 |
+
|
70 |
def __init__(self, model="OpenAI/GPT-4.1-mini"):
|
71 |
"""
|
72 |
Initialize the Phi3InstructGraph with a specified model.
|
73 |
"""
|
74 |
+
|
75 |
if model not in MODEL_LIST:
|
76 |
raise ValueError(f"Model must be one of {MODEL_LIST}")
|
77 |
|
78 |
+
self.model_name = model
|
79 |
+
|
80 |
+
if model == MODEL_LIST[0]:
|
81 |
+
# Use Azure OpenAI for GPT-4.1-mini
|
82 |
+
self.llm_client = AzureOpenAI(
|
83 |
+
api_key=AZURE_OPENAI_API_KEY,
|
84 |
+
api_version=AZURE_OPENAI_API_VERSION,
|
85 |
+
azure_endpoint=AZURE_OPENAI_ENDPOINT,
|
86 |
+
)
|
87 |
+
|
88 |
+
self.emb_client = AzureOpenAI(
|
89 |
+
api_key=AZURE_OPENAI_API_KEY,
|
90 |
+
api_version=AZURE_EMBEDDING_API_VERSION,
|
91 |
+
azure_endpoint=AZURE_OPENAI_ENDPOINT,
|
92 |
+
)
|
93 |
+
|
94 |
+
self.rag = None # Initialize as None for lazy loading
|
95 |
+
else:
|
96 |
+
# Use Hugging Face Inference API for Phi-3-mini-128k-instruct-graph
|
97 |
+
self.hf_client = InferenceClient(
|
98 |
+
model=endpoint_url,
|
99 |
+
token=api_token
|
100 |
+
)
|
101 |
|
102 |
def _generate(self, messages):
|
103 |
"""
|
|
|
105 |
"""
|
106 |
|
107 |
# Use the chat_completion method
|
108 |
+
response = self.hf_client.chat_completion(
|
109 |
messages=messages,
|
110 |
max_tokens=1024,
|
111 |
)
|
|
|
154 |
-------Text end-------
|
155 |
""")
|
156 |
|
|
|
157 |
messages = [
|
158 |
{
|
159 |
"role": "system",
|
|
|
164 |
"content": user_message
|
165 |
}
|
166 |
]
|
|
|
|
|
167 |
|
168 |
return messages
|
169 |
|
170 |
+
async def extract(self, text):
|
171 |
"""
|
172 |
Extract knowledge graph from text
|
173 |
"""
|
174 |
|
175 |
+
generated_text = ""
|
176 |
+
|
177 |
+
if self.model_name == MODEL_LIST[0]:
|
178 |
+
# Use LightRAG with Azure OpenAI
|
179 |
+
rag = await self._get_rag()
|
180 |
+
rag.insert(text)
|
181 |
+
else:
|
182 |
+
# Use Hugging Face Inference API with Phi-3-mini-128k-instruct-graph
|
183 |
+
messages = self._get_messages(text)
|
184 |
+
generated_text = self._generate(messages)
|
185 |
|
186 |
return generated_text
|
187 |
+
|
188 |
+
async def _llm_model_func(self, prompt, system_prompt=None, history_messages=[], **kwargs) -> str:
|
189 |
+
"""
|
190 |
+
Call the Azure OpenAI chat completion endpoint with the given prompt and optional system prompt and history messages.
|
191 |
+
"""
|
192 |
+
|
193 |
+
messages = []
|
194 |
+
|
195 |
+
if system_prompt:
|
196 |
+
messages.append({"role": "system", "content": system_prompt})
|
197 |
+
|
198 |
+
if history_messages:
|
199 |
+
messages.extend(history_messages)
|
200 |
+
|
201 |
+
messages.append({"role": "user", "content": prompt})
|
202 |
+
|
203 |
+
chat_completion = self.llm_client.chat.completions.create(
|
204 |
+
model=AZURE_OPENAI_DEPLOYMENT,
|
205 |
+
messages=messages,
|
206 |
+
temperature=kwargs.get("temperature", 0),
|
207 |
+
top_p=kwargs.get("top_p", 1),
|
208 |
+
n=kwargs.get("n", 1),
|
209 |
+
)
|
210 |
+
|
211 |
+
return chat_completion.choices[0].message.content
|
212 |
+
|
213 |
+
async def _embedding_func(self, texts: list[str]) -> np.ndarray:
|
214 |
+
"""
|
215 |
+
Call the Azure OpenAI embeddings endpoint with the given texts.
|
216 |
+
"""
|
217 |
+
|
218 |
+
embedding = self.emb_client.embeddings.create(model=AZURE_EMBEDDING_DEPLOYMENT, input=texts)
|
219 |
+
embeddings = [item.embedding for item in embedding.data]
|
220 |
+
|
221 |
+
return np.array(embeddings)
|
main.py
DELETED
@@ -1,392 +0,0 @@
|
|
1 |
-
# import spaces
|
2 |
-
import gradio as gr
|
3 |
-
from llm_graph import MODEL_LIST, LLMGraph
|
4 |
-
import rapidjson
|
5 |
-
from pyvis.network import Network
|
6 |
-
import networkx as nx
|
7 |
-
import spacy
|
8 |
-
from spacy import displacy
|
9 |
-
from spacy.tokens import Span
|
10 |
-
import random
|
11 |
-
import time
|
12 |
-
|
13 |
-
# Set up the theme and styling
|
14 |
-
CUSTOM_CSS = """
|
15 |
-
.gradio-container {
|
16 |
-
font-family: 'Inter', 'Segoe UI', Roboto, sans-serif;
|
17 |
-
}
|
18 |
-
.gr-prose h1 {
|
19 |
-
font-size: 2.5rem !important;
|
20 |
-
margin-bottom: 0.5rem !important;
|
21 |
-
background: linear-gradient(90deg, #4338ca, #a855f7);
|
22 |
-
-webkit-background-clip: text;
|
23 |
-
-webkit-text-fill-color: transparent;
|
24 |
-
}
|
25 |
-
.gr-prose h2 {
|
26 |
-
font-size: 1.8rem !important;
|
27 |
-
margin-top: 1rem !important;
|
28 |
-
}
|
29 |
-
.info-box {
|
30 |
-
padding: 1rem;
|
31 |
-
border-radius: 0.5rem;
|
32 |
-
background-color: #f3f4f6;
|
33 |
-
margin-bottom: 1rem;
|
34 |
-
border-left: 4px solid #6366f1;
|
35 |
-
}
|
36 |
-
.language-badge {
|
37 |
-
display: inline-block;
|
38 |
-
padding: 0.25rem 0.5rem;
|
39 |
-
border-radius: 9999px;
|
40 |
-
font-size: 0.75rem;
|
41 |
-
font-weight: 600;
|
42 |
-
background-color: #e0e7ff;
|
43 |
-
color: #4338ca;
|
44 |
-
margin-right: 0.5rem;
|
45 |
-
margin-bottom: 0.5rem;
|
46 |
-
}
|
47 |
-
.footer {
|
48 |
-
text-align: center;
|
49 |
-
margin-top: 2rem;
|
50 |
-
padding-top: 1rem;
|
51 |
-
border-top: 1px solid #e2e8f0;
|
52 |
-
font-size: 0.875rem;
|
53 |
-
color: #64748b;
|
54 |
-
}
|
55 |
-
"""
|
56 |
-
|
57 |
-
# Color utilities
|
58 |
-
def get_random_light_color():
|
59 |
-
r = random.randint(150, 255)
|
60 |
-
g = random.randint(150, 255)
|
61 |
-
b = random.randint(150, 255)
|
62 |
-
return f"#{r:02x}{g:02x}{b:02x}"
|
63 |
-
|
64 |
-
# Text processing helper
|
65 |
-
def handle_text(text):
    """Collapse all whitespace runs in *text* into single spaces and trim the ends."""
    words = text.split()
    return " ".join(words)
|
67 |
-
|
68 |
-
# Core extraction function
|
69 |
-
# @spaces.GPU
|
70 |
-
def extract(text, model):
    """Run knowledge-graph extraction on *text* with the selected *model*.

    Returns the parsed JSON result (dict with "nodes" and "edges").
    Raises gr.Error with a user-facing message if extraction fails.
    """
    graph_model = LLMGraph(model=model)
    try:
        raw_result = graph_model.extract(text)
        return rapidjson.loads(raw_result)
    except Exception as e:
        raise gr.Error(f"🚨 Extraction failed: {str(e)}")
|
77 |
-
|
78 |
-
def find_token_indices(doc, substring, text):
    """Map every occurrence of *substring* in *text* to token-index ranges in *doc*.

    Args:
        doc: Iterable of token objects exposing ``.i`` (token index), ``.idx``
            (character offset), and ``len(token)`` (character length) — e.g. a
            spaCy Doc.
        substring: The surface form to search for.
        text: The raw text the tokens were produced from.

    Returns:
        A list of ``{"start": first_token_i, "end": last_token_i + 1}`` dicts,
        one per occurrence whose boundaries align exactly with token boundaries.
        Occurrences that split a token are silently skipped.
    """
    result = []

    # An empty substring matches at offset 0 with zero length, so the search
    # cursor would never advance and the while-loop would spin forever.
    if not substring:
        return result

    start_index = text.find(substring)

    while start_index != -1:
        end_index = start_index + len(substring)
        start_token = None
        end_token = None

        # Find tokens whose character offsets line up with the match boundaries.
        for token in doc:
            if token.idx == start_index:
                start_token = token.i
            if token.idx + len(token) == end_index:
                end_token = token.i + 1

        # Record the match only when both edges fall on token boundaries.
        if start_token is not None and end_token is not None:
            result.append({
                "start": start_token,
                "end": end_token
            })

        # Search for next occurrence, continuing after the current match.
        start_index = text.find(substring, end_index)

    return result
|
103 |
-
|
104 |
-
def create_custom_entity_viz(data, full_text):
    """Render the extracted entities as highlighted spans over *full_text*.

    Args:
        data: Extraction result with a "nodes" list; each node carries "id"
            (the entity surface string) and "type" (the entity label).
        full_text: The original input text to highlight.

    Returns:
        An HTML string: a card containing a legend of entity-type badges plus
        spaCy displacy span markup for the annotated text.
    """
    # Blank multilingual pipeline: tokenizer only, no model download required.
    nlp = spacy.blank("xx")
    doc = nlp(full_text)

    spans = []
    colors = {}  # entity type -> random light highlight color

    for node in data["nodes"]:
        # Locate every occurrence of this entity's surface form as token ranges.
        entity_spans = find_token_indices(doc, node["id"], full_text)
        for dataentity in entity_spans:
            start = dataentity["start"]
            end = dataentity["end"]

            if start < len(doc) and end <= len(doc):
                # Check for overlapping spans: first claim on a token range
                # wins; later overlapping matches are dropped.
                overlapping = any(s.start < end and start < s.end for s in spans)
                if not overlapping:
                    span = Span(doc, start, end, label=node["type"])
                    spans.append(span)
                    # Assign a color the first time a type is encountered.
                    if node["type"] not in colors:
                        colors[node["type"]] = get_random_light_color()

    doc.set_ents(spans, default="unmodified")
    # NOTE(review): "sc" appears to be the span group displacy's span style
    # reads by default — confirm against the spaCy version pinned here.
    doc.spans["sc"] = spans

    options = {
        "colors": colors,
        "ents": list(colors.keys()),
        "style": "ent",
        "manual": True
    }

    html = displacy.render(doc, style="span", options=options)

    # Add custom styling to the entity visualization: card wrapper + type legend.
    styled_html = f"""
    <div style="border-radius: 0.5rem; padding: 1rem; background-color: white;
                border: 1px solid #e2e8f0; box-shadow: 0 1px 3px 0 rgba(0, 0, 0, 0.1);">
        <div style="margin-bottom: 0.75rem; font-weight: 500; color: #4b5563;">
            Entity types found:
            {' '.join([f'<span style="display: inline-block; margin-right: 0.5rem; margin-bottom: 0.5rem; padding: 0.25rem 0.5rem; border-radius: 9999px; font-size: 0.75rem; background-color: {colors[entity_type]}; color: #1e293b;">{entity_type}</span>' for entity_type in colors.keys()])}
        </div>
        {html}
    </div>
    """

    return styled_html
|
151 |
-
|
152 |
-
def create_graph(json_data):
    """Build an interactive pyvis visualization of the extracted knowledge graph.

    Args:
        json_data: Dict with "nodes" (each with 'id', 'type', 'detailed_type')
            and "edges" (each with 'from', 'to', 'label').

    Returns:
        An HTML string containing a sandboxed <iframe> whose srcdoc is the
        generated pyvis network page.
    """
    G = nx.DiGraph()  # Using DiGraph for directed graph

    # Add nodes
    for node in json_data['nodes']:
        G.add_node(node['id'],
                   title=f"{node['type']}: {node['detailed_type']}",
                   group=node['type'])  # Group nodes by type

    # Add edges
    for edge in json_data['edges']:
        G.add_edge(edge['from'], edge['to'], title=edge['label'], label=edge['label'])

    # Create network visualization
    nt = Network(
        width="100%",
        height="600px",
        directed=True,
        notebook=False,
        bgcolor="#fafafa",
        font_color="#1e293b"
    )

    # Configure network: import the NetworkX graph, then apply Barnes-Hut
    # physics for the force-directed layout.
    nt.from_nx(G)
    nt.barnes_hut(
        gravity=-3000,
        central_gravity=0.3,
        spring_length=150,
        spring_strength=0.001,
        damping=0.09,
        overlap=0,
    )

    # Create color groups for node types
    node_types = {node['type'] for node in json_data['nodes']}
    colors = {}
    for i, node_type in enumerate(node_types):
        # 137 degrees per step spreads hues evenly around the color wheel
        # (golden-angle spacing), so adjacent types get distinct colors.
        hue = (i * 137) % 360  # Golden ratio to distribute colors
        colors[node_type] = f"hsl({hue}, 70%, 70%)"

    # Customize nodes: look each pyvis node back up in the source JSON to
    # recover its type, then apply per-type color and shared styling.
    for node in nt.nodes:
        node_data = next((n for n in json_data['nodes'] if n['id'] == node['id']), None)
        if node_data:
            node_type = node_data['type']
            node['color'] = colors.get(node_type, "#bfdbfe")
            node['shape'] = 'dot'
            node['size'] = 20
            node['borderWidth'] = 2
            node['borderWidthSelected'] = 4
            node['font'] = {'size': 14, 'color': '#1e293b', 'face': 'Inter, Arial'}

    # Customize edges: muted gray with indigo highlight, labeled arrows.
    for edge in nt.edges:
        edge['color'] = {'color': '#94a3b8', 'highlight': '#6366f1', 'hover': '#818cf8'}
        edge['width'] = 1.5
        edge['selectionWidth'] = 2
        edge['hoverWidth'] = 2
        edge['arrows'] = {'to': {'enabled': True, 'type': 'arrow'}}
        edge['smooth'] = {'type': 'continuous', 'roundness': 0.2}
        edge['font'] = {'size': 12, 'color': '#4b5563', 'face': 'Inter, Arial', 'strokeWidth': 2, 'strokeColor': '#ffffff'}

    # Generate HTML. Single quotes are swapped to double quotes so the page
    # can be embedded in the iframe's single-quoted srcdoc attribute below.
    html = nt.generate_html()
    html = html.replace("'", '"')
    html = html.replace('height: 600px;', 'height: 600px; border-radius: 8px;')

    return f"""<iframe style="width: 100%; height: 620px; margin: 0 auto; border-radius: 8px; box-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);"
            name="result" allow="midi; geolocation; microphone; camera; display-capture; encrypted-media;"
            sandbox="allow-modals allow-forms allow-scripts allow-same-origin allow-popups
            allow-top-navigation-by-user-activation allow-downloads" allowfullscreen=""
            allowpaymentrequest="" frameborder="0" srcdoc='{html}'></iframe>"""
|
225 |
-
|
226 |
-
def process_and_visualize(text, model, progress=gr.Progress()):
    """End-to-end UI pipeline: extract a graph from *text* and build all outputs.

    The gr.Progress() default follows Gradio's progress-injection convention.

    Returns a 4-tuple for the Gradio outputs:
        (graph HTML, entity-highlight HTML, raw JSON dict, stats HTML).

    Raises:
        gr.Error: if text or model is missing, or extraction fails (via extract()).
    """
    if not text or not model:
        raise gr.Error("⚠️ Please provide both text and model")

    # Progress updates
    progress(0.1, "Initializing...")
    time.sleep(0.2)  # Small delay for UI feedback

    # Extract graph
    progress(0.2, "Extracting knowledge graph...")
    json_data = extract(text, model)

    # Entity visualization
    progress(0.6, "Identifying entities...")
    entities_viz = create_custom_entity_viz(json_data, text)

    # Graph visualization
    progress(0.8, "Building graph visualization...")
    graph_html = create_graph(json_data)

    # Statistics: count how many entities of each type were extracted.
    entity_types = {}
    for node in json_data['nodes']:
        entity_type = node['type']
        if entity_type in entity_types:
            entity_types[entity_type] += 1
        else:
            entity_types[entity_type] = 1

    stats_html = f"""
    <div class="info-box">
        <h3 style="margin-top: 0;">📊 Extraction Results</h3>
        <p>✅ Successfully extracted <b>{len(json_data['nodes'])}</b> entities and <b>{len(json_data['edges'])}</b> relationships.</p>

        <div>
            <h4>Entity Types:</h4>
            <div>
                {''.join([f'<span class="language-badge">{entity_type}: {count}</span>' for entity_type, count in entity_types.items()])}
            </div>
        </div>
    </div>
    """

    progress(1.0, "Done!")
    return graph_html, entities_viz, json_data, stats_html
|
271 |
-
|
272 |
-
def language_info():
    """Return the static HTML card listing the supported input languages."""
    return """
    <div class="info-box">
        <h3 style="margin-top: 0;">🌍 Multilingual Support</h3>
        <p>This application supports text analysis in multiple languages, including:</p>
        <div>
            <span class="language-badge">English 🇬🇧</span>
            <span class="language-badge">Korean 🇰🇷</span>
            <span class="language-badge">Spanish 🇪🇸</span>
            <span class="language-badge">French 🇫🇷</span>
            <span class="language-badge">German 🇩🇪</span>
            <span class="language-badge">Japanese 🇯🇵</span>
            <span class="language-badge">Chinese 🇨🇳</span>
            <span class="language-badge">And more...</span>
        </div>
    </div>
    """
|
289 |
-
|
290 |
-
def tips_html():
    """Return the static HTML card with usage tips for better extractions."""
    return """
    <div class="info-box">
        <h3 style="margin-top: 0;">💡 Tips for Best Results</h3>
        <ul>
            <li>Use clear, descriptive sentences with well-defined relationships</li>
            <li>Include specific entities, events, dates, and locations for better extraction</li>
            <li>Longer texts provide more context for relationship identification</li>
            <li>Try different models to compare extraction results</li>
        </ul>
    </div>
    """
|
302 |
-
|
303 |
-
# Examples in multiple languages
|
304 |
-
EXAMPLES = [
|
305 |
-
[handle_text("""Legendary rock band Aerosmith has officially announced their retirement from touring after 54 years, citing
|
306 |
-
lead singer Steven Tyler's unrecoverable vocal cord injury.
|
307 |
-
The decision comes after months of unsuccessful treatment for Tyler's fractured larynx,
|
308 |
-
which he suffered in September 2023.""")],
|
309 |
-
|
310 |
-
[handle_text("""Pop star Justin Timberlake, 43, had his driver's license suspended by a New York judge during a virtual
|
311 |
-
court hearing on August 2, 2024. The suspension follows Timberlake's arrest for driving while intoxicated (DWI)
|
312 |
-
in Sag Harbor on June 18. Timberlake, who is currently on tour in Europe,
|
313 |
-
pleaded not guilty to the charges.""")],
|
314 |
-
]
|
315 |
-
|
316 |
-
# Main UI
|
317 |
-
with gr.Blocks(css=CUSTOM_CSS, title="🧠 Phi-3 Knowledge Graph Explorer") as demo:
|
318 |
-
# Header
|
319 |
-
gr.Markdown("# 🧠 Phi-3 Knowledge Graph Explorer")
|
320 |
-
gr.Markdown("### ✨ Extract and visualize knowledge graphs from text in any language")
|
321 |
-
|
322 |
-
with gr.Row():
|
323 |
-
with gr.Column(scale=2):
|
324 |
-
input_text = gr.TextArea(
|
325 |
-
label="📝 Enter your text",
|
326 |
-
placeholder="Paste or type your text here...",
|
327 |
-
lines=10
|
328 |
-
)
|
329 |
-
|
330 |
-
with gr.Row():
|
331 |
-
input_model = gr.Dropdown(
|
332 |
-
MODEL_LIST,
|
333 |
-
label="🤖 Model",
|
334 |
-
value=MODEL_LIST[0] if MODEL_LIST else None,
|
335 |
-
info="Select the model to use for extraction"
|
336 |
-
)
|
337 |
-
|
338 |
-
with gr.Column():
|
339 |
-
submit_button = gr.Button("🔍 Extract & Visualize", variant="primary")
|
340 |
-
clear_button = gr.Button("🔄 Clear", variant="secondary")
|
341 |
-
|
342 |
-
# Multilingual support info
|
343 |
-
gr.HTML(language_info())
|
344 |
-
|
345 |
-
# Examples section
|
346 |
-
gr.Examples(
|
347 |
-
examples=EXAMPLES,
|
348 |
-
inputs=input_text,
|
349 |
-
label="📚 Example Texts (English & Korean)"
|
350 |
-
)
|
351 |
-
|
352 |
-
# Tips
|
353 |
-
gr.HTML(tips_html())
|
354 |
-
|
355 |
-
with gr.Column(scale=3):
|
356 |
-
# Stats output
|
357 |
-
stats_output = gr.HTML(label="")
|
358 |
-
|
359 |
-
# Tabs for different visualizations
|
360 |
-
with gr.Tabs():
|
361 |
-
with gr.TabItem("🔄 Knowledge Graph"):
|
362 |
-
output_graph = gr.HTML()
|
363 |
-
|
364 |
-
with gr.TabItem("🏷️ Entity Recognition"):
|
365 |
-
output_entity_viz = gr.HTML()
|
366 |
-
|
367 |
-
with gr.TabItem("📊 JSON Data"):
|
368 |
-
output_json = gr.JSON()
|
369 |
-
|
370 |
-
# Footer
|
371 |
-
gr.HTML("""
|
372 |
-
<div class="footer">
|
373 |
-
<p>🌐 Powered by Phi-3 Instruct Graph | Created by Emergent Methods</p>
|
374 |
-
<p>© 2025 | Knowledge Graph Explorer</p>
|
375 |
-
</div>
|
376 |
-
""")
|
377 |
-
|
378 |
-
# Set up event handlers
|
379 |
-
submit_button.click(
|
380 |
-
fn=process_and_visualize,
|
381 |
-
inputs=[input_text, input_model],
|
382 |
-
outputs=[output_graph, output_entity_viz, output_json, stats_output]
|
383 |
-
)
|
384 |
-
|
385 |
-
clear_button.click(
|
386 |
-
fn=lambda: [None, None, None, ""],
|
387 |
-
inputs=[],
|
388 |
-
outputs=[output_graph, output_entity_viz, output_json, stats_output]
|
389 |
-
)
|
390 |
-
|
391 |
-
# Launch the app
|
392 |
-
demo.launch(share=False)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
requirements.txt
CHANGED
@@ -1,10 +1,12 @@
|
|
1 |
python-dotenv
|
2 |
gradio
|
3 |
-
transformers
|
4 |
-
python-dotenv
|
5 |
accelerate
|
6 |
python-rapidjson
|
7 |
spaces
|
8 |
pyvis
|
9 |
networkx
|
10 |
spacy
|
|
|
|
|
|
|
|
1 |
python-dotenv
|
2 |
gradio
|
3 |
+
transformers
|
|
|
4 |
accelerate
|
5 |
python-rapidjson
|
6 |
spaces
|
7 |
pyvis
|
8 |
networkx
|
9 |
spacy
|
10 |
+
numpy
|
11 |
+
lightrag-hku
|
12 |
+
openai
|