CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Application Overview

This is a Text2Graph application that extracts knowledge graphs from natural language text. The Gradio web app uses either GPT-4.1-mini (via Azure OpenAI) or Phi-3-mini-128k-instruct-graph (via a Hugging Face inference endpoint) to extract entities and relationships from text, then visualizes them as interactive graphs.

Architecture

  • app.py: Main Gradio application with UI components, visualization logic, and caching
  • llm_graph.py: Core LLMGraph class that handles model selection and knowledge graph extraction
  • visualize.py: Standalone script for visualizing GraphML files from LightRAG output
  • data/: Contains sample texts in multiple languages and system prompt templates
  • cache/: Directory for caching visualization data (first example is pre-cached for performance)
  • sample/: Working directory for LightRAG processing, cleared and recreated on each run

Key Components

LLMGraph Class (llm_graph.py)

  • Supports two model backends: Azure OpenAI (GPT-4.1-mini) and Hugging Face (Phi-3-mini-128k-instruct-graph)
  • Uses LightRAG for the Azure OpenAI integration
  • Calls the Hugging Face inference endpoint directly for the Phi-3 model
  • Extracts structured JSON with nodes (entities) and edges (relationships), as sketched below
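
A minimal usage sketch (only extract() is named in this document; the no-argument constructor and the example output shapes are assumptions, so check llm_graph.py for the actual signatures):

# Hedged sketch: the constructor and how the model backend is selected
# are assumptions; see llm_graph.py for the real interface.
from llm_graph import LLMGraph

graph = LLMGraph()
result = graph.extract("Marie Curie won the Nobel Prize in Physics in 1903.")
print(result["nodes"])   # e.g. [{"id": "Marie Curie", "type": "person", ...}]
print(result["edges"])   # e.g. [{"from": "Marie Curie", "to": "Nobel Prize", ...}]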

Visualization Pipeline (app.py)

  • Entity recognition visualization using spaCy's displacy
  • Interactive knowledge graph using pyvis and NetworkX (see the sketch after this list)
  • Caching system for performance optimization
  • Color-coded entity types with random light colors
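
An illustrative sketch of the two visualization steps, assuming the JSON shape described under Model Behavior below (this is not an excerpt from app.py):

import networkx as nx
import spacy
from pyvis.network import Network
from spacy import displacy

# 1. Entity highlighting with spaCy's displacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Marie Curie won the Nobel Prize in Physics in 1903.")
entity_html = displacy.render(doc, style="ent")

# 2. Interactive graph with NetworkX + pyvis, from the extractor's JSON
extraction = {
    "nodes": [{"id": "Marie Curie", "type": "person"},
              {"id": "Nobel Prize", "type": "award"}],
    "edges": [{"from": "Marie Curie", "to": "Nobel Prize", "label": "won"}],
}
G = nx.DiGraph()
for node in extraction["nodes"]:
    G.add_node(node["id"], group=node["type"])
for edge in extraction["edges"]:
    G.add_edge(edge["from"], edge["to"], label=edge["label"])
net = Network(directed=True)
net.from_nx(G)
net.save_graph("graph.html")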

Environment Setup

Required environment variables:

HF_TOKEN=<huggingface_token>
HF_API_ENDPOINT=<huggingface_inference_endpoint>
AZURE_OPENAI_API_KEY=<azure_openai_key>
AZURE_OPENAI_ENDPOINT=<azure_endpoint>
AZURE_OPENAI_API_VERSION=<api_version>
AZURE_OPENAI_DEPLOYMENT=<deployment_name>
AZURE_EMBEDDING_DEPLOYMENT=<embedding_deployment>
AZURE_EMBEDDING_API_VERSION=<embedding_api_version>
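
These are read from a .env file (see Environment Configuration below). A minimal loading sketch, assuming python-dotenv:

# Assumes python-dotenv; load_dotenv() is a no-op when no .env file exists
import os
from dotenv import load_dotenv

load_dotenv()
hf_token = os.environ["HF_TOKEN"]               # raises KeyError if unset
azure_key = os.getenv("AZURE_OPENAI_API_KEY")   # returns None if unset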

Development Commands

# Install dependencies
pip install -r requirements.txt

# Run the Gradio app locally (default port 7860)
python app.py

# Test model extraction directly
python llm_graph.py

# Visualize existing GraphML files (requires sample/ directory with GraphML file)
python visualize.py

Key Dependencies

  • gradio: Web interface framework
  • lightrag-hku: RAG framework for Azure OpenAI integration
  • transformers: Hugging Face model integration
  • pyvis: Interactive network visualization
  • networkx: Graph data structure and algorithms
  • spacy: Natural language processing and entity visualization
  • openai: Azure OpenAI client

Data Flow

  1. User inputs text and selects model
  2. LLMGraph.extract() processes text using selected model backend
  3. JSON response contains nodes (entities) and edges (relationships)
  4. Visualization functions create entity highlighting and interactive graph
  5. Results are cached for performance (first example only), as condensed in the sketch below
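
Condensed as code (a sketch, not an excerpt from app.py; the JSON key names follow the Model Behavior schema below):

import networkx as nx
from pyvis.network import Network
from llm_graph import LLMGraph

def text_to_graph_html(user_text: str) -> str:
    """Steps 1-4 condensed: extract with the selected backend, then render."""
    extraction = LLMGraph().extract(user_text)   # steps 1-3
    G = nx.DiGraph()
    for n in extraction["nodes"]:
        G.add_node(n["id"], group=n["type"])
    for e in extraction["edges"]:
        G.add_edge(e["from"], e["to"], label=e["label"])
    net = Network(directed=True)
    net.from_nx(G)
    return net.generate_html()                   # step 5: app.py caches this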

Model Behavior

The application expects JSON output with this schema:

{
  "nodes": [{"id": "entity", "type": "broad_type", "detailed_type": "specific_type"}],
  "edges": [{"from": "entity1", "to": "entity2", "label": "relationship"}]
}
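
LLM output can deviate from this schema, so a defensive parsing sketch may help (key names taken from the schema above; the validation rules are illustrative, not the app's confirmed behavior):

import json

def parse_extraction(raw: str):
    """Parse model output, dropping edges that reference unknown nodes."""
    data = json.loads(raw)
    nodes = [n for n in data.get("nodes", []) if "id" in n]
    known = {n["id"] for n in nodes}
    edges = [e for e in data.get("edges", [])
             if e.get("from") in known and e.get("to") in known]
    return nodes, edges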

Testing and Development Notes

  • No formal test suite exists; testing is done manually through the Gradio interface
  • The first example is automatically cached on startup for performance
  • Cache files are stored in the cache/ directory as pickle files (sketched after this list)
  • The working directory sample/ is cleared and recreated on each run
  • GraphML files are generated by LightRAG when using the Azure OpenAI backend
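
A sketch of the pickle cache described above; keying files by a hash of the input text is an assumption, not a confirmed detail of app.py:

import hashlib
import os
import pickle

CACHE_DIR = "cache"

def _cache_path(text: str) -> str:
    key = hashlib.md5(text.encode("utf-8")).hexdigest()  # assumed naming scheme
    return os.path.join(CACHE_DIR, f"{key}.pkl")

def load_cached(text: str):
    path = _cache_path(text)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return None

def save_cached(text: str, result) -> None:
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(_cache_path(text), "wb") as f:
        pickle.dump(result, f)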

Environment Configuration

  • Uses .env file for API keys and endpoints (see Environment Setup section)
  • Designed for Hugging Face Spaces deployment (see README.md frontmatter)
  • spaCy model loading is handled automatically by the application (one common pattern is sketched below)
  • No additional configuration files (package.json, pyproject.toml, etc.) required
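
One common pattern for automatic spaCy model loading, sketched here as a guess at what the app does rather than a confirmed excerpt:

import spacy

def get_nlp(model: str = "en_core_web_sm"):
    """Load a spaCy model, downloading it on first use if missing."""
    try:
        return spacy.load(model)
    except OSError:
        from spacy.cli import download
        download(model)
        return spacy.load(model)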