title: Semantic Diffing for Evolving Knowledge Graphs
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
Semantic Diffing for Evolving Knowledge Graphs
A system for tracking structural changes in knowledge graphs as documents evolve over time. This project extracts entities and relationships from multiple document versions, constructs graph representations, and identifies semantic differences such as added or removed entities and relationships.
The system enables comparison between document snapshots and generates both structured graph diffs and natural-language summaries of detected changes.
Overview
Knowledge graphs evolve as new information becomes available. Tracking changes between versions is critical in domains such as enterprise knowledge management, legal systems, compliance workflows, and technical documentation.
This project implements:
- Entity and relationship extraction from document versions
- Knowledge graph construction using NetworkX
- Graph-level semantic diffing
- Identification of added and removed nodes and edges
- Natural-language summarization of detected changes
Key Features
- Extract entities and relationships from document text
- Build graph representations for multiple document versions
- Compare knowledge graph snapshots
- Detect added entities
- Detect removed entities
- Detect added relationships
- Detect removed relationships
- Generate structured graph diffs
- Produce natural-language summaries of changes
- Visualize knowledge graph snapshots
Frontend
The project ships with a full interactive, animated frontend at frontend/index.html, served directly by app.py. It includes:
- A live-diff demo that runs instantly against bundled sample data; no API key is needed to try it
- An optional advanced panel for entering a Groq API key to run the diff live against /api/diff
- Side-by-side force-directed graph views (D3) of the two knowledge graph versions, color-coded to match graph_utils.py's own diff palette
- A terminal-style animated diff console rendering added/removed/unchanged entities and relations
- A walkthrough of the 5-stage pipeline, architecture breakdown, use cases, and roadmap
To use it, just run python app.py and open http://localhost:5050.
How It Works
Upload two document versions:
- Baseline document (v1)
- Updated document (v2)
Each document is processed independently:
- Text is parsed
- Entities are extracted
- Relationships are extracted
A knowledge graph is created for each version.
Graph diffing identifies:
- New entities
- Removed entities
- New relationships
- Removed relationships
A natural-language summary describes the detected changes.
System Architecture
┌─────────────────────┐
│     Document v1     │
│     (Baseline)      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Entity & Relation  │
│  Extraction (LLM)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Knowledge Graph v1  │
└─────────────────────┘

┌─────────────────────┐
│     Document v2     │
│      (Updated)      │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Entity & Relation  │
│  Extraction (LLM)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Knowledge Graph v2  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Graph Diff Engine  │
│  - Added Nodes      │
│  - Removed Nodes    │
│  - Added Edges      │
│  - Removed Edges    │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│   Change Summary    │
│  Natural Language   │
└─────────────────────┘
Installation
Clone the repository:
git clone https://github.com/your-username/semantic_diffing.git
cd semantic_diffing
Install dependencies:
pip install -r requirements.txt
Set the Groq API key:
Linux / macOS:
export GROQ_API_KEY=your_key_here
Windows:
set GROQ_API_KEY=your_key_here
Run the application:
python app.py
Then open http://localhost:5050 in your browser. This serves a full interactive frontend, including an animated live-diff demo, force-directed graph views, and a sample dataset that works out of the box even without an API key.
Input Format
Supported formats:
- .txt documents
Two versions are required:
- Baseline document (v1)
- Updated document (v2)
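Loading the two versions can be sketched as follows. This is a minimal illustration, not the project's actual loader: the function name load_versions is hypothetical, and UTF-8 encoding is an assumption.

```python
from pathlib import Path


def load_versions(v1_path, v2_path):
    """Read the baseline (v1) and updated (v2) documents as UTF-8 text.

    Hypothetical helper: only .txt files are accepted, matching the
    supported formats listed above.
    """
    texts = []
    for path in (Path(v1_path), Path(v2_path)):
        if path.suffix.lower() != ".txt":
            raise ValueError(f"Unsupported format: {path.name} (expected .txt)")
        texts.append(path.read_text(encoding="utf-8"))
    return texts[0], texts[1]
```

With the bundled samples, a call such as load_versions("data/doc_v1.txt", "data/doc_v2.txt") would return the two document texts as a pair of strings.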
Sample Data
Sample documents are included in the data/ directory:
- doc_v1.txt: Baseline version of a fictional company description.
- doc_v2.txt: Updated version containing new entities and relationships.
These files allow quick testing of semantic diffing functionality.
Project Structure
semantic_diffing/
│
├── app.py
│     Flask entry point: serves the frontend and the /api/diff endpoint
│
├── semantic_diff.py
│     Entity and relationship extraction
│     Graph diff computation
│
├── graph_utils.py
│     NetworkX graph construction
│     Graph visualization
│
├── frontend/
│   ├── index.html
│   │     Full interactive single-page frontend
│   └── static/
│       ├── css/style.css
│       ├── js/app.js
│       │     Animation, demo orchestration, D3 graph rendering
│       └── js/demo-data.js
│             Bundled offline fixture so the demo works without an API key
│
├── data/
│   ├── doc_v1.txt
│   └── doc_v2.txt
│
├── requirements.txt
│     Python dependencies
│
└── README.md
Core Modules
semantic_diff.py
Responsible for:
- LLM-based entity extraction
- Relationship extraction
- Graph comparison logic
- Detection of semantic differences
- Generation of change summaries
Key operations:
- Extract entities
- Extract relationships
- Compute node differences
- Compute edge differences
- Generate structured diff output
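The node and edge diff steps can be sketched with plain set arithmetic. This is an illustrative sketch rather than the code in semantic_diff.py: the function name compute_diff is hypothetical, and it assumes each extraction result is a dict with an "entities" list of names and a "relationships" list of source/relation/target dicts, mirroring the example output shown in this README.

```python
def compute_diff(old, new):
    """Diff two extraction results of the assumed shape
    {"entities": [...], "relationships": [{"source", "relation", "target"}, ...]}.
    """
    def triples(extraction):
        # Turn relationship dicts into hashable tuples so they can be compared as sets.
        return {(r["source"], r["relation"], r["target"])
                for r in extraction["relationships"]}

    def as_dicts(items):
        # Sort for a stable, reproducible output order.
        return [{"source": s, "relation": r, "target": t} for s, r, t in sorted(items)]

    old_entities, new_entities = set(old["entities"]), set(new["entities"])
    old_rels, new_rels = triples(old), triples(new)
    return {
        "added_entities": sorted(new_entities - old_entities),
        "removed_entities": sorted(old_entities - new_entities),
        "added_relationships": as_dicts(new_rels - old_rels),
        "removed_relationships": as_dicts(old_rels - new_rels),
    }


v1 = {
    "entities": ["ABC Corporation", "Legacy Systems Department"],
    "relationships": [{"source": "ABC Corporation", "relation": "maintains",
                       "target": "Legacy Systems Department"}],
}
v2 = {
    "entities": ["ABC Corporation", "AI Research Division", "Cloud Infrastructure Team"],
    "relationships": [{"source": "ABC Corporation", "relation": "launched",
                       "target": "AI Research Division"}],
}
diff = compute_diff(v1, v2)
```

Because comparison is by exact string match, two spellings of the same entity count as one removal plus one addition.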
graph_utils.py
Responsible for:
- Building knowledge graphs using NetworkX
- Representing entities as nodes
- Representing relationships as edges
- Visualizing graph snapshots
- Highlighting added and removed elements
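A rough sketch of how graph construction and diff highlighting might look with NetworkX. The function names and the STATUS_COLORS palette are hypothetical (the real palette lives in graph_utils.py), and relationships are assumed to arrive as (source, relation, target) tuples.

```python
import networkx as nx

# Hypothetical palette for illustration only.
STATUS_COLORS = {"added": "green", "removed": "red", "unchanged": "grey"}


def build_graph(entities, relationships):
    """Entities become nodes; each (source, relation, target) becomes a labelled directed edge."""
    graph = nx.DiGraph()
    graph.add_nodes_from(entities)
    for source, relation, target in relationships:
        graph.add_edge(source, target, relation=relation)
    return graph


def _status(in_old, in_new):
    if in_old and in_new:
        return "unchanged"
    return "added" if in_new else "removed"


def annotate_diff(g_old, g_new):
    """Merge both snapshots and tag every node and edge with its diff status."""
    merged = nx.compose(g_old, g_new)
    for node in merged.nodes:
        merged.nodes[node]["status"] = _status(node in g_old, node in g_new)
    for u, v in merged.edges:
        # Edges are matched on endpoints only; a changed relation label is not detected here.
        merged.edges[u, v]["status"] = _status(g_old.has_edge(u, v), g_new.has_edge(u, v))
    return merged
```

A list such as [STATUS_COLORS[merged.nodes[n]["status"]] for n in merged.nodes] can then be passed as node_color to nx.draw. Note that a DiGraph keeps at most one edge per ordered node pair; if two entities can be linked by several relations, nx.MultiDiGraph would be the better fit.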
app.py
Acts as the main entry point: a Flask application that serves the frontend and the /api/diff endpoint.
Responsible for:
- Loading document versions
- Triggering extraction pipeline
- Building graphs
- Running diff computation
- Displaying outputs
Example Output (Graph Diff JSON)
{
  "added_entities": [
    "AI Research Division",
    "Cloud Infrastructure Team"
  ],
  "removed_entities": [
    "Legacy Systems Department"
  ],
  "added_relationships": [
    {
      "source": "ABC Corporation",
      "relation": "launched",
      "target": "AI Research Division"
    }
  ],
  "removed_relationships": [
    {
      "source": "ABC Corporation",
      "relation": "maintains",
      "target": "Legacy Systems Department"
    }
  ]
}
Example LLM Extraction Prompt
You are an information extraction system.
Extract structured entities and relationships from the text.
Return output in JSON format using:
{
"entities": [],
"relationships": []
}
Rules:
1. Entities should represent meaningful objects such as:
- Organizations
- Departments
- Products
- Teams
- Locations
2. Relationships should represent interactions between entities.
Text:
{DOCUMENT_TEXT}
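One practical detail when filling in this prompt: the template contains literal JSON braces, so Python's str.format would misread them as placeholders. The sketch below substitutes the placeholder directly and then validates the model's reply. The function names are hypothetical, the template is abbreviated (the rules section is omitted), and the assumption that replies may wrap the JSON in extra text is a guess about typical LLM behaviour, not something this project documents.

```python
import json

# Abbreviated copy of the prompt above.
PROMPT_TEMPLATE = """You are an information extraction system.
Extract structured entities and relationships from the text.
Return output in JSON format using:
{
  "entities": [],
  "relationships": []
}
Text:
{DOCUMENT_TEXT}"""


def build_prompt(document_text):
    # Plain replacement leaves the literal JSON braces in the template untouched.
    return PROMPT_TEMPLATE.replace("{DOCUMENT_TEXT}", document_text)


def parse_extraction(raw_response):
    """Pull the outermost JSON object out of a model reply and check its shape."""
    start, end = raw_response.find("{"), raw_response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model response")
    data = json.loads(raw_response[start:end + 1])
    for key in ("entities", "relationships"):
        if not isinstance(data.get(key), list):
            raise ValueError(f"Missing or non-list field: {key}")
    return data
```

Validating the shape up front means a malformed reply fails loudly at extraction time instead of producing a misleading diff later.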
Example Diff Summary
Changes detected between document versions:
- Two new entities were introduced: AI Research Division and Cloud Infrastructure Team.
- One entity was removed: Legacy Systems Department.
- A new relationship was added linking ABC Corporation to AI Research Division.
- A maintenance relationship with Legacy Systems Department was removed.
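A summary along these lines could also be produced without a model call. The sketch below is a simple template-based alternative, assuming the diff uses the same keys as the JSON example in this README; summarize_diff is a hypothetical name, and the project itself may generate its summaries differently (for example with an LLM).

```python
def _count(n, singular, plural):
    """Return e.g. '1 entity was' or '2 entities were'."""
    return f"{n} {singular} was" if n == 1 else f"{n} {plural} were"


def summarize_diff(diff):
    """Render a structured diff as plain-English bullet lines."""
    lines = []
    for key, verb in (("added_entities", "added"), ("removed_entities", "removed")):
        names = diff.get(key, [])
        if names:
            lines.append(f"- {_count(len(names), 'entity', 'entities')} {verb}: {', '.join(names)}.")
    for key, verb in (("added_relationships", "added"), ("removed_relationships", "removed")):
        for rel in diff.get(key, []):
            lines.append(
                f"- Relationship {verb}: {rel['source']} -[{rel['relation']}]-> {rel['target']}."
            )
    if not lines:
        return "No changes detected between document versions."
    return "Changes detected between document versions:\n" + "\n".join(lines)
```

A template keeps the summary deterministic and cheap; an LLM-written summary reads more naturally but can drift from the structured diff, so the two approaches trade fluency against fidelity.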
Technologies Used
- Python
- NetworkX
- Matplotlib
- Large Language Models (LLMs)
- Groq API
- Natural Language Processing (NLP)
- Graph Theory
Design Considerations
- Separate graphs are built per document version.
- Diffing operates at both node and edge levels.
- Structured outputs enable downstream analytics.
- Modular design allows extension to multi-version comparison.
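To illustrate the last point, a two-version diff can be chained across consecutive snapshots. This is a hypothetical sketch of one possible extension, not an existing feature: diff_timeline is an invented name, and it handles entity sets only to stay short.

```python
def diff_timeline(snapshots):
    """Diff each consecutive pair in a list of (label, entities) snapshots."""
    steps = []
    for (old_label, old), (new_label, new) in zip(snapshots, snapshots[1:]):
        old_set, new_set = set(old), set(new)
        steps.append({
            "from": old_label,
            "to": new_label,
            "added": sorted(new_set - old_set),
            "removed": sorted(old_set - new_set),
        })
    return steps


timeline = diff_timeline([
    ("v1", ["A", "B"]),
    ("v2", ["A", "C"]),
    ("v3", ["C"]),
])
```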
Limitations
- Extraction accuracy depends on LLM output quality.
- Large graphs can make the visualization cluttered and hard to read.
- Normalizing relationship labels, so that equivalent relations compare as equal, may require domain-specific tuning.
- Currently supports two-version comparison only.
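As a starting point for the normalization issue, names and relation labels can be canonicalized before diffing so that trivial spelling differences do not register as changes. The helpers below are a hypothetical sketch; the synonym table is an assumed, domain-specific input that would need to be curated by hand.

```python
import re


def normalize_name(name):
    """Canonical key for an entity name: single-spaced, trimmed, case-folded."""
    return re.sub(r"\s+", " ", name).strip().casefold()


def normalize_relation(relation, synonyms=None):
    """Map a relation label to a canonical form via an optional synonym table."""
    key = normalize_name(relation).replace(" ", "_")
    return (synonyms or {}).get(key, key)
```

For example, normalize_relation("Set up", {"set_up": "launched"}) maps to "launched", so "set up" and "launched" would compare as the same relation.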
Future Improvements
- Multi-version timeline diffing
- Graph history tracking
- Knowledge graph persistence
- Interactive graph exploration
- Graph database integration (Neo4j)
- Graph embedding similarity metrics
- Change severity scoring
- Support for additional document formats
Use Cases
This system can be applied to:
- Enterprise knowledge tracking
- Policy change monitoring
- Technical documentation updates
- Compliance auditing
- Legal contract version comparison
- Organizational change tracking
- Knowledge management systems