---
title: Semantic Diffing for Evolving Knowledge Graphs
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---
# Semantic Diffing for Evolving Knowledge Graphs
A system for tracking structural changes in knowledge graphs as documents evolve over time. This project extracts entities and relationships from multiple document versions, constructs graph representations, and identifies semantic differences such as added or removed entities and relationships.
The system enables comparison between document snapshots and generates both structured graph diffs and natural-language summaries of detected changes.

---
# Overview
Knowledge graphs evolve as new information becomes available. Tracking changes between versions is critical in domains such as enterprise knowledge management, legal systems, compliance workflows, and technical documentation.
This project implements:
* Entity and relationship extraction from document versions
* Knowledge graph construction using NetworkX
* Graph-level semantic diffing
* Identification of added and removed nodes and edges
* Natural-language summarization of detected changes
---
# Key Features
* Extract entities and relationships from document text
* Build graph representations for multiple document versions
* Compare knowledge graph snapshots
* Detect added entities
* Detect removed entities
* Detect added relationships
* Detect removed relationships
* Generate structured graph diffs
* Produce natural-language summaries of changes
* Visualize knowledge graph snapshots
---
# Frontend
The project ships with a fully interactive, animated frontend at `frontend/index.html`, served directly by `app.py`. It includes:
* A live-diff demo that runs instantly against bundled sample data, with no API key needed to try it
* An optional advanced panel for entering a Groq API key to run the diff live against `/api/diff`
* Side-by-side force-directed graph views (D3) of the two knowledge graph versions, color-coded to match `graph_utils.py`'s own diff palette
* A terminal-style animated diff console rendering added/removed/unchanged entities and relations
* A walkthrough of the 5-stage pipeline, architecture breakdown, use cases, and roadmap

To use it, just run `python app.py` and open `http://localhost:5050`.

---
# How It Works
1. Upload two document versions:
   * Baseline document (v1)
   * Updated document (v2)
2. Each document is processed independently:
   * Text is parsed
   * Entities are extracted
   * Relationships are extracted
3. A knowledge graph is created for each version.
4. Graph diffing identifies:
   * New entities
   * Removed entities
   * New relationships
   * Removed relationships
5. A natural-language summary describes the detected changes.
---
# System Architecture
```text
┌─────────────────────┐
│     Document v1     │
│     (Baseline)      │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Entity & Relation  │
│  Extraction (LLM)   │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Knowledge Graph v1  │
└─────────────────────┘

┌─────────────────────┐
│     Document v2     │
│      (Updated)      │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Entity & Relation  │
│  Extraction (LLM)   │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Knowledge Graph v2  │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  Graph Diff Engine  │
│  - Added Nodes      │
│  - Removed Nodes    │
│  - Added Edges      │
│  - Removed Edges    │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│   Change Summary    │
│  Natural Language   │
└─────────────────────┘
```
---
# Installation
Clone the repository:
```bash
git clone https://github.com/your-username/semantic_diffing.git
cd semantic_diffing
```
Install dependencies:
```bash
pip install -r requirements.txt
```
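Set the Groq API key.

Linux / macOS:
```bash
export GROQ_API_KEY=your_key_here
```
Windows (PowerShell):
```powershell
$env:GROQ_API_KEY = "your_key_here"
```
In the classic Command Prompt (`cmd.exe`), use `set GROQ_API_KEY=your_key_here` instead.
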
Run the application:
```bash
python app.py
```
Then open `http://localhost:5050` in your browser. This serves a fully interactive frontend, including an animated live-diff demo, force-directed graph views, and a sample dataset that works out of the box even without an API key.

---
# Input Format
Supported formats:
* `.txt` documents

Two versions are required:
* Baseline document (v1)
* Updated document (v2)
---
# Sample Data
Sample documents are included in the `data/` directory:
* `doc_v1.txt`: baseline version of a fictional company description.
* `doc_v2.txt`: updated version containing new entities and relationships.

These files allow quick testing of the semantic diffing functionality.

---
# Project Structure
```text
semantic_diffing/
│
├── app.py
│   Flask entry point: serves the frontend and the /api/diff endpoint
│
├── semantic_diff.py
│   Entity and relationship extraction
│   Graph diff computation
│
├── graph_utils.py
│   NetworkX graph construction
│   Graph visualization
│
├── frontend/
│   ├── index.html
│   │   Full interactive single-page frontend
│   └── static/
│       ├── css/style.css
│       ├── js/app.js
│       │   Animation, demo orchestration, D3 graph rendering
│       └── js/demo-data.js
│           Bundled offline fixture so the demo works without an API key
│
├── data/
│   ├── doc_v1.txt
│   └── doc_v2.txt
│
├── requirements.txt
│   Python dependencies
│
└── README.md
```
---
# Core Modules
## semantic_diff.py
Responsible for:
* LLM-based entity extraction
* Relationship extraction
* Graph comparison logic
* Detection of semantic differences
* Generation of change summaries

Key operations:
* Extract entities
* Extract relationships
* Compute node differences
* Compute edge differences
* Generate structured diff output
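
The node and edge comparison can be sketched with plain set operations. This is a minimal sketch, not the actual code in `semantic_diff.py`: it assumes the extractor returns the `{"entities": [...], "relationships": [...]}` shape from the prompt example, with each relationship as a `source`/`relation`/`target` object, and the name `compute_diff` is hypothetical. Relationships are compared as `(source, relation, target)` triples, so a changed relation label between the same two entities appears as one removal plus one addition. With the sample inputs below it prints the same structure as the Example Output section.

```python
import json
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def _edge_set(extraction: Dict[str, list]) -> Set[Triple]:
    # Hashable (source, relation, target) triples let edges be compared with set algebra.
    return {(r["source"], r["relation"], r["target"]) for r in extraction["relationships"]}


def compute_diff(v1: Dict[str, list], v2: Dict[str, list]) -> Dict[str, list]:
    """Hypothetical helper: structured node/edge diff between two extraction results."""
    nodes_v1, nodes_v2 = set(v1["entities"]), set(v2["entities"])
    edges_v1, edges_v2 = _edge_set(v1), _edge_set(v2)

    def as_dicts(triples: Set[Triple]) -> List[Dict[str, str]]:
        # Sorting keeps the JSON output stable from run to run.
        return [{"source": s, "relation": r, "target": t} for s, r, t in sorted(triples)]

    return {
        "added_entities": sorted(nodes_v2 - nodes_v1),
        "removed_entities": sorted(nodes_v1 - nodes_v2),
        "added_relationships": as_dicts(edges_v2 - edges_v1),
        "removed_relationships": as_dicts(edges_v1 - edges_v2),
    }


v1 = {
    "entities": ["ABC Corporation", "Legacy Systems Department"],
    "relationships": [{"source": "ABC Corporation", "relation": "maintains",
                       "target": "Legacy Systems Department"}],
}
v2 = {
    "entities": ["ABC Corporation", "AI Research Division", "Cloud Infrastructure Team"],
    "relationships": [{"source": "ABC Corporation", "relation": "launched",
                       "target": "AI Research Division"}],
}
print(json.dumps(compute_diff(v1, v2), indent=2))
```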
---
## graph_utils.py
Responsible for:
* Building knowledge graphs using NetworkX
* Representing entities as nodes
* Representing relationships as edges
* Visualizing graph snapshots
* Highlighting added and removed elements
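
Building a snapshot graph and marking what changed can be sketched with NetworkX. This is an illustrative sketch, not the contents of `graph_utils.py`: the names `build_graph` and `merge_for_display`, the `status` attribute, and the choice of `nx.MultiDiGraph` (so two different relations between the same pair of entities stay separate edges, keyed by relation label) are all assumptions. Every node and edge in the merged graph carries `added`, `removed`, or `unchanged`, which a renderer could map to colors, for example by passing a list built from `merged.nodes[n]["status"]` as `node_color` to `nx.draw`.

```python
import networkx as nx


def build_graph(entities, relationships):
    """Hypothetical builder: entities become nodes, relationships become keyed directed edges."""
    graph = nx.MultiDiGraph()
    graph.add_nodes_from(entities)
    for rel in relationships:
        # Using the relation label as the edge key keeps parallel relations distinct.
        graph.add_edge(rel["source"], rel["target"], key=rel["relation"], relation=rel["relation"])
    return graph


def _status(in_old: bool, in_new: bool) -> str:
    if in_old and in_new:
        return "unchanged"
    return "removed" if in_old else "added"


def merge_for_display(old, new):
    """Union of two snapshots with a 'status' attribute on every node and edge."""
    merged = nx.MultiDiGraph()
    for node in set(old.nodes) | set(new.nodes):
        merged.add_node(node, status=_status(old.has_node(node), new.has_node(node)))
    # edges(keys=True) yields (u, v, key) triples for multigraphs.
    old_edges, new_edges = set(old.edges(keys=True)), set(new.edges(keys=True))
    for u, v, key in old_edges | new_edges:
        merged.add_edge(u, v, key=key, relation=key,
                        status=_status((u, v, key) in old_edges, (u, v, key) in new_edges))
    return merged


g1 = build_graph(
    ["ABC Corporation", "Legacy Systems Department"],
    [{"source": "ABC Corporation", "relation": "maintains", "target": "Legacy Systems Department"}],
)
g2 = build_graph(
    ["ABC Corporation", "AI Research Division"],
    [{"source": "ABC Corporation", "relation": "launched", "target": "AI Research Division"}],
)
merged = merge_for_display(g1, g2)
for node in sorted(merged.nodes):
    print(node, merged.nodes[node]["status"])
```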
---
## app.py
Acts as the main execution script.
Responsible for:
* Loading document versions
* Triggering extraction pipeline
* Building graphs
* Running diff computation
* Displaying outputs
---
# Example Output (Graph Diff JSON)
```json
{
  "added_entities": [
    "AI Research Division",
    "Cloud Infrastructure Team"
  ],
  "removed_entities": [
    "Legacy Systems Department"
  ],
  "added_relationships": [
    {
      "source": "ABC Corporation",
      "relation": "launched",
      "target": "AI Research Division"
    }
  ],
  "removed_relationships": [
    {
      "source": "ABC Corporation",
      "relation": "maintains",
      "target": "Legacy Systems Department"
    }
  ]
}
```
---
# Example LLM Extraction Prompt
```text
You are an information extraction system.
Extract structured entities and relationships from the text.
Return output in JSON format using:
{
"entities": [],
"relationships": []
}
Rules:
1. Entities should represent meaningful objects such as:
- Organizations
- Departments
- Products
- Teams
- Locations
2. Relationships should represent interactions between entities.
Text:
{DOCUMENT_TEXT}
```
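
Two practical details sit around a prompt like this: the template contains literal JSON braces, so filling it with `str.format` would misread them as replacement fields, and model output often arrives wrapped in a Markdown code fence or with incomplete relationship objects. The following is a minimal sketch under those assumptions; `fill_prompt`, `parse_extraction`, and the abridged `PROMPT_TEMPLATE` are hypothetical and not taken from `semantic_diff.py`. `json.loads` still raises `json.JSONDecodeError` when the output is not JSON at all, which a caller could catch and retry.

```python
import json
import re

# Abridged, hypothetical copy of the template; only the {DOCUMENT_TEXT} placeholder matters here.
PROMPT_TEMPLATE = """You are an information extraction system.
Extract structured entities and relationships from the text.
Return output in JSON format using:
{
"entities": [],
"relationships": []
}
Text:
{DOCUMENT_TEXT}"""

# Matches a Markdown code fence (three backticks, optional "json" tag) around the payload.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)
_KEYS = ("source", "relation", "target")


def fill_prompt(template: str, document_text: str) -> str:
    # str.replace, not str.format: format() would treat the literal JSON braces as fields.
    return template.replace("{DOCUMENT_TEXT}", document_text)


def parse_extraction(raw: str) -> dict:
    """Parse model output into entities and relationships, dropping malformed items."""
    match = _FENCE.search(raw)
    payload = json.loads(match.group(1) if match else raw)
    entities = []
    for item in payload.get("entities", []):
        # Assumption: entities are plain strings, or objects carrying a "name" field.
        name = str(item.get("name") or "") if isinstance(item, dict) else str(item)
        if name.strip():
            entities.append(name.strip())
    relationships = [
        {k: str(rel[k]).strip() for k in _KEYS}
        for rel in payload.get("relationships", [])
        # Keep only objects that have a non-empty source, relation, and target.
        if isinstance(rel, dict) and all(rel.get(k) for k in _KEYS)
    ]
    return {"entities": entities, "relationships": relationships}
```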
---
# Example Diff Summary
```text
Changes detected between document versions:
- Two new entities were introduced: AI Research Division and Cloud Infrastructure Team.
- One entity was removed: Legacy Systems Department.
- A new relationship was added linking ABC Corporation to AI Research Division.
- A maintenance relationship with Legacy Systems Department was removed.
```
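
This README does not say whether the summary is written by the LLM or by a template. As a deterministic alternative, the structured diff can be rendered straight into sentences similar to the ones above. A sketch, assuming the diff dictionary uses the keys from the Example Output section; `summarize_diff` is a hypothetical name, and the relationship wording ("Relationship added: ...") is a simplification of the example text. For the example diff, the first bullet it produces is `- Two new entities were introduced: AI Research Division and Cloud Infrastructure Team.`

```python
from typing import Dict, List

_WORDS = ["Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten"]


def _count(n: int) -> str:
    # Spell out small counts ("Two"); fall back to digits for larger ones.
    return _WORDS[n] if n < len(_WORDS) else str(n)


def _join(names: List[str]) -> str:
    # "A", "A and B", "A, B and C"
    return names[0] if len(names) == 1 else ", ".join(names[:-1]) + " and " + names[-1]


def summarize_diff(diff: Dict[str, list]) -> str:
    """Hypothetical template-based summary of a structured graph diff."""
    lines = ["Changes detected between document versions:"]
    added = diff.get("added_entities", [])
    removed = diff.get("removed_entities", [])
    if added:
        noun = "new entity was" if len(added) == 1 else "new entities were"
        lines.append(f"- {_count(len(added))} {noun} introduced: {_join(added)}.")
    if removed:
        noun = "entity was" if len(removed) == 1 else "entities were"
        lines.append(f"- {_count(len(removed))} {noun} removed: {_join(removed)}.")
    for verb, key in (("added", "added_relationships"), ("removed", "removed_relationships")):
        for rel in diff.get(key, []):
            lines.append(f"- Relationship {verb}: {rel['source']} {rel['relation']} {rel['target']}.")
    if len(lines) == 1:
        lines.append("- No changes detected.")
    return "\n".join(lines)


example = {
    "added_entities": ["AI Research Division", "Cloud Infrastructure Team"],
    "removed_entities": ["Legacy Systems Department"],
    "added_relationships": [{"source": "ABC Corporation", "relation": "launched",
                             "target": "AI Research Division"}],
    "removed_relationships": [{"source": "ABC Corporation", "relation": "maintains",
                               "target": "Legacy Systems Department"}],
}
print(summarize_diff(example))
```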
---
# Technologies Used
* Python
* Flask
* NetworkX
* Matplotlib
* D3.js (frontend graph views)
* Large Language Models (LLMs)
* Groq API
* Natural Language Processing (NLP)
* Graph Theory
---
# Design Considerations
* Separate graphs are built per document version.
* Diffing operates at both node and edge levels.
* Structured outputs enable downstream analytics.
* Modular design allows extension to multi-version comparison.
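
Because each diff only needs two snapshots, one way to extend the design to multiple versions is to diff consecutive pairs along a timeline. A sketch, assuming each snapshot has already been reduced to a set of entity names and a set of `(source, relation, target)` triples; `timeline_diff` and the version labels are hypothetical and not part of the current codebase.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Snapshot = Tuple[Set[str], Set[Triple]]  # (entity names, relationship triples)


def timeline_diff(versions: List[Tuple[str, Snapshot]]) -> List[Dict[str, object]]:
    """Hypothetical helper: one node/edge diff per consecutive pair of labelled snapshots."""
    steps = []
    # zip(versions, versions[1:]) walks (v1, v2), (v2, v3), ... in order.
    for (old_label, (old_nodes, old_edges)), (new_label, (new_nodes, new_edges)) in zip(
        versions, versions[1:]
    ):
        steps.append({
            "from": old_label,
            "to": new_label,
            "added_entities": sorted(new_nodes - old_nodes),
            "removed_entities": sorted(old_nodes - new_nodes),
            "added_relationships": sorted(new_edges - old_edges),
            "removed_relationships": sorted(old_edges - new_edges),
        })
    return steps


history = [
    ("v1", ({"A", "B"}, {("A", "owns", "B")})),
    ("v2", ({"A", "C"}, {("A", "owns", "C")})),
    ("v3", ({"A", "C", "D"}, {("A", "owns", "C"), ("C", "uses", "D")})),
]
for step in timeline_diff(history):
    print(step["from"], "->", step["to"], step["added_entities"], step["removed_entities"])
```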
---
# Limitations
* Extraction accuracy depends on LLM output quality.
* Large graphs may increase visualization complexity.
* Relationship normalization may require domain tuning.
* Currently supports two-version comparison only.
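
One way to soften the relationship-normalization limitation is to canonicalize labels before diffing, so wording the LLM varies between runs ("Maintains", "is maintaining") does not register as one removed and one added edge. This is a hedged sketch: the synonym table and the names `normalize_label` and `normalize_triple` are hypothetical, and a real table would need exactly the domain tuning mentioned above. Casefolding also changes how names display, so original spellings would have to be kept separately for output.

```python
import re
from typing import Dict, Tuple

# Hypothetical, domain-specific synonym table: variant relation label -> canonical label.
RELATION_SYNONYMS: Dict[str, str] = {
    "is maintaining": "maintains",
    "maintain": "maintains",
    "started": "launched",
    "has launched": "launched",
}


def normalize_label(text: str) -> str:
    # Collapse runs of whitespace, trim, and casefold: "ABC  Corporation" -> "abc corporation".
    return re.sub(r"\s+", " ", text).strip().casefold()


def normalize_triple(source: str, relation: str, target: str) -> Tuple[str, str, str]:
    """Canonical (source, relation, target) key to use when comparing edges across versions."""
    rel = normalize_label(relation)
    return normalize_label(source), RELATION_SYNONYMS.get(rel, rel), normalize_label(target)


print(normalize_triple("ABC  Corporation", "Is Maintaining", "Legacy Systems Department"))
```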
---
# Future Improvements
* Multi-version timeline diffing
* Graph history tracking
* Knowledge graph persistence
* Interactive graph exploration
* Graph database integration (Neo4j)
* Graph embedding similarity metrics
* Change severity scoring
* Support for additional document formats
---
# Use Cases
This system can be applied to:
* Enterprise knowledge tracking
* Policy change monitoring
* Technical documentation updates
* Compliance auditing
* Legal contract version comparison
* Organizational change tracking
* Knowledge management systems