---
title: semmyKG - Knowledge Graph visualiser toolkit (builder from markdown)
emoji: 🕸️
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
python_version: 3.12
app_file: app.py
hf_oauth: true
hf_oauth_scopes:
  - read-access
  - inference-api
license: mit
pinned: true
short_description: semmyKG - Knowledge Graph toolkit
models:
  - meta-llama/Llama-4-Maverick-17B-128E-Instruct
  - openai/gpt-oss-120b
  - openai/gpt-oss-20b
tags:
  - knowledge graph
  - markdown
  - RAG
  - domain
owner: research-semmyk
version: 0.2.8.6
readme: README.md
requires-python: '>=3.12'
---
# LightRAG Gradio App

A modern, modular Gradio app for knowledge-graph-based Retrieval-Augmented Generation (RAG) using LightRAG. It supports OpenAI and Ollama LLM backends, markdown document ingestion, and interactive knowledge graph visualisation. Our ParserPDF (GitHub | HF Space) pipeline generates markdown from documents (PDF, Word, HTML).
## Features

- LightRAG for dual-level RAG and knowledge graph (KG) construction (see the sketch after this list)
- Ingest markdown files from a folder (default: `dataset/data/docs`)
- Query with an OpenAI or Ollama backend (user-selectable)
- Visualise the KG interactively in-browser
- Deployable to a venv, Colab, or Hugging Face Spaces
- Robust, pythonic, modular code (UK English)
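
For orientation, here is a minimal sketch of the LightRAG flow the app wraps, modelled on LightRAG's quick-start. Module paths, required initialisation steps, and the working-directory layout vary across LightRAG versions, so treat this as illustrative rather than the app's actual code:

```python
# Hedged sketch of the dual-level RAG flow (LightRAG quick-start style).
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete  # moved to lightrag.llm.openai in newer releases

rag = LightRAG(
    working_dir="./rag_storage",          # KG and vector stores are persisted here
    llm_model_func=gpt_4o_mini_complete,  # swap in your chosen LLM backend
)

# Ingestion: LightRAG chunks the markdown, extracts entities and relations,
# and builds the knowledge graph.
with open("dataset/data/docs/example.md", encoding="utf-8") as f:
    rag.insert(f.read())

# Dual-level retrieval: 'local' focuses on entity neighbourhoods, 'global' on
# higher-level relations; 'hybrid' combines both, 'naive' is plain vector RAG.
answer = rag.query(
    "What are the key entities and how are they related?",
    param=QueryParam(mode="hybrid"),
)
print(answer)
```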
## Setup

### 1. Clone and create a venv

```bash
git clone https://github.com/semmyk-research/semmyKG
cd semmyKG
uv venv .venv                  # ensure the uv package manager is installed
source .venv/bin/activate      # or .venv\Scripts\activate on Windows
uv pip sync requirements.txt
```

or

```bash
python -m venv .venv
source .venv/bin/activate      # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
```
### 2. Configure environment

Copy `.env.example` to `.env` and fill in your keys:

```ini
OPENAI_API_KEY=your-openai-api-key
# Format: provider/model-identifier
LLM_MODEL=your-llm-model-name
OPENAI_API_BASE=your-llm-inference-provider-endpoint
# For locally hosted LLM inference servers such as LM Studio or Jan.ai,
# append /v1 as for an Ollama host, e.g. http://localhost:1234/v1
OPENAI_API_EMBED_BASE=your-embedding-provider-endpoint
# For locally hosted servers, do not include /embedding
# Format: provider/embedding-name
LLM_MODEL_EMBED=your-embedding-model
OLLAMA_HOST=http://localhost:11434
# Include if required
OLLAMA_API_KEY=
```
If `.env` is not set, you can enter the values directly in the web UI. Likewise, values entered in the web UI override `.env`, as sketched below.
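
A minimal sketch of that precedence rule, assuming the common python-dotenv pattern (the helper `resolve_setting` is hypothetical, not the app's actual code):

```python
# Hedged sketch: load .env with python-dotenv, but let a non-empty
# web-UI input take precedence over the environment.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root, if present

def resolve_setting(ui_value: str | None, env_key: str, default: str = "") -> str:
    """Prefer a value typed into the web UI; fall back to .env, then a default."""
    if ui_value:  # UI input overrides .env
        return ui_value
    return os.getenv(env_key, default)

api_key = resolve_setting(None, "OPENAI_API_KEY")
ollama_host = resolve_setting(None, "OLLAMA_HOST", "http://localhost:11434")
```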
### 3. Run the app

```bash
python app_gradio_lightrag.py
```

For faster development with hot reload (see https://www.gradio.app/guides/developing-faster-with-reload-mode):

```bash
gradio app_gradio_lightrag.py --demo-name=gradio_ui
```
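
The `--demo-name` flag tells Gradio's reload mode which module-level Blocks object to serve (it looks for one named `demo` by default). A minimal sketch of the assumed structure inside `app_gradio_lightrag.py`:

```python
# Hedged sketch: the variable name must match --demo-name=gradio_ui.
import gradio as gr

with gr.Blocks(title="semmyKG") as gradio_ui:
    gr.Markdown("# semmyKG - Knowledge Graph toolkit")

if __name__ == "__main__":
    gradio_ui.launch()
```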
### 4. Colab/Spaces

- For Hugging Face Spaces: ensure all dependencies are in `requirements.txt` and `.env` is set via the web UI or Space secrets.
- For Colab: install the requirements and run the app cell.
## Usage

- Browse/select your data folder (default: `dataset/data/docs`)
- Choose the LLM backend (OpenAI or Ollama). [Known issue: GenAI has a bug yielding role: 'assistant' instead of 'user' when updating history.]
- Activate the RAG constructor
- Click 'Index Documents' to build the KG entities
- Click 'Query' to get answers: enter your query and select a query mode
- Click 'Show Knowledge Graph' to visualise the KG (one way to render the stored graph is sketched below)

NB: If using Hugging Face, log in first before browsing/selecting/uploading files and setting LLM parameters.
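
A hedged sketch of in-browser KG rendering with networkx and pyvis, following LightRAG's visualisation examples. The GraphML file name below matches those examples but may differ by LightRAG version; `./rag_storage` is an assumed working directory:

```python
# Load the persisted knowledge graph and render it as an interactive HTML page.
import networkx as nx
from pyvis.network import Network

graph = nx.read_graphml("./rag_storage/graph_chunk_entity_relation.graphml")

net = Network(height="750px", width="100%")
net.from_nx(graph)  # copy nodes and edges from the networkx graph
net.show("knowledge_graph.html", notebook=False)  # writes and opens the HTML file
```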
## Notes

- Only markdown files are supported for ingestion (images in the `/images` subfolder are ignored for now); other formats (PDF, TXT, HTML, ...) will be enabled later. A file-filtering sketch follows this list.
- To generate markdown from documents (PDF, Word, HTML), use our ParserPDF tool (GitHub | HF Space).
- All user-facing text is in UK English
- For advanced configuration, see the LightRAG documentation
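
A minimal sketch of the markdown-only ingestion rule (the helper `find_markdown_files` is hypothetical; the default folder is the documented `dataset/data/docs`):

```python
# Gather .md files under the data folder, skipping any 'images' subfolder.
from pathlib import Path

def find_markdown_files(root: str = "dataset/data/docs") -> list[Path]:
    """Return all markdown files under root, excluding 'images' subfolders."""
    return [p for p in Path(root).rglob("*.md") if "images" not in p.parts]

print(find_markdown_files())
```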
## Roadmap (no defined timeline)
- Hugging Face login
- ParserPDF integration