---
title: semmyKG - Knowledge Graph visualiser toolkit (builder from markdown)
emoji: 🕸️
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.44.1
python_version: 3.12
app_file: app.py
hf_oauth: true
hf_oauth_scopes:
  - read-access
  - inference-api
license: mit
pinned: true
short_description: semmyKG - Knowledge Graph toolkit
models:
  - meta-llama/Llama-4-Maverick-17B-128E-Instruct
  - openai/gpt-oss-120b
  - openai/gpt-oss-20b
tags:
  - knowledge graph
  - markdown
  - RAG
  - domain
owner: research-semmyk
version: 0.2.8.6
readme: README.md
requires-python: '>=3.12'
---
# LightRAG Gradio App

A modern, modular Gradio app for knowledge-graph-based Retrieval-Augmented Generation (RAG) using LightRAG. It supports OpenAI and Ollama LLM backends, markdown document ingestion, and interactive knowledge graph visualisation. Our ParserPDF (GitHub | HF Space) pipeline generates markdown from documents (PDF, Word, HTML).
## Features

- LightRAG for dual-level RAG and knowledge graph (KG) construction (see the sketch after this list)
- Ingest markdown files from a folder (default: `dataset/data/docs`)
- Query with an OpenAI or Ollama backend (user-selectable)
- Visualise the KG interactively in-browser
- Deployable to a venv, Colab, or Hugging Face Spaces
- Robust, pythonic, modular code (UK English)
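
For orientation, here is a minimal sketch of the LightRAG flow the app wraps, modelled on LightRAG's quick-start. Module paths, required initialisation steps, and the working-directory layout vary across LightRAG versions, so treat this as illustrative rather than the app's actual code:

```python
# Hedged sketch of the dual-level RAG flow (LightRAG quick-start style).
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete  # moved to lightrag.llm.openai in newer releases

rag = LightRAG(
    working_dir="./rag_storage",          # KG and vector stores are persisted here
    llm_model_func=gpt_4o_mini_complete,  # swap in your chosen LLM backend
)

# Ingestion: LightRAG chunks the markdown, extracts entities and relations,
# and builds the knowledge graph.
with open("dataset/data/docs/example.md", encoding="utf-8") as f:
    rag.insert(f.read())

# Dual-level retrieval: 'local' focuses on entity neighbourhoods, 'global' on
# higher-level relations; 'hybrid' combines both, 'naive' is plain vector RAG.
answer = rag.query(
    "What are the key entities and how are they related?",
    param=QueryParam(mode="hybrid"),
)
print(answer)
```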
## Setup

### 1. Clone and create a venv

```bash
git clone https://github.com/semmyk-research/semmyKG
cd semmyKG
uv venv .venv                  # ensure the uv package manager is installed
source .venv/bin/activate      # or .venv\Scripts\activate on Windows
uv pip sync requirements.txt
```

or

```bash
python -m venv .venv
source .venv/bin/activate      # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
```
### 2. Configure environment

Copy `.env.example` to `.env` and fill in your keys:

```ini
OPENAI_API_KEY=your-openai-api-key
# Format: provider/model-identifier
LLM_MODEL=your-llm-model-name
OPENAI_API_BASE=your-llm-inference-provider-endpoint
# For locally hosted LLM inference servers such as LM Studio or Jan.ai,
# append /v1 as for an Ollama host, e.g. http://localhost:1234/v1
OPENAI_API_EMBED_BASE=your-embedding-provider-endpoint
# For locally hosted servers, do not include /embedding
# Format: provider/embedding-name
LLM_MODEL_EMBED=your-embedding-model
OLLAMA_HOST=http://localhost:11434
# Include if required
OLLAMA_API_KEY=
```
If `.env` is not set, you can enter the values directly in the web UI. Likewise, values entered in the web UI override `.env`, as sketched below.
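
A minimal sketch of that precedence rule, assuming the common python-dotenv pattern (the helper `resolve_setting` is hypothetical, not the app's actual code):

```python
# Hedged sketch: load .env with python-dotenv, but let a non-empty
# web-UI input take precedence over the environment.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root, if present

def resolve_setting(ui_value: str | None, env_key: str, default: str = "") -> str:
    """Prefer a value typed into the web UI; fall back to .env, then a default."""
    if ui_value:  # UI input overrides .env
        return ui_value
    return os.getenv(env_key, default)

api_key = resolve_setting(None, "OPENAI_API_KEY")
ollama_host = resolve_setting(None, "OLLAMA_HOST", "http://localhost:11434")
```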
### 3. Run the app

```bash
python app_gradio_lightrag.py
```

For faster development with hot reload (see https://www.gradio.app/guides/developing-faster-with-reload-mode):

```bash
gradio app_gradio_lightrag.py --demo-name=gradio_ui
```
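
The `--demo-name` flag tells Gradio's reload mode which module-level Blocks object to serve (it looks for one named `demo` by default). A minimal sketch of the assumed structure inside `app_gradio_lightrag.py`:

```python
# Hedged sketch: the variable name must match --demo-name=gradio_ui.
import gradio as gr

with gr.Blocks(title="semmyKG") as gradio_ui:
    gr.Markdown("# semmyKG - Knowledge Graph toolkit")

if __name__ == "__main__":
    gradio_ui.launch()
```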
### 4. Colab/Spaces

- For Hugging Face Spaces: ensure all dependencies are in `requirements.txt` and `.env` is set via the web UI or Space secrets.
- For Colab: install the requirements and run the app cell.
## Usage

- Browse/select your data folder (default: `dataset/data/docs`)
- Choose the LLM backend (OpenAI or Ollama). [Known issue: GenAI has a bug yielding role: 'assistant' instead of 'user' when updating history.]
- Activate the RAG constructor
- Click 'Index Documents' to build the KG entities
- Click 'Query' to get answers: enter your query and select a query mode
- Click 'Show Knowledge Graph' to visualise the KG (one way to render the stored graph is sketched below)

NB: If using Hugging Face, log in first before browsing/selecting/uploading files and setting LLM parameters.
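
A hedged sketch of in-browser KG rendering with networkx and pyvis, following LightRAG's visualisation examples. The GraphML file name below matches those examples but may differ by LightRAG version; `./rag_storage` is an assumed working directory:

```python
# Load the persisted knowledge graph and render it as an interactive HTML page.
import networkx as nx
from pyvis.network import Network

graph = nx.read_graphml("./rag_storage/graph_chunk_entity_relation.graphml")

net = Network(height="750px", width="100%")
net.from_nx(graph)  # copy nodes and edges from the networkx graph
net.show("knowledge_graph.html", notebook=False)  # writes and opens the HTML file
```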
## Notes

- Only markdown files are supported for ingestion (images in the `/images` subfolder are ignored for now); other formats (PDF, TXT, HTML, ...) will be enabled later. A file-filtering sketch follows this list.
- To generate markdown from documents (PDF, Word, HTML), use our ParserPDF tool (GitHub | HF Space).
- All user-facing text is in UK English
- For advanced configuration, see the LightRAG documentation
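
A minimal sketch of the markdown-only ingestion rule (the helper `find_markdown_files` is hypothetical; the default folder is the documented `dataset/data/docs`):

```python
# Gather .md files under the data folder, skipping any 'images' subfolder.
from pathlib import Path

def find_markdown_files(root: str = "dataset/data/docs") -> list[Path]:
    """Return all markdown files under root, excluding 'images' subfolders."""
    return [p for p in Path(root).rglob("*.md") if "images" not in p.parts]

print(find_markdown_files())
```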
## Roadmap (no defined timeline)
- Hugging Face login
- ParserPDF integration