Glossary
Domain terms used throughout the codebase and documentation. Other documents reference this file rather than redefining terms in place.
Knowledge graph reasoning
Knowledge graph (KG)
A directed multigraph of (head, relation, tail) triples where vertices are entities and labelled edges are relations. The three KGs exposed by the site are FB15k-237 (Freebase subset), WN18RR (WordNet subset) and NELL-995.
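A minimal sketch of the triple representation; the entity and relation names are illustrative, not drawn from FB15k-237, WN18RR or NELL-995:

```python
from collections import defaultdict

# A toy KG as (head, relation, tail) triples.
triples = [
    ("barack_obama", "born_in", "honolulu"),
    ("honolulu", "located_in", "hawaii"),
    ("barack_obama", "profession", "politician"),
]

# Index outgoing labelled edges per head entity: the directed-multigraph view.
out_edges = defaultdict(list)
for h, r, t in triples:
    out_edges[h].append((r, t))

print(out_edges["barack_obama"])
# [('born_in', 'honolulu'), ('profession', 'politician')]
```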
Link prediction
Given two of the three slots (head, relation, tail), score and rank candidates for the missing one. The 1-projection (1p) query structure is plain link prediction.
Query structure
A multi-hop / intersection / projection query template over a KG. Supported structures: 1p, 2p, 3p (single-chain projections), 2i, 3i (intersection of two/three relations), ip (intersection then projection), pi (projection then intersection). Templates determine which slots the user fills in: anchor entities (a, a1, a2, …), variable entities (v1, v2) and relations (r1, r2, r3).
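To make the slot-filling concrete, here is a toy answer routine for a 2i query over a triple index; the index layout and the entity/relation names are assumptions for illustration, not the site's actual query engine:

```python
# Outgoing edges per entity as (relation, tail) pairs; names are made up.
out_edges = {
    "a1_ent": [("r1_rel", "x"), ("r1_rel", "y")],
    "a2_ent": [("r2_rel", "y"), ("r2_rel", "z")],
}

def answer_2i(a1, r1, a2, r2, edges):
    """2i query: intersect the answer sets of two relation projections,
    one from each anchor entity (a1, a2)."""
    hits1 = {t for r, t in edges.get(a1, []) if r == r1}
    hits2 = {t for r, t in edges.get(a2, []) if r == r2}
    return hits1 & hits2

print(answer_2i("a1_ent", "r1_rel", "a2_ent", "r2_rel", out_edges))  # {'y'}
```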
COINs
Community-Informed Graph Embeddings, the link-prediction / query-answering approach from PhD thesis section 3.1. Partitions the KG into communities via Leiden clustering, learns separate community-local and global embeddings, and combines them at scoring time. Reduces compute relative to full-graph methods on large KGs.
Leiden clustering
A community-detection algorithm that refines Louvain. The leiden_resolution parameter trades community count against community size; each dataset's configured resolution is shared by all COINs algorithms for that dataset.
Algorithm (COINs context)
Embedding scoring family. Supported: TransE, DistMult, ComplEx, RotatE (translation/bilinear/complex), Q2B (Query2Box for hyper-rectangles, supports box queries), KBGAT (graph-attention message passing). Each algorithm declares which query_structures it can answer.
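As one concrete member of the family, TransE scores a triple as the negative distance between h + r and t; the toy 2-d embeddings below are invented for illustration:

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: -||h + r - t||. Scores closer to 0 mean the
    triple is more plausible under the embedding."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

h = [1.0, 2.0]
r = [0.5, -1.0]
candidates = {"t_good": [1.5, 1.0], "t_bad": [3.0, 3.0]}

# Link-prediction ranking: sort candidate tails by score, best first.
ranked = sorted(candidates, key=lambda c: transe_score(h, r, candidates[c]), reverse=True)
print(ranked)  # ['t_good', 't_bad']
```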
Graph generation
MultiProxAn
The graph-generation method from PhD thesis section 4.3. A discrete denoising diffusion model in the DiGress style, augmented with MultiProx, an outer loop over multiple noisy initializations sampled jointly. The Gibbs inner step refines the current graph against several samples, raising sample quality on small graphs (e.g. QM9 molecules).
DiGress
The base discrete denoising diffusion architecture for graphs. The forward process noises a graph by category permutation; the model learns to reverse the process step by step.
Sampling mode
Either standard (one denoising chain to a single output) or multiprox (the outer Gibbs loop wraps several chains). MultiProx adds the parameters n (chains), m (Gibbs rounds), t and t_prime (intermediate timesteps), and gibbs_chain_freq (preview cadence).
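A sketch of what a multiprox request payload might carry; the parameter names come from this entry, while the values and the dict layout are illustrative assumptions only:

```python
# Hypothetical multiprox sampling configuration (values are made up).
multiprox_params = {
    "sampling_mode": "multiprox",
    "n": 4,                 # number of parallel denoising chains
    "m": 3,                 # Gibbs rounds of the outer loop
    "t": 250,               # intermediate timestep
    "t_prime": 100,         # second intermediate timestep
    "gibbs_chain_freq": 1,  # preview cadence: emit a preview every round
}
```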
Discrete vs. continuous
Two model variants per dataset. Discrete predicts categorical distributions over node/edge types directly; continuous predicts in a relaxed continuous space and rounds at the end. Checkpoints are named {dataset}.ckpt (discrete) and {dataset}_c.ckpt (continuous).
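The naming scheme above can be captured in a one-line helper; the "qm9" dataset name is taken from the MultiProxAn entry, the function itself is a sketch:

```python
def checkpoint_name(dataset: str, continuous: bool = False) -> str:
    """Checkpoint filename per the scheme in this glossary:
    {dataset}.ckpt for discrete, {dataset}_c.ckpt for continuous."""
    return f"{dataset}_c.ckpt" if continuous else f"{dataset}.ckpt"

print(checkpoint_name("qm9"))        # qm9.ckpt
print(checkpoint_name("qm9", True))  # qm9_c.ckpt
```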
KG anomaly correction
Subgraph
A small (≤ 20 nodes) connected sample drawn from a COINs Loader's DFS context-subgraph partitioning. Used as input/output for the KG anomaly demo.
Task (kg-anomaly)
Either generate (sample a fresh subgraph from noise) or correct (denoise a user-supplied subgraph back toward something the model considers plausible). Each (dataset, task) pair has its own checkpoint.
Bipartite vs. unipartite subgraph
The DFS partitioner emits both: bipartite subgraphs split their nodes into two halves with edges only across the split; unipartite subgraphs are a single connected component. The frontend renders them differently.
Inference protocol
Inference lock
A single threading.Lock in ModelRegistry. Only one inference runs at a time across the whole process (the free HF Spaces tier has 2 vCPUs and no GPU); a busy server returns HTTP 429 (INFERENCE_BUSY). /api/v1/debug/force-unlock releases a stuck lock when DEBUG=True.
SSE (Server-Sent Events)
The streaming-inference protocol the graph-generation and kg-anomaly endpoints use. Each event has a type (progress | preview | result) and a JSON payload. See reference/sse-protocol.md.
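A minimal client-side parser for the event shape described here (a type plus a JSON payload); this is a sketch of standard SSE framing, not the site's client code, and the sample payloads are invented:

```python
import json

def parse_sse(stream_text):
    """Split an SSE stream on blank lines, then read the event: and
    data: fields of each event block."""
    events = []
    for block in stream_text.strip().split("\n\n"):
        event_type, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        events.append((event_type, data))
    return events

raw = 'event: progress\ndata: {"step": 10}\n\nevent: result\ndata: {"graph": []}\n\n'
print(parse_sse(raw))
# [('progress', {'step': 10}), ('result', {'graph': []})]
```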
Continuation token / state blob
Multiprox sampling can pause between Gibbs rounds. The result event of a /generate or /correct call returns a base64-encoded state blob; the client posts that blob to the matching /continue endpoint to advance one more round.
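Only the base64 round-trip is part of the protocol; the blob's contents are opaque to the client, so the state dict below is purely hypothetical:

```python
import base64
import json

# Hypothetical server-side sampler state (the real blob is opaque).
state = {"round": 2, "chains": 4}

# Server encodes the state into the result event ...
blob = base64.b64encode(json.dumps(state).encode()).decode()

# ... client posts the blob back to /continue unchanged; server decodes it.
restored = json.loads(base64.b64decode(blob))
print(restored)  # {'round': 2, 'chains': 4}
```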
Inference lifecycle
Boot-time: pre-warm checkpoints from HF Hub, scan checkpoint dirs, load lightweight COINs Loaders, generate sample subgraphs. First-request: lazy-load the relevant model weights into memory. See explanation/inference-lifecycle.md.
Deployment
HF Space
A Hugging Face Spaces application running this repo's Dockerfile. The deployed URL is https://bani57-website.hf.space. The Space repo is Bani57/website.
HF Hub model repo
Bani57/checkpoints holds all PyTorch weights. It mirrors the on-disk layout under CHECKPOINTS_ROOT so huggingface_hub.snapshot_download populates files in their expected paths and the registry's scan logic finds them unchanged.
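A sketch of the mirroring idea; the CHECKPOINTS_ROOT value and filename are illustrative, and the download function is defined but not executed here (it needs huggingface_hub and network access):

```python
from pathlib import Path

CHECKPOINTS_ROOT = "/data/checkpoints"  # illustrative value

def sync_checkpoints():
    """Pull the whole Hub repo into CHECKPOINTS_ROOT. Because the repo
    mirrors the on-disk layout, every file lands at the relative path
    the registry's scan already expects."""
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id="Bani57/checkpoints", local_dir=CHECKPOINTS_ROOT)

# A file stored as qm9.ckpt in the Hub repo resolves to the same
# relative path under the local root:
local = Path(CHECKPOINTS_ROOT) / "qm9.ckpt"
print(local)
```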
Persistent storage (HF Spaces)
A paid /data volume that survives Space restarts. Free Spaces have 50 GB ephemeral disk that resets on restart. Without persistent storage, every cold start re-downloads checkpoints from HF Hub.