Commit 532276a (verified) by Bani57 · 1 parent: e544e35

Upload README.md with huggingface_hub

Files changed (1): README.md +118 -0
README.md ADDED
@@ -0,0 +1,118 @@
+ ---
+ license: mit
+ language:
+ - en
+ library_name: pytorch
+ tags:
+ - knowledge-graph
+ - link-prediction
+ - query-answering
+ - graph-generation
+ - graph-diffusion
+ - knowledge-graph-completion
+ - phd-thesis
+ - epfl
+ datasets:
+ - FB15k-237
+ - WN18RR
+ - NELL-995
+ - QM9
+ ---
+
+ # PhD research checkpoints: Andrej Janchevski (EPFL, 2025)
+
+ PyTorch checkpoints for the three research methods presented in the thesis
+ _Scalable Methods for Knowledge Graph Reasoning and Generation_
+ ([infoscience.epfl.ch](https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c)).
+ The repository mirrors the on-disk layout the demo backend expects, so a single
+ `huggingface_hub.snapshot_download(repo_id="Bani57/checkpoints", local_dir=...)`
+ call places every file in its final location with no extra wiring.
+
+ The interactive demos that consume these weights are deployed at
+ <https://bani57-website.hf.space>; their source is at
+ <https://huggingface.co/spaces/Bani57/website>.
+
+ ## Methods and weights
+
+ ### COINs: knowledge graph reasoning (thesis §3.1)
+ *Community-Informed Graph Embeddings.* Six embedding scoring families
+ (TransE, DistMult, ComplEx, RotatE, Q2B, KBGAT), each trained on three KGs.
+ COINs partitions each KG into Leiden communities and learns separate
+ community-local and global embeddings, which are combined at scoring time.
+
+ `COINs-KGGeneration/graph_completion/checkpoints/{dataset}_{algorithm}.tar`
+ (18 files, one per dataset and algorithm pair; ~2.6 GB).
+ Datasets: `freebase` (FB15k-237), `wordnet` (WN18RR), `nell` (NELL-995).
+ Algorithms: `transe`, `distmult`, `complex`, `rotate`, `q2b`, `kbgat`.
+
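The naming pattern above can be expanded mechanically. The following is a minimal sketch that enumerates the expected COINs checkpoint paths from the dataset and algorithm keys listed in this README; the helper name `coins_checkpoint_paths` is hypothetical and not part of the research code.

```python
from itertools import product
from pathlib import PurePosixPath

# Directory and keys are taken from the README; the helper itself is illustrative.
COINS_DIR = PurePosixPath("COINs-KGGeneration/graph_completion/checkpoints")
DATASETS = ("freebase", "wordnet", "nell")
ALGORITHMS = ("transe", "distmult", "complex", "rotate", "q2b", "kbgat")


def coins_checkpoint_paths():
    """Return every {dataset}_{algorithm}.tar path, relative to the repo root."""
    return [
        str(COINS_DIR / f"{dataset}_{algorithm}.tar")
        for dataset, algorithm in product(DATASETS, ALGORITHMS)
    ]


paths = coins_checkpoint_paths()
print(len(paths))  # 3 datasets x 6 algorithms
print(paths[0])
```

A list like this can be compared against the contents of the download directory to check that no file is missing.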
+ `COINs-KGGeneration/graph_completion/results/{dataset}/transe_model.tar`
+ (3 files, one per dataset; ~185 MB).
+ TransE pre-initialization checkpoints used to bootstrap the KBGAT embedder.
+
+ ### MultiProxAn: graph generation (thesis §4.3)
+ Discrete denoising diffusion model with the *MultiProx* outer Gibbs loop for
+ multi-chain refinement. It generates molecular graphs (QM9) and synthetic
+ community graphs (comm20).
+
+ `MultiProxAn/checkpoints/{dataset}{,_c}.ckpt`
+ (4 files, ~380 MB).
+ Discrete (`{dataset}.ckpt`) and continuous (`{dataset}_c.ckpt`) variants.
+
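The `{dataset}{,_c}` notation is shell-style brace expansion: each dataset key yields one file with an empty suffix and one with `_c`. A small sketch of that expansion follows. The dataset keys `qm9` and `comm20` are assumed spellings inferred from the dataset names in the text, and `expand_variants` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def expand_variants(directory, datasets, suffixes=("", "_c"), extension=".ckpt"):
    """Expand {dataset}{suffix}{extension} for every dataset and suffix."""
    base = PurePosixPath(directory)
    return [
        str(base / f"{dataset}{suffix}{extension}")
        for dataset in datasets
        for suffix in suffixes
    ]


# Assumed dataset keys; check the repository listing for the real file names.
multiproxan_files = expand_variants("MultiProxAn/checkpoints", ("qm9", "comm20"))
for path in multiproxan_files:
    print(path)
```

Two datasets with two variants each give the four files mentioned above.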
+ ### KG anomaly correction (thesis §4.4)
+ DiGress-style diffusion conditioned on the COINs embedder for the same
+ dataset. The model either samples a fresh subgraph (`generate`) or denoises a
+ user-supplied subgraph (`correct`).
+
+ `COINs-KGGeneration/graph_generation/checkpoints/{dataset}{,_correct}.ckpt`
+ (6 files, ~2.7 GB).
+
+ ## Usage
+
+ The deployed website downloads the entire repository into its
+ `CHECKPOINTS_ROOT` at container startup:
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ snapshot_download(
+     repo_id="Bani57/checkpoints",
+     repo_type="model",
+     local_dir="src/research",  # mirrors the on-disk layout
+     local_dir_use_symlinks=False,  # deprecated in recent huggingface_hub releases
+ )
+ ```
+
+ For accelerated downloads, install `hf_transfer` and set
+ `HF_HUB_ENABLE_HF_TRANSFER=1`. Total payload ≈ 5.8 GB.
+
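Because the full payload is several gigabytes, it may be useful to fetch only one method's files. The sketch below uses the `allow_patterns` argument of `snapshot_download` for that purpose. The method keys and the grouping in `METHOD_PATTERNS` are assumptions for illustration, and the patterns are derived from the paths listed under "Methods and weights". The KG anomaly correction model is conditioned on the COINs embedder, so it may need the COINs files as well.

```python
import fnmatch

# Hypothetical grouping of the repository's files by method.
METHOD_PATTERNS = {
    "coins": [
        "COINs-KGGeneration/graph_completion/checkpoints/*.tar",
        "COINs-KGGeneration/graph_completion/results/*/transe_model.tar",
    ],
    "multiproxan": ["MultiProxAn/checkpoints/*.ckpt"],
    "kg_anomaly": ["COINs-KGGeneration/graph_generation/checkpoints/*.ckpt"],
}


def would_download(path, method):
    """Dry run: does a repo-relative path match any pattern for the method?"""
    return any(fnmatch.fnmatch(path, pattern) for pattern in METHOD_PATTERNS[method])


def download_method(method, local_dir="src/research"):
    """Download only the files for one method, keeping the on-disk layout."""
    # Imported lazily so the dry-run helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Bani57/checkpoints",
        repo_type="model",
        local_dir=local_dir,
        allow_patterns=METHOD_PATTERNS[method],
    )
```

Calling `download_method("multiproxan")` would then transfer only the files under `MultiProxAn/checkpoints/`, while `would_download` allows checking the patterns without any network access.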
+ The weights are loaded by [`ModelRegistry`](https://huggingface.co/spaces/Bani57/website/blob/main/src/backend/api/services/registry.py)
+ in the website backend, which loads them lazily per request to keep the working set small.
+
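To illustrate the general idea of lazy loading, here is a minimal sketch of a cache that reads a checkpoint only when it is first requested. This is one possible reading of the pattern and is not the actual `ModelRegistry` implementation; the class name `LazyCheckpointCache` and its methods are hypothetical.

```python
from pathlib import Path


class LazyCheckpointCache:
    """Load checkpoints on first access and keep them until evicted."""

    def __init__(self, root, loader):
        self._root = Path(root)
        self._loader = loader  # callable that takes a Path and returns a model
        self._loaded = {}

    def get(self, relative_path):
        """Return the loaded object, invoking the loader only on a cache miss."""
        if relative_path not in self._loaded:
            self._loaded[relative_path] = self._loader(self._root / relative_path)
        return self._loaded[relative_path]

    def evict(self, relative_path):
        """Drop a loaded object so its memory can be reclaimed."""
        self._loaded.pop(relative_path, None)
```

With PyTorch installed, the loader could be something like `lambda p: torch.load(p, map_location="cpu")`, although the internal structure of these checkpoint files is not documented here and would need to be checked against the research code.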
+ ## Training
+
+ The COINs and MultiProxAn checkpoints were trained on EPFL's GPU cluster
+ during 2021–2025 as part of the doctoral research programme. Training
+ hyperparameters live in the
+ [research code's YAML configs](https://huggingface.co/spaces/Bani57/website/tree/main/src/research/COINs-KGGeneration/graph_completion/configs).
+
+ ## Intended use
+
+ These checkpoints are released to power the interactive thesis demos
+ linked above. They are research artefacts; downstream production use is
+ neither tested nor supported.
+
+ ## Citation
+
+ ```bibtex
+ @phdthesis{janchevski_scalable_2025,
+   author = {Andrej Janchevski},
+   title = {Scalable Methods for Knowledge Graph Reasoning and Generation},
+   school = {{EPFL}},
+   year = {2025},
+   url = {https://infoscience.epfl.ch/entities/publication/87acf391-feef-43a0-b665-7f2f0bc70b2c},
+ }
+ ```
+
+ ## License
+
+ The released weights and source are MIT-licensed. The research methods retain
+ their original publication terms; see the thesis.