textgraphs / docs /ex1_0.md
Paco Nathan
A new start
91eaff6
!!! note
To run this notebook in JupyterLab, load [`examples/ex1_0.ipynb`](https://github.com/DerwenAI/textgraphs/blob/main/examples/ex1_0.ipynb)
# reproduce results from the "InGram" paper
This is an attempt to reproduce the _graph of relations_ example given in `lee2023ingram`
## environment
```python
import os
import pathlib
import typing
from icecream import ic
from pyinstrument import Profiler
import matplotlib.pyplot as plt
import pandas as pd
import pyvis
import textgraphs
```
```python
%load_ext watermark
```
```python
%watermark
```
Last updated: 2024-01-16T17:35:45.550539-08:00
Python implementation: CPython
Python version : 3.10.11
IPython version : 8.20.0
Compiler : Clang 13.0.0 (clang-1300.0.29.30)
OS : Darwin
Release : 21.6.0
Machine : x86_64
Processor : i386
CPU cores : 8
Architecture: 64bit
```python
%watermark --iversions
```
matplotlib: 3.8.2
pandas : 2.1.4
pyvis : 0.3.2
textgraphs: 0.5.0
sys : 3.10.11 (v3.10.11:7d4cc5aa85, Apr 4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)]
## load example graph
load from a JSON file which replicates the data for the "Figure 3" example
```python
graph: textgraphs.GraphOfRelations = textgraphs.GraphOfRelations(
textgraphs.SimpleGraph()
)
ingram_path: pathlib.Path = pathlib.Path(os.getcwd()) / "ingram.json"
graph.load_ingram(
ingram_path,
debug = False,
)
```
set up the statistical stack profiling
```python
profiler: Profiler = Profiler()
profiler.start()
```
## decouple graph edges into "seeds"
```python
graph.seeds(
debug = True,
)
```
--- triples in source graph ---
ic| edge.src_node: 0, rel_id: 1, edge.dst_node: 1
ic| edge.src_node: 0, rel_id: 0, edge.dst_node: 2
ic| edge.src_node: 0, rel_id: 0, edge.dst_node: 3
ic| edge.src_node: 4, rel_id: 2, edge.dst_node: 2
ic| edge.src_node: 4, rel_id: 2, edge.dst_node: 3
ic| edge.src_node: 4, rel_id: 1, edge.dst_node: 5
ic| edge.src_node: 6, rel_id: 1, edge.dst_node: 5
ic| edge.src_node: 6, rel_id: 2, edge.dst_node: 7
ic| edge.src_node: 6, rel_id: 4, edge.dst_node: 8
ic| edge.src_node: 9,
Steven_Spielberg Profession Director
Steven_Spielberg Directed Catch_Me_If_Can
Steven_Spielberg Directed Saving_Private_Ryan
Tom_Hanks ActedIn Catch_Me_If_Can
Tom_Hanks ActedIn Saving_Private_Ryan
Tom_Hanks Profession Actor
Mark_Hamil Profession Actor
Mark_Hamil ActedIn Star_Wars
Mark_Hamil BornIn California
rel_id: 5, edge.dst_node: 10
ic| edge.src_node: 9, rel_id: 4, edge.dst_node: 10
ic| edge.src_node: 9, rel_id: 3, edge.dst_node: 8
ic| edge.src_node: 11, rel_id: 4, edge.dst_node: 12
ic| edge.src_node: 11, rel_id: 3, edge.dst_node: 12
ic| edge.src_node: 11, rel_id: 3, edge.dst_node: 8
Brad_Pitt Nationality USA
Brad_Pitt BornIn USA
Brad_Pitt LivedIn California
Clint_Eastwood BornIn San_Francisco
Clint_Eastwood LivedIn San_Francisco
Clint_Eastwood LivedIn California
```python
graph.trace_source_graph()
```
--- nodes in source graph ---
n: 0, Steven_Spielberg
head: []
tail: [(0, 'Profession', 1), (0, 'Directed', 2), (0, 'Directed', 3)]
n: 1, Director
head: [(0, 'Profession', 1)]
tail: []
n: 2, Catch_Me_If_Can
head: [(0, 'Directed', 2), (4, 'ActedIn', 2)]
tail: []
n: 3, Saving_Private_Ryan
head: [(0, 'Directed', 3), (4, 'ActedIn', 3)]
tail: []
n: 4, Tom_Hanks
head: []
tail: [(4, 'ActedIn', 2), (4, 'ActedIn', 3), (4, 'Profession', 5)]
n: 5, Actor
head: [(4, 'Profession', 5), (6, 'Profession', 5)]
tail: []
n: 6, Mark_Hamil
head: []
tail: [(6, 'Profession', 5), (6, 'ActedIn', 7), (6, 'BornIn', 8)]
n: 7, Star_Wars
head: [(6, 'ActedIn', 7)]
tail: []
n: 8, California
head: [(6, 'BornIn', 8), (9, 'LivedIn', 8), (11, 'LivedIn', 8)]
tail: []
n: 9, Brad_Pitt
head: []
tail: [(9, 'Nationality', 10), (9, 'BornIn', 10), (9, 'LivedIn', 8)]
n: 10, USA
head: [(9, 'Nationality', 10), (9, 'BornIn', 10)]
tail: []
n: 11, Clint_Eastwood
head: []
tail: [(11, 'BornIn', 12), (11, 'LivedIn', 12), (11, 'LivedIn', 8)]
n: 12, San_Francisco
head: [(11, 'BornIn', 12), (11, 'LivedIn', 12)]
tail: []
--- edges in source graph ---
e: 0, Directed
e: 1, Profession
e: 2, ActedIn
e: 3, LivedIn
e: 4, BornIn
e: 5, Nationality
## construct a _graph of relations_
Transform the graph data into _graph of relations_
```python
graph.construct_gor(
debug = True,
)
```
ic| node_id: 0, len(seeds
--- transformed triples ---
): 3
ic| trans_arc: TransArc(pair_key=(0, 1),
a_rel=1,
b_rel=0,
node_id=0,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| trans_arc: TransArc(pair_key=(0, 1),
a_rel=1,
b_rel=0,
node_id=0,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| trans_arc: TransArc(pair_key=(0, 0),
a_rel=0,
b_rel=0,
node_id=0,
a_dir=<RelDir
(0, 1) Profession.tail Steven_Spielberg Directed.tail
(0, 1) Profession.tail Steven_Spielberg Directed.tail
(0, 0) Directed.tail Steven_Spielberg Directed.tail
.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| node_id: 1, len(seeds
): 1
ic| node_id: 2, len(seeds): 2
ic| trans_arc: TransArc(pair_key=(0, 2),
a_rel=0,
b_rel=2,
node_id=2,
a_dir=<RelDir.HEAD: 0>,
b_dir=<
(0, 2) Directed.head Catch_Me_If_Can ActedIn.head
RelDir.HEAD: 0>)
ic| node_id: 3, len(seeds): 2
ic| trans_arc: TransArc(pair_key=(0, 2),
a_rel=0,
b_rel=2,
node_id=3,
a_dir=<RelDir.HEAD: 0>,
b_dir=<RelDir.HEAD: 0>)
ic| node_id
(0, 2) Directed.head Saving_Private_Ryan ActedIn.head
: 4, len(seeds): 3
ic| trans_arc: TransArc(pair_key=(2, 2),
a_rel=2,
b_rel=2,
node_id=4,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| trans_arc: TransArc(pair_key=(1, 2),
a_rel=2,
b_rel=1,
node_id=4,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| trans_arc: TransArc(pair_key=(1, 2)
(2, 2) ActedIn.tail Tom_Hanks ActedIn.tail
(1, 2) ActedIn.tail Tom_Hanks Profession.tail
(1, 2) ActedIn.tail Tom_Hanks Profession.tail
,
a_rel=2,
b_rel=1,
node_id=4,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic|
node_id: 5, len(seeds): 2
ic| trans_arc: TransArc(pair_key=(1, 1),
a_rel=1,
b_rel=1,
(1, 1) Profession.head Actor Profession.head
node_id=5,
a_dir=<RelDir.HEAD: 0>,
b_dir=<RelDir.HEAD: 0>)
ic| node_id: 6, len(seeds): 3
ic| trans_arc: TransArc(pair_key=(1, 2),
a_rel=1,
b_rel=2,
node_id=6,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL:
(1, 2) Profession.tail Mark_Hamil ActedIn.tail
1>)
ic| trans_arc: TransArc(pair_key=(1, 4),
a_rel=1,
b_rel=4,
node_id=6,
a_dir
(1, 4) Profession.tail Mark_Hamil BornIn.tail
=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| trans_arc: TransArc(pair_key=(2, 4),
a_rel=2,
b_rel=4,
node_id=6,
(2, 4) ActedIn.tail Mark_Hamil BornIn.tail
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| node_id: 7, len(seeds): 1
ic| node_id: 8, len(seeds): 3
ic| trans_arc: TransArc(pair_key=(3, 4),
a_rel=4,
b_rel=3,
node_id=8,
a_dir=<RelDir.HEAD: 0>,
b_dir=<RelDir.HEAD:
(3, 4) BornIn.head California LivedIn.head
0>)
ic| trans_arc: TransArc(pair_key=(3, 4),
a_rel=4,
b_rel=3,
node_id=8,
a_dir=<RelDir.HEAD: 0>,
b_dir=<RelDir.HEAD: 0>)
ic| trans_arc: TransArc(pair_key=(3, 3),
a_rel=3,
b_rel=3,
node_id=8,
a_dir=<RelDir.HEAD: 0>,
b_dir=<RelDir.HEAD: 0>)
ic| node_id: 9, len(seeds): 3
ic
(3, 4) BornIn.head California LivedIn.head
(3, 3) LivedIn.head California LivedIn.head
(4, 5) Nationality.tail Brad_Pitt BornIn.tail
| trans_arc: TransArc(pair_key=(4, 5),
a_rel=5,
b_rel=4,
node_id=9,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| trans_arc: TransArc(pair_key=(3, 5),
a_rel=5,
b_rel=3,
node_id=9,
a_dir=<RelDir.TAIL: 1>,
b_dir=<
(3, 5) Nationality.tail Brad_Pitt LivedIn.tail
RelDir.TAIL: 1>)
ic| trans_arc: TransArc(pair_key=(3, 4),
a_rel=4,
b_rel=3,
node_id=9,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| node_id: 10, len(seeds): 2
ic| trans_arc: TransArc(pair_key=(4, 5),
a_rel=5,
b_rel=4,
node_id=10,
a_dir=<RelDir.HEAD: 0>,
b_dir=<RelDir.HEAD: 0>)
ic| node_id: 11, len(seeds): 3
ic| trans_arc: TransArc(pair_key=(3,
(3, 4) BornIn.tail Brad_Pitt LivedIn.tail
(4, 5) Nationality.head USA BornIn.head
(3, 4) BornIn.tail Clint_Eastwood LivedIn.tail
4),
a_rel=4,
b_rel=3,
node_id=11,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic
(3, 4) BornIn.tail Clint_Eastwood LivedIn.tail
| trans_arc: TransArc(pair_key=(3, 4),
a_rel=4,
b_rel=3,
node_id=11,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| trans_arc: TransArc(pair_key=(3, 3),
a_rel=3,
b_rel=3,
node_id=11,
a_dir=<RelDir.TAIL: 1>,
b_dir=<RelDir.TAIL: 1>)
ic| node_id: 12, len(seeds
(3, 3) LivedIn.tail Clint_Eastwood LivedIn.tail
): 2
ic| trans_arc: TransArc(pair_key=(3, 4),
a_rel=4,
b_rel=3,
node_id=12,
a_dir=<RelDir.HEAD: 0>,
b_dir=<RelDir.HEAD: 0>)
(3, 4) BornIn.head San_Francisco LivedIn.head
```python
scores: typing.Dict[ tuple, float ] = graph.get_affinity_scores(
debug = True,
)
```
--- collect shared entity tallies ---
0 Directed
h: 4 dict_items([(2, 4.0)])
t: 6 dict_items([(0, 3.0), (1, 3.0)])
1 Profession
h: 3 dict_items([(1, 3.0)])
t: 10 dict_items([(0, 3.0), (2, 5.0), (4, 2.0)])
2 ActedIn
h: 4 dict_items([(0, 4.0)])
t: 10 dict_items([(1, 5.0), (2, 3.0), (4, 2.0)])
3 LivedIn
h: 8 dict_items([(3, 3.0), (4, 5.0)])
t: 10 dict_items([(3, 3.0), (4, 5.0), (5, 2.0)])
4 BornIn
h: 7 dict_items([(3, 5.0), (5, 2.0)])
t: 11 dict_items([(1, 2.0), (2, 2.0), (3, 5.0), (5, 2.0)])
5 Nationality
h: 2 dict_items([(4, 2.0)])
t: 4 dict_items([(3, 2.0), (4, 2.0)])
```python
ic(scores);
```
ic| scores: {(0, 0): 0.3,
(0, 1): 0.2653846153846154,
(0, 2): 0.34285714285714286,
(1, 1): 0.23076923076923078,
(1, 2): 0.3708791208791209,
(1, 4): 0.13247863247863248,
(2, 2): 0.21428571428571427,
(2, 4): 0.12698412698412698,
(3, 3): 0.3333333333333333,
(3, 4): 0.5555555555555556,
(3, 5): 0.2222222222222222,
(4, 5): 0.4444444444444444}
## visualize the transform results
```python
graph.render_gor_plt(scores)
plt.show()
```
![png](ex1_0_files/ex1_0_22_0.png)
```python
pv_graph: pyvis.network.Network = graph.render_gor_pyvis(scores)
pv_graph.force_atlas_2based(
gravity = -38,
central_gravity = 0.01,
spring_length = 231,
spring_strength = 0.7,
damping = 0.8,
overlap = 0,
)
pv_graph.show_buttons(filter_ = [ "physics" ])
pv_graph.toggle_physics(True)
pv_graph.prep_notebook()
pv_graph.show("tmp.fig03.html")
```
tmp.fig03.html
![png](ex1_0_files/tmp.fig03.png)
## analysis
As the results below above illustrate, the computed _affinity scores_ differ from what is published in `lee2023ingram`. After trying several different variations of interpretation for the paper's descriptions, the current approach provides the closest approximation that we have obtained.
```python
df: pd.DataFrame = graph.trace_metrics(scores)
df
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>pair</th>
<th>rel_a</th>
<th>rel_b</th>
<th>affinity</th>
<th>expected</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>(0, 0)</td>
<td>Directed</td>
<td>Directed</td>
<td>0.30</td>
<td>NaN</td>
</tr>
<tr>
<th>1</th>
<td>(0, 1)</td>
<td>Directed</td>
<td>Profession</td>
<td>0.27</td>
<td>0.22</td>
</tr>
<tr>
<th>2</th>
<td>(0, 2)</td>
<td>Directed</td>
<td>ActedIn</td>
<td>0.34</td>
<td>0.50</td>
</tr>
<tr>
<th>3</th>
<td>(1, 1)</td>
<td>Profession</td>
<td>Profession</td>
<td>0.23</td>
<td>NaN</td>
</tr>
<tr>
<th>4</th>
<td>(1, 2)</td>
<td>Profession</td>
<td>ActedIn</td>
<td>0.37</td>
<td>0.33</td>
</tr>
<tr>
<th>5</th>
<td>(1, 4)</td>
<td>Profession</td>
<td>BornIn</td>
<td>0.13</td>
<td>0.11</td>
</tr>
<tr>
<th>6</th>
<td>(2, 2)</td>
<td>ActedIn</td>
<td>ActedIn</td>
<td>0.21</td>
<td>NaN</td>
</tr>
<tr>
<th>7</th>
<td>(2, 4)</td>
<td>ActedIn</td>
<td>BornIn</td>
<td>0.13</td>
<td>0.11</td>
</tr>
<tr>
<th>8</th>
<td>(3, 3)</td>
<td>LivedIn</td>
<td>LivedIn</td>
<td>0.33</td>
<td>NaN</td>
</tr>
<tr>
<th>9</th>
<td>(3, 4)</td>
<td>LivedIn</td>
<td>BornIn</td>
<td>0.56</td>
<td>0.81</td>
</tr>
<tr>
<th>10</th>
<td>(3, 5)</td>
<td>LivedIn</td>
<td>Nationality</td>
<td>0.22</td>
<td>0.11</td>
</tr>
<tr>
<th>11</th>
<td>(4, 5)</td>
<td>BornIn</td>
<td>Nationality</td>
<td>0.44</td>
<td>0.36</td>
</tr>
</tbody>
</table>
</div>
## statistical stack profile instrumentation
```python
profiler.stop()
```
<pyinstrument.session.Session at 0x1416bc7f0>
```python
profiler.print()
```
_ ._ __/__ _ _ _ _ _/_ Recorded: 17:35:45 Samples: 2526
/_//_/// /_\ / //_// / //_'/ // Duration: 3.799 CPU time: 4.060
/ _/ v4.6.1
Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-27f0c564-73f8-45ab-9f64-8b064ae1de10.json
3.799 IPythonKernel.dispatch_queue ipykernel/kernelbase.py:525
└─ 3.791 IPythonKernel.process_one ipykernel/kernelbase.py:511
[10 frames hidden] ipykernel, IPython
3.680 ZMQInteractiveShell.run_ast_nodes IPython/core/interactiveshell.py:3394
├─ 2.176 <module> ../ipykernel_4421/3358887201.py:1
│ └─ 2.176 GraphOfRelations.construct_gor textgraphs/gor.py:311
│ ├─ 1.607 IceCreamDebugger.__call__ icecream/icecream.py:204
│ │ [17 frames hidden] icecream, colorama, ipykernel, thread...
│ │ 1.078 lock.acquire <built-in>
│ └─ 0.566 GraphOfRelations._transformed_triples textgraphs/gor.py:275
│ └─ 0.563 IceCreamDebugger.__call__ icecream/icecream.py:204
│ [13 frames hidden] icecream, colorama, ipykernel, zmq, t...
├─ 0.866 <module> ../ipykernel_4421/4061275008.py:1
│ └─ 0.866 GraphOfRelations.seeds textgraphs/gor.py:197
│ └─ 0.865 IceCreamDebugger.__call__ icecream/icecream.py:204
│ [42 frames hidden] icecream, inspect, posixpath, <built-...
├─ 0.362 <module> ../ipykernel_4421/559531165.py:1
│ ├─ 0.234 show matplotlib/pyplot.py:482
│ │ [32 frames hidden] matplotlib, matplotlib_inline, IPytho...
│ └─ 0.128 GraphOfRelations.render_gor_plt textgraphs/gor.py:522
│ └─ 0.104 draw_networkx networkx/drawing/nx_pylab.py:127
│ [6 frames hidden] networkx, matplotlib
├─ 0.197 <module> ../ipykernel_4421/1169542473.py:1
│ └─ 0.197 IceCreamDebugger.__call__ icecream/icecream.py:204
│ [14 frames hidden] icecream, colorama, ipykernel, thread...
└─ 0.041 <module> ../ipykernel_4421/2247466716.py:1
## outro
_\[ more parts are in progress, getting added to this demo \]_