!!! note
    To run this notebook in JupyterLab, load [`examples/ex1_0.ipynb`](https://github.com/DerwenAI/textgraphs/blob/main/examples/ex1_0.ipynb)

    
# reproduce results from the "InGram" paper

This is an attempt to reproduce the _graph of relations_ example given in `lee2023ingram`

## environment


```python
import os
import pathlib
import typing

from icecream import ic
from pyinstrument import Profiler
import matplotlib.pyplot as plt
import pandas as pd
import pyvis

import textgraphs
```


```python
%load_ext watermark
```


```python
%watermark
```

    Last updated: 2024-01-16T17:35:45.550539-08:00
    
    Python implementation: CPython
    Python version       : 3.10.11
    IPython version      : 8.20.0
    
    Compiler    : Clang 13.0.0 (clang-1300.0.29.30)
    OS          : Darwin
    Release     : 21.6.0
    Machine     : x86_64
    Processor   : i386
    CPU cores   : 8
    Architecture: 64bit
    

```python
%watermark --iversions
```

    matplotlib: 3.8.2
    pandas    : 2.1.4
    pyvis     : 0.3.2
    textgraphs: 0.5.0
    sys       : 3.10.11 (v3.10.11:7d4cc5aa85, Apr  4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)]
    

## load example graph

load from a JSON file which replicates the data for the "Figure 3" example


```python
graph: textgraphs.GraphOfRelations = textgraphs.GraphOfRelations(
    textgraphs.SimpleGraph()
)

ingram_path: pathlib.Path = pathlib.Path(os.getcwd()) / "ingram.json"

graph.load_ingram(
    ingram_path,
    debug = False,
)
```

set up the statistical stack profiling


```python
profiler: Profiler = Profiler()
profiler.start()
```

## decouple graph edges into "seeds"


```python
graph.seeds(
    debug = True,
)
```

    
    --- triples in source graph ---


    ic| edge.src_node: 0, rel_id: 1, edge.dst_node: 1
    ic| edge.src_node: 0, rel_id: 0, edge.dst_node: 2
    ic| edge.src_node: 0, rel_id: 0, edge.dst_node: 3
    ic| edge.src_node: 4, rel_id: 2, edge.dst_node: 2
    ic| edge.src_node: 4, rel_id: 2, edge.dst_node: 3
    ic| edge.src_node: 4, rel_id: 1, edge.dst_node: 5
    ic| edge.src_node: 6, rel_id: 1, edge.dst_node: 5
    ic| edge.src_node: 6, rel_id: 2, edge.dst_node: 7
    ic| edge.src_node: 6, rel_id: 4, edge.dst_node: 8
    ic| edge.src_node: 9, 

     Steven_Spielberg Profession Director
     Steven_Spielberg Directed Catch_Me_If_Can
     Steven_Spielberg Directed Saving_Private_Ryan
     Tom_Hanks ActedIn Catch_Me_If_Can
     Tom_Hanks ActedIn Saving_Private_Ryan
     Tom_Hanks Profession Actor
     Mark_Hamil Profession Actor
     Mark_Hamil ActedIn Star_Wars
     Mark_Hamil BornIn California


    rel_id: 5, edge.dst_node: 10
    ic| edge.src_node: 9, rel_id: 4, edge.dst_node: 10
    ic| edge.src_node: 9, rel_id: 3, edge.dst_node: 8
    ic| edge.src_node: 11, rel_id: 4, edge.dst_node: 12
    ic| edge.src_node: 11, rel_id: 3, edge.dst_node: 12
    ic| edge.src_node: 11, rel_id: 3, edge.dst_node: 8


     Brad_Pitt Nationality USA
     Brad_Pitt BornIn USA
     Brad_Pitt LivedIn California
     Clint_Eastwood BornIn San_Francisco
     Clint_Eastwood LivedIn San_Francisco
     Clint_Eastwood LivedIn California


```python
graph.trace_source_graph()
```

    
    --- nodes in source graph ---
    n:  0, Steven_Spielberg
     head: []
     tail: [(0, 'Profession', 1), (0, 'Directed', 2), (0, 'Directed', 3)]
    n:  1, Director
     head: [(0, 'Profession', 1)]
     tail: []
    n:  2, Catch_Me_If_Can
     head: [(0, 'Directed', 2), (4, 'ActedIn', 2)]
     tail: []
    n:  3, Saving_Private_Ryan
     head: [(0, 'Directed', 3), (4, 'ActedIn', 3)]
     tail: []
    n:  4, Tom_Hanks
     head: []
     tail: [(4, 'ActedIn', 2), (4, 'ActedIn', 3), (4, 'Profession', 5)]
    n:  5, Actor
     head: [(4, 'Profession', 5), (6, 'Profession', 5)]
     tail: []
    n:  6, Mark_Hamil
     head: []
     tail: [(6, 'Profession', 5), (6, 'ActedIn', 7), (6, 'BornIn', 8)]
    n:  7, Star_Wars
     head: [(6, 'ActedIn', 7)]
     tail: []
    n:  8, California
     head: [(6, 'BornIn', 8), (9, 'LivedIn', 8), (11, 'LivedIn', 8)]
     tail: []
    n:  9, Brad_Pitt
     head: []
     tail: [(9, 'Nationality', 10), (9, 'BornIn', 10), (9, 'LivedIn', 8)]
    n: 10, USA
     head: [(9, 'Nationality', 10), (9, 'BornIn', 10)]
     tail: []
    n: 11, Clint_Eastwood
     head: []
     tail: [(11, 'BornIn', 12), (11, 'LivedIn', 12), (11, 'LivedIn', 8)]
    n: 12, San_Francisco
     head: [(11, 'BornIn', 12), (11, 'LivedIn', 12)]
     tail: []
    
    --- edges in source graph ---
    e:  0, Directed
    e:  1, Profession
    e:  2, ActedIn
    e:  3, LivedIn
    e:  4, BornIn
    e:  5, Nationality


## construct a _graph of relations_

Transform the graph data into _graph of relations_


```python
graph.construct_gor(
	debug = True,
)
```

    ic| node_id: 0, len(seeds

    
    --- transformed triples ---


    ): 3
    ic| trans_arc: TransArc(pair_key=(0, 1),
                            a_rel=1,
                            b_rel=0,
                            node_id=0,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| trans_arc: TransArc(pair_key=(0, 1),
                            a_rel=1,
                            b_rel=0,
                            node_id=0,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| trans_arc: TransArc(pair_key=(0, 0),
                            a_rel=0,
                            b_rel=0,
                            node_id=0,
                            a_dir=<RelDir

     (0, 1) Profession.tail Steven_Spielberg Directed.tail
    
     (0, 1) Profession.tail Steven_Spielberg Directed.tail
    
     (0, 0) Directed.tail Steven_Spielberg Directed.tail


    .TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| node_id: 1, len(seeds

    
    ): 1
    ic| node_id: 2, len(seeds): 2
    ic| trans_arc: TransArc(pair_key=(0, 2),
                            a_rel=0,
                            b_rel=2,
                            node_id=2,
                            a_dir=<RelDir.HEAD: 0>,
                            b_dir=<

     (0, 2) Directed.head Catch_Me_If_Can ActedIn.head


    RelDir.HEAD: 0>)
    ic| node_id: 3, len(seeds): 2
    ic| trans_arc: TransArc(pair_key=(0, 2),
                            a_rel=0,
                            b_rel=2,
                            node_id=3,
                            a_dir=<RelDir.HEAD: 0>,
                            b_dir=<RelDir.HEAD: 0>)
    ic| node_id

    
     (0, 2) Directed.head Saving_Private_Ryan ActedIn.head
    

    : 4, len(seeds): 3
    ic| trans_arc: TransArc(pair_key=(2, 2),
                            a_rel=2,
                            b_rel=2,
                            node_id=4,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| trans_arc: TransArc(pair_key=(1, 2),
                            a_rel=2,
                            b_rel=1,
                            node_id=4,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| trans_arc: TransArc(pair_key=(1, 2)

     (2, 2) ActedIn.tail Tom_Hanks ActedIn.tail
    
     (1, 2) ActedIn.tail Tom_Hanks Profession.tail
    
     (1, 2) ActedIn.tail Tom_Hanks Profession.tail


    ,
                            a_rel=2,
                            b_rel=1,
                            node_id=4,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic|

    
     node_id: 5, len(seeds): 2
    ic| trans_arc: TransArc(pair_key=(1, 1),
                            a_rel=1,
                            b_rel=1,
                            

     (1, 1) Profession.head Actor Profession.head


    node_id=5,
                            a_dir=<RelDir.HEAD: 0>,
                            b_dir=<RelDir.HEAD: 0>)
    ic| node_id: 6, len(seeds): 3
    ic| trans_arc: TransArc(pair_key=(1, 2),
                            a_rel=1,
                            b_rel=2,
                            node_id=6,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 

    
     (1, 2) Profession.tail Mark_Hamil ActedIn.tail


    1>)
    ic| trans_arc: TransArc(pair_key=(1, 4),
                            a_rel=1,
                            b_rel=4,
                            node_id=6,
                            a_dir

    
     (1, 4) Profession.tail Mark_Hamil BornIn.tail


    =<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| trans_arc: TransArc(pair_key=(2, 4),
                            a_rel=2,
                            b_rel=4,
                            node_id=6,


     (2, 4) ActedIn.tail Mark_Hamil BornIn.tail


                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| node_id: 7, len(seeds): 1
    ic| node_id: 8, len(seeds): 3
    ic| trans_arc: TransArc(pair_key=(3, 4),
                            a_rel=4,
                            b_rel=3,
                            node_id=8,
                            a_dir=<RelDir.HEAD: 0>,
                            b_dir=<RelDir.HEAD:

    
     (3, 4) BornIn.head California LivedIn.head


     0>)
    ic| trans_arc: TransArc(pair_key=(3, 4),
                            a_rel=4,
                            b_rel=3,
                            node_id=8,
                            a_dir=<RelDir.HEAD: 0>,
                            b_dir=<RelDir.HEAD: 0>)
    ic| trans_arc: TransArc(pair_key=(3, 3),
                            a_rel=3,
                            b_rel=3,
                            node_id=8,
                            a_dir=<RelDir.HEAD: 0>,
                            b_dir=<RelDir.HEAD: 0>)
    ic| node_id: 9, len(seeds): 3
    ic

    
     (3, 4) BornIn.head California LivedIn.head
    
     (3, 3) LivedIn.head California LivedIn.head
    
     (4, 5) Nationality.tail Brad_Pitt BornIn.tail


    | trans_arc: TransArc(pair_key=(4, 5),
                            a_rel=5,
                            b_rel=4,
                            node_id=9,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| trans_arc: TransArc(pair_key=(3, 5),
                            a_rel=5,
                            b_rel=3,
                            node_id=9,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<

    
     (3, 5) Nationality.tail Brad_Pitt LivedIn.tail


    RelDir.TAIL: 1>)
    ic| trans_arc: TransArc(pair_key=(3, 4),
                            a_rel=4,
                            b_rel=3,
                            node_id=9,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| node_id: 10, len(seeds): 2
    ic| trans_arc: TransArc(pair_key=(4, 5),
                            a_rel=5,
                            b_rel=4,
                            node_id=10,
                            a_dir=<RelDir.HEAD: 0>,
                            b_dir=<RelDir.HEAD: 0>)
    ic| node_id: 11, len(seeds): 3
    ic| trans_arc: TransArc(pair_key=(3, 

    
     (3, 4) BornIn.tail Brad_Pitt LivedIn.tail
    
     (4, 5) Nationality.head USA BornIn.head
    
     (3, 4) BornIn.tail Clint_Eastwood LivedIn.tail


    4),
                            a_rel=4,
                            b_rel=3,
                            node_id=11,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic

    
     (3, 4) BornIn.tail Clint_Eastwood LivedIn.tail


    | trans_arc: TransArc(pair_key=(3, 4),
                            a_rel=4,
                            b_rel=3,
                            node_id=11,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| trans_arc: TransArc(pair_key=(3, 3),
                            a_rel=3,
                            b_rel=3,
                            node_id=11,
                            a_dir=<RelDir.TAIL: 1>,
                            b_dir=<RelDir.TAIL: 1>)
    ic| node_id: 12, len(seeds

    
     (3, 3) LivedIn.tail Clint_Eastwood LivedIn.tail
    

    ): 2
    ic| trans_arc: TransArc(pair_key=(3, 4),
                            a_rel=4,
                            b_rel=3,
                            node_id=12,
                            a_dir=<RelDir.HEAD: 0>,
                            b_dir=<RelDir.HEAD: 0>)


     (3, 4) BornIn.head San_Francisco LivedIn.head
    

```python
scores: typing.Dict[ tuple, float ] = graph.get_affinity_scores(
    debug = True,
)
```

    
    --- collect shared entity tallies ---
    0 Directed
     h: 4 dict_items([(2, 4.0)])
     t: 6 dict_items([(0, 3.0), (1, 3.0)])
    1 Profession
     h: 3 dict_items([(1, 3.0)])
     t: 10 dict_items([(0, 3.0), (2, 5.0), (4, 2.0)])
    2 ActedIn
     h: 4 dict_items([(0, 4.0)])
     t: 10 dict_items([(1, 5.0), (2, 3.0), (4, 2.0)])
    3 LivedIn
     h: 8 dict_items([(3, 3.0), (4, 5.0)])
     t: 10 dict_items([(3, 3.0), (4, 5.0), (5, 2.0)])
    4 BornIn
     h: 7 dict_items([(3, 5.0), (5, 2.0)])
     t: 11 dict_items([(1, 2.0), (2, 2.0), (3, 5.0), (5, 2.0)])
    5 Nationality
     h: 2 dict_items([(4, 2.0)])
     t: 4 dict_items([(3, 2.0), (4, 2.0)])


```python
ic(scores);
```

    ic| scores: {(0, 0): 0.3,
                 (0, 1): 0.2653846153846154,
                 (0, 2): 0.34285714285714286,
                 (1, 1): 0.23076923076923078,
                 (1, 2): 0.3708791208791209,
                 (1, 4): 0.13247863247863248,
                 (2, 2): 0.21428571428571427,
                 (2, 4): 0.12698412698412698,
                 (3, 3): 0.3333333333333333,
                 (3, 4): 0.5555555555555556,
                 (3, 5): 0.2222222222222222,
                 (4, 5): 0.4444444444444444}


## visualize the transform results


```python
graph.render_gor_plt(scores)
plt.show()
```


![png](ex1_0_files/ex1_0_22_0.png)
    

```python
pv_graph: pyvis.network.Network = graph.render_gor_pyvis(scores)

pv_graph.force_atlas_2based(
    gravity = -38,
    central_gravity = 0.01,
    spring_length = 231,
    spring_strength = 0.7,
    damping = 0.8,
    overlap = 0,
)

pv_graph.show_buttons(filter_ = [ "physics" ])
pv_graph.toggle_physics(True)

pv_graph.prep_notebook()
pv_graph.show("tmp.fig03.html")
```

    tmp.fig03.html


![png](ex1_0_files/tmp.fig03.png)


## analysis

As the results below above illustrate, the computed _affinity scores_ differ from what is published in `lee2023ingram`. After trying several different variations of interpretation for the paper's descriptions, the current approach provides the closest approximation that we have obtained.


```python
df: pd.DataFrame = graph.trace_metrics(scores)
df
```


<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>pair</th>
      <th>rel_a</th>
      <th>rel_b</th>
      <th>affinity</th>
      <th>expected</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>(0, 0)</td>
      <td>Directed</td>
      <td>Directed</td>
      <td>0.30</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1</th>
      <td>(0, 1)</td>
      <td>Directed</td>
      <td>Profession</td>
      <td>0.27</td>
      <td>0.22</td>
    </tr>
    <tr>
      <th>2</th>
      <td>(0, 2)</td>
      <td>Directed</td>
      <td>ActedIn</td>
      <td>0.34</td>
      <td>0.50</td>
    </tr>
    <tr>
      <th>3</th>
      <td>(1, 1)</td>
      <td>Profession</td>
      <td>Profession</td>
      <td>0.23</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>4</th>
      <td>(1, 2)</td>
      <td>Profession</td>
      <td>ActedIn</td>
      <td>0.37</td>
      <td>0.33</td>
    </tr>
    <tr>
      <th>5</th>
      <td>(1, 4)</td>
      <td>Profession</td>
      <td>BornIn</td>
      <td>0.13</td>
      <td>0.11</td>
    </tr>
    <tr>
      <th>6</th>
      <td>(2, 2)</td>
      <td>ActedIn</td>
      <td>ActedIn</td>
      <td>0.21</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>7</th>
      <td>(2, 4)</td>
      <td>ActedIn</td>
      <td>BornIn</td>
      <td>0.13</td>
      <td>0.11</td>
    </tr>
    <tr>
      <th>8</th>
      <td>(3, 3)</td>
      <td>LivedIn</td>
      <td>LivedIn</td>
      <td>0.33</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>9</th>
      <td>(3, 4)</td>
      <td>LivedIn</td>
      <td>BornIn</td>
      <td>0.56</td>
      <td>0.81</td>
    </tr>
    <tr>
      <th>10</th>
      <td>(3, 5)</td>
      <td>LivedIn</td>
      <td>Nationality</td>
      <td>0.22</td>
      <td>0.11</td>
    </tr>
    <tr>
      <th>11</th>
      <td>(4, 5)</td>
      <td>BornIn</td>
      <td>Nationality</td>
      <td>0.44</td>
      <td>0.36</td>
    </tr>
  </tbody>
</table>
</div>


## statistical stack profile instrumentation


```python
profiler.stop()
```


    <pyinstrument.session.Session at 0x1416bc7f0>


```python
profiler.print()
```

    
      _     ._   __/__   _ _  _  _ _/_   Recorded: 17:35:45  Samples:  2526
     /_//_/// /_\ / //_// / //_'/ //     Duration: 3.799     CPU time: 4.060
    /   _/                      v4.6.1
    
    Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-27f0c564-73f8-45ab-9f64-8b064ae1de10.json
    
    3.799 IPythonKernel.dispatch_queue  ipykernel/kernelbase.py:525
    └─ 3.791 IPythonKernel.process_one  ipykernel/kernelbase.py:511
          [10 frames hidden]  ipykernel, IPython
             3.680 ZMQInteractiveShell.run_ast_nodes  IPython/core/interactiveshell.py:3394
             ├─ 2.176 <module>  ../ipykernel_4421/3358887201.py:1
             │  └─ 2.176 GraphOfRelations.construct_gor  textgraphs/gor.py:311
             │     ├─ 1.607 IceCreamDebugger.__call__  icecream/icecream.py:204
             │     │     [17 frames hidden]  icecream, colorama, ipykernel, thread...
             │     │        1.078 lock.acquire  <built-in>
             │     └─ 0.566 GraphOfRelations._transformed_triples  textgraphs/gor.py:275
             │        └─ 0.563 IceCreamDebugger.__call__  icecream/icecream.py:204
             │              [13 frames hidden]  icecream, colorama, ipykernel, zmq, t...
             ├─ 0.866 <module>  ../ipykernel_4421/4061275008.py:1
             │  └─ 0.866 GraphOfRelations.seeds  textgraphs/gor.py:197
             │     └─ 0.865 IceCreamDebugger.__call__  icecream/icecream.py:204
             │           [42 frames hidden]  icecream, inspect, posixpath, <built-...
             ├─ 0.362 <module>  ../ipykernel_4421/559531165.py:1
             │  ├─ 0.234 show  matplotlib/pyplot.py:482
             │  │     [32 frames hidden]  matplotlib, matplotlib_inline, IPytho...
             │  └─ 0.128 GraphOfRelations.render_gor_plt  textgraphs/gor.py:522
             │     └─ 0.104 draw_networkx  networkx/drawing/nx_pylab.py:127
             │           [6 frames hidden]  networkx, matplotlib
             ├─ 0.197 <module>  ../ipykernel_4421/1169542473.py:1
             │  └─ 0.197 IceCreamDebugger.__call__  icecream/icecream.py:204
             │        [14 frames hidden]  icecream, colorama, ipykernel, thread...
             └─ 0.041 <module>  ../ipykernel_4421/2247466716.py:1
    
    
## outro

_\[ more parts are in progress, getting added to this demo \]_