!!! note To run this notebook in JupyterLab, load [`examples/ex2_0.ipynb`](https://github.com/DerwenAI/textgraphs/blob/main/examples/ex2_0.ipynb) # bootstrap the _lemma graph_ with RDF triples Show how to bootstrap definitions in a _lemma graph_ by loading RDF, e.g., for synonyms. ## environment ```python from icecream import ic from pyinstrument import Profiler import pyvis import textgraphs ``` ```python %load_ext watermark ``` ```python %watermark ``` Last updated: 2024-01-16T17:35:59.608787-08:00 Python implementation: CPython Python version : 3.10.11 IPython version : 8.20.0 Compiler : Clang 13.0.0 (clang-1300.0.29.30) OS : Darwin Release : 21.6.0 Machine : x86_64 Processor : i386 CPU cores : 8 Architecture: 64bit ```python %watermark --iversions ``` pyvis : 0.3.2 textgraphs: 0.5.0 sys : 3.10.11 (v3.10.11:7d4cc5aa85, Apr 4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)] ## load the bootstrap definitions Define the bootstrap RDF triples in N3/Turtle format: we define an entity `Werner` as a synonym for `Werner Herzog` by using the [`skos:broader`](https://www.w3.org/TR/skos-reference/#semantic-relations) relation. Keep in mind that this entity may also refer to other Werners... ```python TTL_STR: str = """ @base . @prefix dbo: . @prefix skos: . a dbo:Person ; skos:prefLabel "Werner"@en . a dbo:Person ; skos:prefLabel "Werner Herzog"@en. dbo:Person skos:definition "People, including fictional"@en ; skos:prefLabel "person"@en . skos:broader . """ ``` Provide the source text ```python SRC_TEXT: str = """ Werner Herzog is a remarkable filmmaker and an intellectual originally from Germany, the son of Dietrich Herzog. After the war, Werner fled to America to become famous. """ ``` set up the statistical stack profiling ```python profiler: Profiler = Profiler() profiler.start() ``` set up the `TextGraphs` pipeline ```python tg: textgraphs.TextGraphs = textgraphs.TextGraphs( factory = textgraphs.PipelineFactory( kg = textgraphs.KGWikiMedia( spotlight_api = textgraphs.DBPEDIA_SPOTLIGHT_API, dbpedia_search_api = textgraphs.DBPEDIA_SEARCH_API, dbpedia_sparql_api = textgraphs.DBPEDIA_SPARQL_API, wikidata_api = textgraphs.WIKIDATA_API, min_alias = textgraphs.DBPEDIA_MIN_ALIAS, min_similarity = textgraphs.DBPEDIA_MIN_SIM, ), ), ) ``` load the bootstrap definitions ```python tg.load_bootstrap_ttl( TTL_STR, debug = False, ) ``` parse the input text ```python pipe: textgraphs.Pipeline = tg.create_pipeline( SRC_TEXT.strip(), ) tg.collect_graph_elements( pipe, debug = False, ) tg.construct_lemma_graph( debug = False, ) ``` ## visualize the lemma graph ```python render: textgraphs.RenderPyVis = tg.create_render() pv_graph: pyvis.network.Network = render.render_lemma_graph( debug = False, ) ``` initialize the layout parameters ```python pv_graph.force_atlas_2based( gravity = -38, central_gravity = 0.01, spring_length = 231, spring_strength = 0.7, damping = 0.8, overlap = 0, ) pv_graph.show_buttons(filter_ = [ "physics" ]) pv_graph.toggle_physics(True) ``` ```python pv_graph.prep_notebook() pv_graph.show("tmp.fig04.html") ``` tmp.fig04.html ![png](ex2_0_files/tmp.fig04.png) Notice how the `Werner` and `Werner Herzog` nodes are now linked? This synonym from the bootstrap definitions above provided means to link more portions of the _lemma graph_ than the demo in `ex0_0` with the same input text. ## statistical stack profile instrumentation ```python profiler.stop() ``` ```python profiler.print() ``` _ ._ __/__ _ _ _ _ _/_ Recorded: 17:35:59 Samples: 2846 /_//_/// /_\ / //_// / //_'/ // Duration: 4.111 CPU time: 3.294 / _/ v4.6.1 Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-4365d4ba-2d4d-4d4b-83e2-eb5ef8abfe26.json 4.111 IPythonKernel.dispatch_shell ipykernel/kernelbase.py:378 └─ 4.075 IPythonKernel.execute_request ipykernel/kernelbase.py:721 [9 frames hidden] ipykernel, IPython 3.995 ZMQInteractiveShell.run_ast_nodes IPython/core/interactiveshell.py:3394 ├─ 3.250 ../ipykernel_4433/1372904243.py:1 │ └─ 3.248 PipelineFactory.__init__ textgraphs/pipe.py:434 │ └─ 3.232 load spacy/__init__.py:27 │ [98 frames hidden] spacy, en_core_web_sm, catalogue, imp... │ 0.496 tokenizer_factory spacy/language.py:110 │ └─ 0.108 _validate_special_case spacy/tokenizer.pyx:573 │ 0.439 spacy/language.py:2170 │ └─ 0.085 _validate_special_case spacy/tokenizer.pyx:573 ├─ 0.672 ../ipykernel_4433/3257668275.py:1 │ └─ 0.669 TextGraphs.create_pipeline textgraphs/doc.py:103 │ └─ 0.669 PipelineFactory.create_pipeline textgraphs/pipe.py:508 │ └─ 0.669 Pipeline.__init__ textgraphs/pipe.py:216 │ └─ 0.669 English.__call__ spacy/language.py:1016 │ [31 frames hidden] spacy, spacy_dbpedia_spotlight, reque... └─ 0.055 ../ipykernel_4433/72966960.py:1 └─ 0.046 Network.prep_notebook pyvis/network.py:552 [5 frames hidden] pyvis, jinja2 ## outro _\[ more parts are in progress, getting added to this demo \]_