Demosthene-OR committed on
Commit
3f5d323
·
1 Parent(s): 4eb9b1a
Files changed (9)
  1. Dockerfile +1 -1
  2. LICENSE +21 -0
  3. README.md +90 -20
  4. app.py +66 -0
  5. generate_knowledge_graph.py +127 -0
  6. knowledge_graph.html +492 -0
  7. knowledge_graph.ipynb +313 -0
  8. requirements.txt +13 -3
  9. src/streamlit_app.py +0 -40
Dockerfile CHANGED
@@ -17,4 +17,4 @@ EXPOSE 8501
 
  HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
 
- ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
+ ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Thu
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,20 +1,90 @@
- ---
- title: Knowledge Graph Generator
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: Extract graph data from text input and generates interactive
- license: mit
- ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ # Knowledge Graph Generator
+
+ A Streamlit application that extracts graph data (entities and relationships) from text input using LangChain and OpenAI's GPT models, and generates interactive graphs.
+ ![CleanShot 2025-05-28 at 13 11 46](https://github.com/user-attachments/assets/4fef9158-8dd8-432d-bb8a-b53953a82c6c)
+
+ 👉 This repo is part of my project tutorial on YouTube:
+ [![](https://img.youtube.com/vi/O-T_6KOXML4/0.jpg)](https://www.youtube.com/watch?v=O-T_6KOXML4)
+
+ ## Features
+
+ - Two input methods: text upload (.txt files) or direct text input
+ - Interactive knowledge graph visualization
+ - Customizable graph display with physics-based layout
+ - Entity relationship extraction powered by OpenAI's GPT-4o model
+
+ ## Installation
+
+ ### Prerequisites
+
+ - Python 3.8 or higher
+ - OpenAI API key
+
+ ### Dependencies
+
+ The application requires the following Python packages:
+
+ - langchain (>= 0.1.0): Core LLM framework
+ - langchain-experimental (>= 0.0.45): Experimental LangChain features
+ - langchain-openai (>= 0.1.0): OpenAI integration for LangChain
+ - python-dotenv (>= 1.0.0): Environment variable support
+ - pyvis (>= 0.3.2): Graph visualization
+ - streamlit (>= 1.32.0): Web UI framework
+
+ Install all required dependencies using the provided requirements.txt file:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Setup
+
+ 1. Clone this repository:
+    ```bash
+    git clone [repository-url]
+    cd knowledge_graph_app_2
+    ```
+
+    Note: Replace `[repository-url]` with the actual URL of this repository.
+
+ 2. Create a `.env` file in the root directory with your OpenAI API key:
+    ```
+    OPENAI_API_KEY=your_openai_api_key_here
+    ```
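Editorial sketch: the `.env` step can be verified before the app starts. The snippet below is a minimal, standard-library-only illustration of a fail-fast check for `OPENAI_API_KEY`. The helper name `require_env` is hypothetical and is not part of this commit, which loads the key with `python-dotenv` and `os.getenv` instead.

```python
import os


def require_env(name: str) -> str:
    """Return the stripped value of environment variable `name`.

    Hypothetical helper for illustration only. Raises RuntimeError with a
    readable message when the variable is missing or blank.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in the shell."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would surface a missing key at startup rather than on the first model call.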
+
+ ## Running the Application
+
+ To run the Streamlit app:
+
+ ```bash
+ streamlit run app.py
+ ```
+
+ This will start the application and open it in your default web browser (typically at http://localhost:8501).
+
+ ## Usage
+
+ 1. Choose your input method from the sidebar (Upload txt or Input text)
+ 2. If uploading a file, select a .txt file from your computer
+ 3. If using direct input, type or paste your text into the text area
+ 4. Click the "Generate Knowledge Graph" button
+ 5. Wait for the graph to be generated (this may take a few moments depending on the length of the text)
+ 6. Explore the interactive knowledge graph:
+    - Drag nodes to rearrange the graph
+    - Hover over nodes and edges to see additional information
+    - Zoom in/out using the mouse wheel
+    - Filter the graph for specific nodes and edges
+
+ ## How It Works
+
+ The application uses LangChain's experimental graph transformers with OpenAI's GPT-4o model to:
+ 1. Extract entities from the input text
+ 2. Identify relationships between these entities
+ 3. Generate a graph structure representing this information
+ 4. Visualize the graph using PyVis, a Python interface for the vis.js visualization library
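Editorial sketch: the data flow in steps 1 to 4 can be illustrated without calling any model. In the example below, the `Triple` type and the `to_vis_data` function are hypothetical stand-ins for the extracted graph, and the dictionaries only imitate the node and edge shape that appears in the generated `knowledge_graph.html`. It is an assumption-laden illustration, not the code path the app uses.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical stand-in for one extracted relationship."""
    source: str
    relation: str
    target: str


def to_vis_data(triples):
    """Convert triples into vis.js-style node and edge dictionaries."""
    node_ids = []
    for t in triples:
        for node_id in (t.source, t.target):
            if node_id not in node_ids:  # keep first-seen order, no duplicates
                node_ids.append(node_id)
    nodes = [{"id": n, "label": n} for n in node_ids]
    edges = [
        {"from": t.source, "to": t.target, "label": t.relation.lower(), "arrows": "to"}
        for t in triples
    ]
    return nodes, edges


# Sample data taken from the graph committed in knowledge_graph.html.
triples = [
    Triple("Game Of Thrones", "NETWORK", "Hbo"),
    Triple("House Of The Dragon", "NETWORK", "Hbo"),
]
nodes, edges = to_vis_data(triples)
print([n["id"] for n in nodes])
print(edges[0])
```

Each entity becomes one node no matter how many relationships mention it, which is why `Hbo` appears once even though two series point to it.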
+
+ ## License
+
+ This project is licensed under the MIT License, a permissive open-source license that allows free use, modification, and distribution of the software.
+
+ For more details, see the [MIT License](https://opensource.org/licenses/MIT) documentation.
app.py ADDED
@@ -0,0 +1,66 @@
+ # Import necessary modules
+ import streamlit as st
+ import streamlit.components.v1 as components  # For embedding custom HTML
+ from generate_knowledge_graph import generate_knowledge_graph
+
+ # Set up Streamlit page configuration
+ st.set_page_config(
+     page_icon=None,
+     layout="wide",  # Use wide layout for better graph display
+     initial_sidebar_state="auto",
+     menu_items=None
+ )
+
+ # Set the title of the app
+ st.title("Knowledge Graph From Text")
+
+ # Sidebar section for user input method
+ st.sidebar.title("Input document")
+ input_method = st.sidebar.radio(
+     "Choose an input method:",
+     ["Upload txt", "Input text"],  # Options for uploading a file or manually inputting text
+ )
+
+ # Case 1: User chooses to upload a .txt file
+ if input_method == "Upload txt":
+     # File uploader widget in the sidebar
+     uploaded_file = st.sidebar.file_uploader(label="Upload file", type=["txt"])
+
+     if uploaded_file is not None:
+         # Read the uploaded file content and decode it as UTF-8 text
+         text = uploaded_file.read().decode("utf-8")
+
+         # Button to generate the knowledge graph
+         if st.sidebar.button("Generate Knowledge Graph"):
+             with st.spinner("Generating knowledge graph..."):
+                 # Call the function to generate the graph from the text
+                 net = generate_knowledge_graph(text)
+                 st.success("Knowledge graph generated successfully!")
+
+                 # Save the graph to an HTML file
+                 output_file = "knowledge_graph.html"
+                 net.save_graph(output_file)
+
+                 # Read the HTML file (closing it afterwards) and display it within the Streamlit app
+                 with open(output_file, 'r', encoding='utf-8') as html_file:
+                     components.html(html_file.read(), height=1000)
+
+ # Case 2: User chooses to directly input text
+ else:
+     # Text area for manual input
+     text = st.sidebar.text_area("Input text", height=300)
+
+     if text:  # Check if the text area is not empty
+         if st.sidebar.button("Generate Knowledge Graph"):
+             with st.spinner("Generating knowledge graph..."):
+                 # Call the function to generate the graph from the input text
+                 net = generate_knowledge_graph(text)
+                 st.success("Knowledge graph generated successfully!")
+
+                 # Save the graph to an HTML file
+                 output_file = "knowledge_graph.html"
+                 net.save_graph(output_file)
+
+                 # Read the HTML file (closing it afterwards) and display it within the Streamlit app
+                 with open(output_file, 'r', encoding='utf-8') as html_file:
+                     components.html(html_file.read(), height=1000)
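Editorial sketch: `uploaded_file.read().decode("utf-8")` raises `UnicodeDecodeError` when the uploaded file is not valid UTF-8. One possible way to soften that, shown here purely as an illustration, is a small decoding helper. The name `decode_upload` is hypothetical and not part of this commit, and the Latin-1 fallback is an assumption: it never fails, but it may render some characters incorrectly if the file actually uses another encoding.

```python
def decode_upload(data: bytes) -> str:
    """Decode uploaded bytes as text.

    Tries UTF-8 first (accepting an optional byte-order mark), then falls
    back to Latin-1, which maps every byte to a character and so never fails.
    """
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode("latin-1")
```

In `app.py` this could replace the direct call, for example `text = decode_upload(uploaded_file.read())`.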
generate_knowledge_graph.py ADDED
@@ -0,0 +1,127 @@
+ from langchain_experimental.graph_transformers import LLMGraphTransformer
+ from langchain_core.documents import Document
+ from langchain_openai import ChatOpenAI
+ from pyvis.network import Network
+
+ from dotenv import load_dotenv
+ import os
+ import asyncio
+
+
+ # Load the .env file
+ load_dotenv()
+ # Get API key from environment variable
+ api_key = os.getenv("OPENAI_API_KEY")
+
+ llm = ChatOpenAI(temperature=0, model_name="gpt-4o")
+
+ graph_transformer = LLMGraphTransformer(llm=llm)
+
+
+ # Extract graph data from input text
+ async def extract_graph_data(text):
+     """
+     Asynchronously extracts graph data from input text using a graph transformer.
+
+     Args:
+         text (str): Input text to be processed into graph format.
+
+     Returns:
+         list: A list of GraphDocument objects containing nodes and relationships.
+     """
+     documents = [Document(page_content=text)]
+     graph_documents = await graph_transformer.aconvert_to_graph_documents(documents)
+     return graph_documents
+
+
+ def visualize_graph(graph_documents):
+     """
+     Visualizes a knowledge graph using PyVis based on the extracted graph documents.
+
+     Args:
+         graph_documents (list): A list of GraphDocument objects with nodes and relationships.
+
+     Returns:
+         pyvis.network.Network: The network graph object, or None if saving the HTML file fails.
+     """
+     # Create network
+     net = Network(height="1200px", width="100%", directed=True,
+                   notebook=False, bgcolor="#222222", font_color="white", filter_menu=True, cdn_resources='remote')
+
+     nodes = graph_documents[0].nodes
+     relationships = graph_documents[0].relationships
+
+     # Build lookup for valid nodes
+     node_dict = {node.id: node for node in nodes}
+
+     # Filter out invalid edges and collect valid node IDs
+     valid_edges = []
+     valid_node_ids = set()
+     for rel in relationships:
+         if rel.source.id in node_dict and rel.target.id in node_dict:
+             valid_edges.append(rel)
+             valid_node_ids.update([rel.source.id, rel.target.id])
+
+     # Track which nodes are part of any relationship
+     connected_node_ids = set()
+     for rel in relationships:
+         connected_node_ids.add(rel.source.id)
+         connected_node_ids.add(rel.target.id)
+
+     # Add valid nodes to the graph
+     for node_id in valid_node_ids:
+         node = node_dict[node_id]
+         try:
+             net.add_node(node.id, label=node.id, title=node.type, group=node.type)
+         except Exception:
+             continue  # Skip node if error occurs
+
+     # Add valid edges to the graph
+     for rel in valid_edges:
+         try:
+             net.add_edge(rel.source.id, rel.target.id, label=rel.type.lower())
+         except Exception:
+             continue  # Skip edge if error occurs
+
+     # Configure graph layout and physics
+     net.set_options("""
+         {
+             "physics": {
+                 "forceAtlas2Based": {
+                     "gravitationalConstant": -100,
+                     "centralGravity": 0.01,
+                     "springLength": 200,
+                     "springConstant": 0.08
+                 },
+                 "minVelocity": 0.75,
+                 "solver": "forceAtlas2Based"
+             }
+         }
+     """)
+
+     output_file = "knowledge_graph.html"
+     try:
+         net.save_graph(output_file)
+         print(f"Graph saved to {os.path.abspath(output_file)}")
+         return net
+     except Exception as e:
+         print(f"Error saving graph: {e}")
+         return None
+
+
+ def generate_knowledge_graph(text):
+     """
+     Generates and visualizes a knowledge graph from input text.
+
+     This function runs the graph extraction asynchronously and then visualizes
+     the resulting graph using PyVis.
+
+     Args:
+         text (str): Input text to convert into a knowledge graph.
+
+     Returns:
+         pyvis.network.Network: The visualized network graph object.
+     """
+     graph_documents = asyncio.run(extract_graph_data(text))
+     net = visualize_graph(graph_documents)
+     return net
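Editorial sketch: the edge-filtering loop in `visualize_graph` can be isolated and exercised without LangChain or PyVis. The `Node` and `Relationship` dataclasses below are hypothetical stand-ins that only carry the attributes the loop reads (`id`, `type`, `source`, `target`), and `filter_valid_edges` is an illustrative name rather than a function from this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a graph node (id and type only)."""
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    """Hypothetical stand-in for a relationship between two nodes."""
    source: Node
    target: Node
    type: str


def filter_valid_edges(nodes, relationships):
    """Keep relationships whose endpoints are both known nodes.

    Returns (valid_edges, valid_node_ids). As in visualize_graph, a node
    that takes part in no valid edge is left out of valid_node_ids.
    """
    known_ids = {node.id for node in nodes}
    valid_edges = [
        rel for rel in relationships
        if rel.source.id in known_ids and rel.target.id in known_ids
    ]
    valid_node_ids = {rel.source.id for rel in valid_edges} | {
        rel.target.id for rel in valid_edges
    }
    return valid_edges, valid_node_ids


got = Node("Game Of Thrones", "Television series")
hbo = Node("Hbo", "Organization")
essos = Node("Essos", "Fictional continent")  # isolated: no relationship
ghost = Node("Unknown", "Person")             # not in the node list below

edges, node_ids = filter_valid_edges(
    [got, hbo, essos],
    [Relationship(got, hbo, "NETWORK"), Relationship(got, ghost, "CREATOR")],
)
print(sorted(node_ids))
print([rel.type for rel in edges])
```

The example makes one consequence of the filtering visible: an extracted entity with no valid relationship, such as `Essos` here, would not be drawn at all.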
knowledge_graph.html ADDED
@@ -0,0 +1,492 @@
1
+ <html>
2
+ <head>
3
+ <meta charset="utf-8">
4
+
5
+ <script>function neighbourhoodHighlight(params) {
6
+ // console.log("in nieghbourhoodhighlight");
7
+ allNodes = nodes.get({ returnType: "Object" });
8
+ // originalNodes = JSON.parse(JSON.stringify(allNodes));
9
+ // if something is selected:
10
+ if (params.nodes.length > 0) {
11
+ highlightActive = true;
12
+ var i, j;
13
+ var selectedNode = params.nodes[0];
14
+ var degrees = 2;
15
+
16
+ // mark all nodes as hard to read.
17
+ for (let nodeId in allNodes) {
18
+ // nodeColors[nodeId] = allNodes[nodeId].color;
19
+ allNodes[nodeId].color = "rgba(200,200,200,0.5)";
20
+ if (allNodes[nodeId].hiddenLabel === undefined) {
21
+ allNodes[nodeId].hiddenLabel = allNodes[nodeId].label;
22
+ allNodes[nodeId].label = undefined;
23
+ }
24
+ }
25
+ var connectedNodes = network.getConnectedNodes(selectedNode);
26
+ var allConnectedNodes = [];
27
+
28
+ // get the second degree nodes
29
+ for (i = 1; i < degrees; i++) {
30
+ for (j = 0; j < connectedNodes.length; j++) {
31
+ allConnectedNodes = allConnectedNodes.concat(
32
+ network.getConnectedNodes(connectedNodes[j])
33
+ );
34
+ }
35
+ }
36
+
37
+ // all second degree nodes get a different color and their label back
38
+ for (i = 0; i < allConnectedNodes.length; i++) {
39
+ // allNodes[allConnectedNodes[i]].color = "pink";
40
+ allNodes[allConnectedNodes[i]].color = "rgba(150,150,150,0.75)";
41
+ if (allNodes[allConnectedNodes[i]].hiddenLabel !== undefined) {
42
+ allNodes[allConnectedNodes[i]].label =
43
+ allNodes[allConnectedNodes[i]].hiddenLabel;
44
+ allNodes[allConnectedNodes[i]].hiddenLabel = undefined;
45
+ }
46
+ }
47
+
48
+ // all first degree nodes get their own color and their label back
49
+ for (i = 0; i < connectedNodes.length; i++) {
50
+ // allNodes[connectedNodes[i]].color = undefined;
51
+ allNodes[connectedNodes[i]].color = nodeColors[connectedNodes[i]];
52
+ if (allNodes[connectedNodes[i]].hiddenLabel !== undefined) {
53
+ allNodes[connectedNodes[i]].label =
54
+ allNodes[connectedNodes[i]].hiddenLabel;
55
+ allNodes[connectedNodes[i]].hiddenLabel = undefined;
56
+ }
57
+ }
58
+
59
+ // the main node gets its own color and its label back.
60
+ // allNodes[selectedNode].color = undefined;
61
+ allNodes[selectedNode].color = nodeColors[selectedNode];
62
+ if (allNodes[selectedNode].hiddenLabel !== undefined) {
63
+ allNodes[selectedNode].label = allNodes[selectedNode].hiddenLabel;
64
+ allNodes[selectedNode].hiddenLabel = undefined;
65
+ }
66
+ } else if (highlightActive === true) {
67
+ // console.log("highlightActive was true");
68
+ // reset all nodes
69
+ for (let nodeId in allNodes) {
70
+ // allNodes[nodeId].color = "purple";
71
+ allNodes[nodeId].color = nodeColors[nodeId];
72
+ // delete allNodes[nodeId].color;
73
+ if (allNodes[nodeId].hiddenLabel !== undefined) {
74
+ allNodes[nodeId].label = allNodes[nodeId].hiddenLabel;
75
+ allNodes[nodeId].hiddenLabel = undefined;
76
+ }
77
+ }
78
+ highlightActive = false;
79
+ }
80
+
81
+ // transform the object into an array
82
+ var updateArray = [];
83
+ if (params.nodes.length > 0) {
84
+ for (let nodeId in allNodes) {
85
+ if (allNodes.hasOwnProperty(nodeId)) {
86
+ // console.log(allNodes[nodeId]);
87
+ updateArray.push(allNodes[nodeId]);
88
+ }
89
+ }
90
+ nodes.update(updateArray);
91
+ } else {
92
+ // console.log("Nothing was selected");
93
+ for (let nodeId in allNodes) {
94
+ if (allNodes.hasOwnProperty(nodeId)) {
95
+ // console.log(allNodes[nodeId]);
96
+ // allNodes[nodeId].color = {};
97
+ updateArray.push(allNodes[nodeId]);
98
+ }
99
+ }
100
+ nodes.update(updateArray);
101
+ }
102
+ }
103
+
104
+ function filterHighlight(params) {
105
+ allNodes = nodes.get({ returnType: "Object" });
106
+ // if something is selected:
107
+ if (params.nodes.length > 0) {
108
+ filterActive = true;
109
+ let selectedNodes = params.nodes;
110
+
111
+ // hiding all nodes and saving the label
112
+ for (let nodeId in allNodes) {
113
+ allNodes[nodeId].hidden = true;
114
+ if (allNodes[nodeId].savedLabel === undefined) {
115
+ allNodes[nodeId].savedLabel = allNodes[nodeId].label;
116
+ allNodes[nodeId].label = undefined;
117
+ }
118
+ }
119
+
120
+ for (let i=0; i < selectedNodes.length; i++) {
121
+ allNodes[selectedNodes[i]].hidden = false;
122
+ if (allNodes[selectedNodes[i]].savedLabel !== undefined) {
123
+ allNodes[selectedNodes[i]].label = allNodes[selectedNodes[i]].savedLabel;
124
+ allNodes[selectedNodes[i]].savedLabel = undefined;
125
+ }
126
+ }
127
+
128
+ } else if (filterActive === true) {
129
+ // reset all nodes
130
+ for (let nodeId in allNodes) {
131
+ allNodes[nodeId].hidden = false;
132
+ if (allNodes[nodeId].savedLabel !== undefined) {
133
+ allNodes[nodeId].label = allNodes[nodeId].savedLabel;
134
+ allNodes[nodeId].savedLabel = undefined;
135
+ }
136
+ }
137
+ filterActive = false;
138
+ }
139
+
140
+ // transform the object into an array
141
+ var updateArray = [];
142
+ if (params.nodes.length > 0) {
143
+ for (let nodeId in allNodes) {
144
+ if (allNodes.hasOwnProperty(nodeId)) {
145
+ updateArray.push(allNodes[nodeId]);
146
+ }
147
+ }
148
+ nodes.update(updateArray);
149
+ } else {
150
+ for (let nodeId in allNodes) {
151
+ if (allNodes.hasOwnProperty(nodeId)) {
152
+ updateArray.push(allNodes[nodeId]);
153
+ }
154
+ }
155
+ nodes.update(updateArray);
156
+ }
157
+ }
158
+
159
+ function selectNode(nodes) {
160
+ network.selectNodes(nodes);
161
+ neighbourhoodHighlight({ nodes: nodes });
162
+ return nodes;
163
+ }
164
+
165
+ function selectNodes(nodes) {
166
+ network.selectNodes(nodes);
167
+ filterHighlight({nodes: nodes});
168
+ return nodes;
169
+ }
170
+
171
+ function highlightFilter(filter) {
172
+ let selectedNodes = []
173
+ let selectedProp = filter['property']
174
+ if (filter['item'] === 'node') {
175
+ let allNodes = nodes.get({ returnType: "Object" });
176
+ for (let nodeId in allNodes) {
177
+ if (allNodes[nodeId][selectedProp] && filter['value'].includes((allNodes[nodeId][selectedProp]).toString())) {
178
+ selectedNodes.push(nodeId)
179
+ }
180
+ }
181
+ }
182
+ else if (filter['item'] === 'edge'){
183
+ let allEdges = edges.get({returnType: 'object'});
184
+ // check if the selected property exists for selected edge and select the nodes connected to the edge
185
+ for (let edge in allEdges) {
186
+ if (allEdges[edge][selectedProp] && filter['value'].includes((allEdges[edge][selectedProp]).toString())) {
187
+ selectedNodes.push(allEdges[edge]['from'])
188
+ selectedNodes.push(allEdges[edge]['to'])
189
+ }
190
+ }
191
+ }
192
+ selectNodes(selectedNodes)
193
+ }</script>
194
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/vis-network/9.1.2/dist/dist/vis-network.min.css" integrity="sha512-WgxfT5LWjfszlPHXRmBWHkV2eceiWTOBvrKCNbdgDYTHrT2AeLCGbF4sZlZw3UMN3WtL0tGUoIAKsu8mllg/XA==" crossorigin="anonymous" referrerpolicy="no-referrer" />
195
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/vis-network/9.1.2/dist/vis-network.min.js" integrity="sha512-LnvoEWDFrqGHlHmDD2101OrLcbsfkrzoSpvtSQtxK3RMnRV0eOkhhBN2dXHKRrUU8p2DGRTk35n4O8nWSVe1mQ==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
196
+
197
+
198
+
199
+
200
+
201
+
202
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/tom-select/2.0.0-rc.4/css/tom-select.min.css" integrity="sha512-43fHB3GLgZfz8QXl1RPQ8O66oIgv3po9cJ5erMt1c4QISq9dYb195T3vr5ImnJPXuVroKcGBPXBFKETW8jrPNQ==" crossorigin="anonymous" referrerpolicy="no-referrer" />
203
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/tom-select/2.0.0-rc.4/js/tom-select.complete.js" integrity="sha512-jeF9CfnvzDiw9G9xiksVjxR2lib44Gnovvkv+3CgCG6NXCD4gqlA5nDAVW5WjpA+i+/zKsUWV5xNEbW1X/HH0Q==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>
204
+
205
+
206
+
207
+ <center>
208
+ <h1></h1>
209
+ </center>
210
+
211
+ <!-- <link rel="stylesheet" href="../node_modules/vis/dist/vis.min.css" type="text/css" />
212
+ <script type="text/javascript" src="../node_modules/vis/dist/vis.js"> </script>-->
213
+ <link
214
+ href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta3/dist/css/bootstrap.min.css"
215
+ rel="stylesheet"
216
+ integrity="sha384-eOJMYsd53ii+scO/bJGFsiCZc+5NDVN2yr8+0RDqr0Ql0h+rP48ckxlpbzKgwra6"
217
+ crossorigin="anonymous"
218
+ />
219
+ <script
220
+ src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta3/dist/js/bootstrap.bundle.min.js"
221
+ integrity="sha384-JEW9xMcG8R+pH31jmWH6WWP0WintQrMb4s7ZOdauHnUtxwoG2vI5DkLtS3qm9Ekf"
222
+ crossorigin="anonymous"
223
+ ></script>
224
+
225
+
226
+ <center>
227
+ <h1></h1>
228
+ </center>
229
+ <style type="text/css">
230
+
231
+ #mynetwork {
232
+ width: 100%;
233
+ height: 1200px;
234
+ background-color: #222222;
235
+ border: 1px solid lightgray;
236
+ position: relative;
237
+ float: left;
238
+ }
239
+
240
+
241
+
242
+
243
+
244
+
245
+ </style>
246
+ </head>
247
+
248
+
249
+ <body>
250
+ <div class="card" style="width: 100%">
251
+
252
+
253
+ <div id="filter-menu" class="card-header">
254
+ <div class="row no-gutters">
255
+ <div class="col-3 pb-2">
256
+ <select
257
+ class="form-select"
258
+ aria-label="Default select example"
259
+ onchange="updateFilter(value, 'item')"
260
+ id="select-item"
261
+ >
262
+ <option value="">Select a network item</option>
263
+ <option value="edge">edge</option>
264
+ <option value="node">node</option>
265
+ </select>
266
+ </div>
267
+ <div class="col-3 pb-2">
268
+ <select
269
+ class="form-select"
270
+ aria-label="Default select example"
271
+ onchange="updateFilter(value, 'property')"
272
+ id="select-property"
273
+ >
274
+ <option value="">Select a property...</option>
275
+ </select>
276
+ </div>
277
+ <div class="col-3 pb-2">
278
+ <select
279
+ class="form-select"
280
+ aria-label="Default select example"
281
+ id="select-value"
282
+ >
283
+ <option value="">Select value(s)...</option>
284
+ </select>
285
+ </div>
286
+ <div class="col-1 pb-2">
287
+ <button type="button" class="btn btn-primary btn-block" onclick="highlightFilter(filter);">Filter</button>
288
+ </div>
289
+ <div class="col-2 pb-2">
290
+ <button type="button" class="btn btn-primary btn-block" onclick="clearFilter(true)">Reset Selection</button>
291
+ </div>
292
+ </div>
293
+ </div>
294
+
295
+ <div id="mynetwork" class="card-body"></div>
296
+ </div>
297
+
298
+
299
+
300
+
301
+ <script type="text/javascript">
302
+
303
+ // initialize global variables.
304
+ var edges;
305
+ var nodes;
306
+ var allNodes;
307
+ var allEdges;
308
+ var nodeColors;
309
+ var originalNodes;
310
+ var network;
311
+ var container;
312
+ var options, data;
313
+ var filter = {
314
+ item : '',
315
+ property : '',
316
+ value : []
317
+ };
318
+
319
+
320
+
321
+
322
+ // explicitly using onItemAdd and this function as we need to save multiple values
323
+ let updateValueFilter = function() {
324
+ return function () {
325
+ filter['value'].push(arguments[0])
326
+ }
327
+ }
328
+
329
+ let valueControl = new TomSelect("#select-value",{
330
+ maxItems: null,
331
+ valueField: 'id',
332
+ labelField: 'title',
333
+ searchField: 'title',
334
+ create: false,
335
+ sortField: {
336
+ field: "text",
337
+ direction: "asc"
338
+ },
339
+ onItemAdd: updateValueFilter()
340
+ });
341
+
342
+ let addValues = function() {
343
+ return function () {
344
+ // clear the current value options and add the selected attribute values
345
+ // tom-select handles duplicates
346
+ let selectedProperty = arguments[0];
347
+ valueControl.clear();
348
+ valueControl.clearOptions();
349
+ filter['value'] = []
350
+ if (filter['item'] === 'node') {
351
+ for (let each in allNodes) {
352
+ valueControl.addOption({
353
+ id:allNodes[each][selectedProperty],
354
+ title:allNodes[each][selectedProperty]
355
+ })
356
+ }
357
+ }
358
+ else if (filter['item'] === 'edge') {
359
+ for (let each in allEdges) {
360
+ valueControl.addOption({
361
+ id:allEdges[each][selectedProperty],
362
+ title:allEdges[each][selectedProperty]
363
+ })
364
+ }
365
+ }
366
+ }
367
+ };
368
+
369
+ let propControl = new TomSelect("#select-property",{
370
+ valueField: 'id',
371
+ labelField: 'title',
372
+ searchField: 'title',
373
+ create: false,
374
+ sortField: {
375
+ field: "text",
376
+ direction: "asc"
377
+ },
378
+ onItemAdd: addValues()
379
+ });
380
+
381
+ let addProperties = function() {
382
+ return function () {
383
+ // loops through the selected network item and adds the attributes to dropdown
384
+ // tom-select handles duplicates
385
+ clearFilter(false)
386
+ if (arguments[0] === 'edge') {
387
+ for (let each in allEdges) {
388
+ if (allEdges.hasOwnProperty(each)) {
389
+ for (let eachProp in allEdges[each]) {
390
+ if (allEdges[each].hasOwnProperty(eachProp)) {
391
+ propControl.addOption({id: eachProp, title: eachProp})
392
+ }
393
+ }
394
+ }
395
+ }
396
+ }
397
+ else if (arguments[0] === 'node') {
398
+ for (let each in allNodes) {
399
+ if (allNodes.hasOwnProperty(each)) {
400
+ for (let eachProp in allNodes[each]) {
401
+ if (allNodes[each].hasOwnProperty(eachProp)
402
+ && (eachProp !== 'hidden' && eachProp !== 'savedLabel'
403
+ && eachProp !== 'hiddenLabel')) {
404
+ propControl.addOption({id: eachProp, title: eachProp})
405
+
406
+ }
407
+ }
408
+ }
409
+ }
410
+ }
411
+ }
412
+ };
413
+
414
+ let itemControl = new TomSelect("#select-item",{
415
+ create: false,
416
+ sortField:{
417
+ field: "text",
418
+ direction: "asc"
419
+ },
420
+ onItemAdd: addProperties()
421
+ });
422
+
423
+ function clearFilter(reset) {
424
+ // utility function to clear all the selected filter options
425
+ // if reset is set to true, the existing filter will be removed
426
+ // else, only the dropdown options are cleared
427
+ propControl.clear();
428
+ propControl.clearOptions();
429
+ valueControl.clear();
430
+ valueControl.clearOptions();
431
+ filter = {
432
+ item : '',
433
+ property : '',
434
+ value : []
435
+ }
436
+ if (reset) {
437
+ itemControl.clear();
438
+ filterHighlight({nodes: []})
439
+ }
440
+ }
441
+
442
+ function updateFilter(value, key) {
443
+ // key could be 'item' or 'property' and value is as selected in dropdown
444
+ filter[key] = value
445
+ }
446
+
447
+
448
+
449
+ // This method is responsible for drawing the graph, returns the drawn network
450
+ function drawGraph() {
451
+ var container = document.getElementById('mynetwork');
452
+
453
+
454
+
455
+ // parsing and collecting nodes and edges from the python
456
+ nodes = new vis.DataSet([{"font": {"color": "white"}, "group": "Fictional organization", "id": "Night\u0027S Watch", "label": "Night\u0027S Watch", "shape": "dot", "title": "Fictional organization"}, {"font": {"color": "white"}, "group": "Person", "id": "David Benioff", "label": "David Benioff", "shape": "dot", "title": "Person"}, {"font": {"color": "white"}, "group": "Award", "id": "Primetime Emmy Awards", "label": "Primetime Emmy Awards", "shape": "dot", "title": "Award"}, {"font": {"color": "white"}, "group": "Award", "id": "Hugo Awards", "label": "Hugo Awards", "shape": "dot", "title": "Award"}, {"font": {"color": "white"}, "group": "Person", "id": "D. B. Weiss", "label": "D. B. Weiss", "shape": "dot", "title": "Person"}, {"font": {"color": "white"}, "group": "Fictional continent", "id": "Essos", "label": "Essos", "shape": "dot", "title": "Fictional continent"}, {"font": {"color": "white"}, "group": "Organization", "id": "Hbo", "label": "Hbo", "shape": "dot", "title": "Organization"}, {"font": {"color": "white"}, "group": "Person", "id": "George R. R. Martin", "label": "George R. R. Martin", "shape": "dot", "title": "Person"}, {"font": {"color": "white"}, "group": "Television series", "id": "A Knight Of The Seven Kingdoms", "label": "A Knight Of The Seven Kingdoms", "shape": "dot", "title": "Television series"}, {"font": {"color": "white"}, "group": "Book series", "id": "A Song Of Ice And Fire", "label": "A Song Of Ice And Fire", "shape": "dot", "title": "Book series"}, {"font": {"color": "white"}, "group": "Award", "id": "Golden Globe Award For Best Television Series \u2013 Drama", "label": "Golden Globe Award For Best Television Series \u2013 Drama", "shape": "dot", "title": "Award"}, {"font": {"color": "white"}, "group": "Book", "id": "A Game Of Thrones", "label": "A Game Of Thrones", "shape": "dot", "title": "Book"}, {"font": {"color": "white"}, "group": "Television series", "id": "Game Of Thrones", "label": "Game Of Thrones", "shape": "dot", "title": "Television series"}, {"font": {"color": "white"}, "group": "Fictional object", "id": "Iron Throne", "label": "Iron Throne", "shape": "dot", "title": "Fictional object"}, {"font": {"color": "white"}, "group": "Fictional place", "id": "Seven Kingdoms", "label": "Seven Kingdoms", "shape": "dot", "title": "Fictional place"}, {"font": {"color": "white"}, "group": "Television series", "id": "House Of The Dragon", "label": "House Of The Dragon", "shape": "dot", "title": "Television series"}, {"font": {"color": "white"}, "group": "Award", "id": "Peabody Award", "label": "Peabody Award", "shape": "dot", "title": "Award"}, {"font": {"color": "white"}, "group": "Fictional continent", "id": "Westeros", "label": "Westeros", "shape": "dot", "title": "Fictional continent"}]);
457
+ edges = new vis.DataSet([{"arrows": "to", "from": "Game Of Thrones", "label": "creator", "to": "David Benioff"}, {"arrows": "to", "from": "Game Of Thrones", "label": "creator", "to": "D. B. Weiss"}, {"arrows": "to", "from": "Game Of Thrones", "label": "network", "to": "Hbo"}, {"arrows": "to", "from": "Game Of Thrones", "label": "adaptation", "to": "A Song Of Ice And Fire"}, {"arrows": "to", "from": "A Song Of Ice And Fire", "label": "author", "to": "George R. R. Martin"}, {"arrows": "to", "from": "A Song Of Ice And Fire", "label": "first_book", "to": "A Game Of Thrones"}, {"arrows": "to", "from": "Game Of Thrones", "label": "setting", "to": "Westeros"}, {"arrows": "to", "from": "Game Of Thrones", "label": "setting", "to": "Essos"}, {"arrows": "to", "from": "Game Of Thrones", "label": "focus", "to": "Iron Throne"}, {"arrows": "to", "from": "Game Of Thrones", "label": "focus", "to": "Seven Kingdoms"}, {"arrows": "to", "from": "Game Of Thrones", "label": "focus", "to": "Night\u0027S Watch"}, {"arrows": "to", "from": "Game Of Thrones", "label": "award", "to": "Primetime Emmy Awards"}, {"arrows": "to", "from": "Game Of Thrones", "label": "award", "to": "Hugo Awards"}, {"arrows": "to", "from": "Game Of Thrones", "label": "award", "to": "Peabody Award"}, {"arrows": "to", "from": "Game Of Thrones", "label": "nomination", "to": "Golden Globe Award For Best Television Series \u2013 Drama"}, {"arrows": "to", "from": "House Of The Dragon", "label": "network", "to": "Hbo"}, {"arrows": "to", "from": "A Knight Of The Seven Kingdoms", "label": "network", "to": "Hbo"}]);
458
+
459
+ nodeColors = {};
460
+ allNodes = nodes.get({ returnType: "Object" });
461
+ for (nodeId in allNodes) {
462
+ nodeColors[nodeId] = allNodes[nodeId].color;
463
+ }
464
+ allEdges = edges.get({ returnType: "Object" });
465
+ // adding nodes and edges to the graph
466
+ data = {nodes: nodes, edges: edges};
467
+
468
+ var options = {"physics": {"forceAtlas2Based": {"gravitationalConstant": -100, "centralGravity": 0.01, "springLength": 200, "springConstant": 0.08}, "minVelocity": 0.75, "solver": "forceAtlas2Based"}};
469
+
470
+
471
+
472
+
473
+
474
+
475
+ network = new vis.Network(container, data, options);
476
+
477
+
478
+
479
+
480
+
481
+
482
+
483
+
484
+
485
+
486
+ return network;
487
+
488
+ }
489
+ drawGraph();
490
+ </script>
491
+ </body>
492
+ </html>
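The `nodes` and `edges` arrays in the generated page follow a regular shape: each node carries `id`, `label`, `group`, `title`, and `shape`, and each edge carries `arrows`, `from`, `label`, and `to`. The sketch below rebuilds that shape from plain triples to make the mapping explicit. It is an illustration only, not pyvis's internal code; `to_vis_data` and its arguments are hypothetical names, and the styling fields (`font`, colors) are left out.

```python
import json


def to_vis_data(triples, node_types):
    """Build vis.js-style node and edge dicts from (source, label, target) triples.

    `node_types` maps a node id to its type; both names are assumptions
    made for this sketch.
    """
    nodes = [
        {"id": node_id, "label": node_id, "group": node_type,
         "title": node_type, "shape": "dot"}
        for node_id, node_type in node_types.items()
    ]
    edges = [
        {"arrows": "to", "from": source, "label": label, "to": target}
        for source, label, target in triples
    ]
    return nodes, edges


# A few entries taken from the graph data in the page above.
node_types = {
    "Game Of Thrones": "Television series",
    "A Song Of Ice And Fire": "Book series",
    "A Game Of Thrones": "Book",
}
triples = [
    ("Game Of Thrones", "adaptation", "A Song Of Ice And Fire"),
    ("A Song Of Ice And Fire", "first_book", "A Game Of Thrones"),
]

nodes, edges = to_vis_data(triples, node_types)
print(json.dumps(edges, indent=2))
```

Keeping the graph as triples until the last step makes it easy to filter or merge edges before anything is serialized into the page.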
knowledge_graph.ipynb ADDED
@@ -0,0 +1,313 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "%pip install --upgrade langchain langchain-experimental langchain-openai python-dotenv pyvis"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "code",
14
+ "execution_count": 2,
15
+ "metadata": {},
16
+ "outputs": [],
17
+ "source": [
18
+ "from dotenv import load_dotenv\n",
19
+ "import os\n",
20
+ "\n",
21
+ "# Load the .env file\n",
22
+ "load_dotenv()\n",
23
+ "# Get API key from environment variable \n",
24
+ "# (make sure the key is present in .env file in the project directory)\n",
25
+ "api_key = os.getenv(\"OPENAI_API_KEY\")"
26
+ ]
27
+ },
28
+ {
29
+ "cell_type": "markdown",
30
+ "metadata": {},
31
+ "source": [
32
+ "### LLM Graph Transformer\n",
33
+ "Using GPT-4o in all examples."
34
+ ]
35
+ },
36
+ {
37
+ "cell_type": "code",
38
+ "execution_count": 8,
39
+ "metadata": {},
40
+ "outputs": [],
41
+ "source": [
42
+ "from langchain_experimental.graph_transformers import LLMGraphTransformer\n",
43
+ "from langchain_core.documents import Document\n",
44
+ "from langchain_openai import ChatOpenAI\n",
45
+ "\n",
46
+ "llm = ChatOpenAI(temperature=0, model_name=\"gpt-4o\")\n",
47
+ "\n",
48
+ "graph_transformer = LLMGraphTransformer(llm=llm)"
49
+ ]
50
+ },
51
+ {
52
+ "cell_type": "markdown",
53
+ "metadata": {},
54
+ "source": [
55
+ "### Extract graph data"
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": 9,
61
+ "metadata": {},
62
+ "outputs": [],
63
+ "source": [
64
+ "text = \"\"\"\n",
65
+ "Albert Einstein[a] (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who is best known for developing the theory of relativity. Einstein also made important contributions to quantum mechanics.[1][5] His mass–energy equivalence formula E = mc2, which arises from special relativity, has been called \"the world's most famous equation\".[6] He received the 1921 Nobel Prize in Physics for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect.[7]\n",
66
+ "\n",
67
+ "Born in the German Empire, Einstein moved to Switzerland in 1895, forsaking his German citizenship (as a subject of the Kingdom of Württemberg)[note 1] the following year. In 1897, at the age of seventeen, he enrolled in the mathematics and physics teaching diploma program at the Swiss federal polytechnic school in Zurich, graduating in 1900. He acquired Swiss citizenship a year later, which he kept for the rest of his life, and afterwards secured a permanent position at the Swiss Patent Office in Bern. In 1905, he submitted a successful PhD dissertation to the University of Zurich. In 1914, he moved to Berlin to join the Prussian Academy of Sciences and the Humboldt University of Berlin, becoming director of the Kaiser Wilhelm Institute for Physics in 1917; he also became a German citizen again, this time as a subject of the Kingdom of Prussia.[note 1] In 1933, while Einstein was visiting the United States, Adolf Hitler came to power in Germany. Horrified by the Nazi persecution of his fellow Jews,[8] he decided to remain in the US, and was granted American citizenship in 1940.[9] On the eve of World War II, he endorsed a letter to President Franklin D. Roosevelt alerting him to the potential German nuclear weapons program and recommending that the US begin similar research.\n",
68
+ "\n",
69
+ "In 1905, sometimes described as his annus mirabilis (miracle year), he published four groundbreaking papers.[10] In them, he outlined a theory of the photoelectric effect, explained Brownian motion, introduced his special theory of relativity, and demonstrated that if the special theory is correct, mass and energy are equivalent to each other. In 1915, he proposed a general theory of relativity that extended his system of mechanics to incorporate gravitation. A cosmological paper that he published the following year laid out the implications of general relativity for the modeling of the structure and evolution of the universe as a whole.[11][12] In 1917, Einstein wrote a paper which introduced the concepts of spontaneous emission and stimulated emission, the latter of which is the core mechanism behind the laser and maser, and which contained a trove of information that would be beneficial to developments in physics later on, such as quantum electrodynamics and quantum optics.[13]\n",
70
+ "\n",
71
+ "In the middle part of his career, Einstein made important contributions to statistical mechanics and quantum theory. Especially notable was his work on the quantum physics of radiation, in which light consists of particles, subsequently called photons. With physicist Satyendra Nath Bose, he laid the groundwork for Bose–Einstein statistics. For much of the last phase of his academic life, Einstein worked on two endeavors that ultimately proved unsuccessful. First, he advocated against quantum theory's introduction of fundamental randomness into science's picture of the world, objecting that God does not play dice.[14] Second, he attempted to devise a unified field theory by generalizing his geometric theory of gravitation to include electromagnetism. As a result, he became increasingly isolated from mainstream modern physics.\n",
72
+ "\"\"\""
73
+ ]
74
+ },
75
+ {
76
+ "cell_type": "code",
77
+ "execution_count": 21,
78
+ "metadata": {},
79
+ "outputs": [],
80
+ "source": [
81
+ "documents = [Document(page_content=text)]\n",
82
+ "graph_documents = await graph_transformer.aconvert_to_graph_documents(documents)"
83
+ ]
84
+ },
85
+ {
86
+ "cell_type": "code",
87
+ "execution_count": 22,
88
+ "metadata": {},
89
+ "outputs": [
90
+ {
91
+ "name": "stdout",
92
+ "output_type": "stream",
93
+ "text": [
94
+ "Nodes:[Node(id='Albert Einstein', type='Person', properties={}), Node(id='Theory Of Relativity', type='Concept', properties={}), Node(id='Quantum Mechanics', type='Concept', properties={}), Node(id='Mass–Energy Equivalence Formula', type='Concept', properties={}), Node(id='E = Mc2', type='Concept', properties={}), Node(id='1921 Nobel Prize In Physics', type='Award', properties={}), Node(id='Law Of The Photoelectric Effect', type='Concept', properties={}), Node(id='German Empire', type='Place', properties={}), Node(id='Switzerland', type='Place', properties={}), Node(id='Swiss Federal Polytechnic School In Zurich', type='Organization', properties={}), Node(id='Swiss Patent Office In Bern', type='Organization', properties={}), Node(id='University Of Zurich', type='Organization', properties={}), Node(id='Prussian Academy Of Sciences', type='Organization', properties={}), Node(id='Humboldt University Of Berlin', type='Organization', properties={}), Node(id='Kaiser Wilhelm Institute For Physics', type='Organization', properties={}), Node(id='United States', type='Place', properties={}), Node(id='Adolf Hitler', type='Person', properties={}), Node(id='Franklin D. 
Roosevelt', type='Person', properties={}), Node(id='World War Ii', type='Event', properties={}), Node(id='Annus Mirabilis', type='Event', properties={}), Node(id='Brownian Motion', type='Concept', properties={}), Node(id='Special Theory Of Relativity', type='Concept', properties={}), Node(id='General Theory Of Relativity', type='Concept', properties={}), Node(id='Spontaneous Emission', type='Concept', properties={}), Node(id='Stimulated Emission', type='Concept', properties={}), Node(id='Laser', type='Concept', properties={}), Node(id='Maser', type='Concept', properties={}), Node(id='Statistical Mechanics', type='Concept', properties={}), Node(id='Quantum Theory', type='Concept', properties={}), Node(id='Quantum Physics Of Radiation', type='Concept', properties={}), Node(id='Photons', type='Concept', properties={}), Node(id='Satyendra Nath Bose', type='Person', properties={}), Node(id='Bose–Einstein Statistics', type='Concept', properties={}), Node(id='Unified Field Theory', type='Concept', properties={}), Node(id='Electromagnetism', type='Concept', properties={})]\n",
95
+ "Relationships:[Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Theory Of Relativity', type='Concept', properties={}), type='DEVELOPED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Quantum Mechanics', type='Concept', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Mass–Energy Equivalence Formula', type='Concept', properties={}), target=Node(id='E = Mc2', type='Concept', properties={}), type='REPRESENTED_BY', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='1921 Nobel Prize In Physics', type='Award', properties={}), type='RECEIVED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Law Of The Photoelectric Effect', type='Concept', properties={}), type='DISCOVERED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='German Empire', type='Place', properties={}), type='BORN_IN', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Switzerland', type='Place', properties={}), type='MOVED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Swiss Federal Polytechnic School In Zurich', type='Organization', properties={}), type='ENROLLED_IN', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Swiss Patent Office In Bern', type='Organization', properties={}), type='WORKED_AT', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='University Of Zurich', type='Organization', properties={}), type='SUBMITTED_PHD_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), 
target=Node(id='Prussian Academy Of Sciences', type='Organization', properties={}), type='JOINED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Humboldt University Of Berlin', type='Organization', properties={}), type='JOINED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Kaiser Wilhelm Institute For Physics', type='Organization', properties={}), type='DIRECTOR_OF', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='United States', type='Place', properties={}), type='MOVED_TO', properties={}), Relationship(source=Node(id='Adolf Hitler', type='Person', properties={}), target=Node(id='Germany', type='Place', properties={}), type='CAME_TO_POWER_IN', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Franklin D. Roosevelt', type='Person', properties={}), type='ENDORSED_LETTER_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='World War Ii', type='Event', properties={}), type='RELATED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Annus Mirabilis', type='Event', properties={}), type='ASSOCIATED_WITH', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Brownian Motion', type='Concept', properties={}), type='EXPLAINED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Special Theory Of Relativity', type='Concept', properties={}), type='INTRODUCED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='General Theory Of Relativity', type='Concept', properties={}), type='PROPOSED', properties={}), 
Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Spontaneous Emission', type='Concept', properties={}), type='INTRODUCED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Stimulated Emission', type='Concept', properties={}), type='INTRODUCED', properties={}), Relationship(source=Node(id='Stimulated Emission', type='Concept', properties={}), target=Node(id='Laser', type='Concept', properties={}), type='CORE_MECHANISM_OF', properties={}), Relationship(source=Node(id='Stimulated Emission', type='Concept', properties={}), target=Node(id='Maser', type='Concept', properties={}), type='CORE_MECHANISM_OF', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Statistical Mechanics', type='Concept', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Quantum Theory', type='Concept', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Quantum Physics Of Radiation', type='Concept', properties={}), type='WORKED_ON', properties={}), Relationship(source=Node(id='Quantum Physics Of Radiation', type='Concept', properties={}), target=Node(id='Photons', type='Concept', properties={}), type='CONSISTS_OF', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Satyendra Nath Bose', type='Person', properties={}), type='COLLABORATED_WITH', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Bose–Einstein Statistics', type='Concept', properties={}), type='LAID_GROUNDWORK_FOR', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Unified Field Theory', type='Concept', 
properties={}), type='ATTEMPTED_TO_DEVISE', properties={}), Relationship(source=Node(id='Unified Field Theory', type='Concept', properties={}), target=Node(id='Electromagnetism', type='Concept', properties={}), type='INCLUDES', properties={})]\n"
96
+ ]
97
+ }
98
+ ],
99
+ "source": [
100
+ "print(f\"Nodes:{graph_documents[0].nodes}\")\n",
101
+ "print(f\"Relationships:{graph_documents[0].relationships}\")"
102
+ ]
103
+ },
104
+ {
105
+ "cell_type": "markdown",
106
+ "metadata": {},
107
+ "source": [
108
+ "#### Visualize graph"
109
+ ]
110
+ },
111
+ {
112
+ "cell_type": "code",
113
+ "execution_count": 39,
114
+ "metadata": {},
115
+ "outputs": [
116
+ {
117
+ "name": "stdout",
118
+ "output_type": "stream",
119
+ "text": [
120
+ "Graph saved to /Users/thuvu/Documents/vlogging/Research/knowledge_graph_app/knowledge_graph.html\n"
121
+ ]
122
+ }
123
+ ],
124
+ "source": [
125
+ "from pyvis.network import Network\n",
126
+ "\n",
127
+ "def visualize_graph(graph_documents):\n",
128
+ "\n",
129
+ "    # Create network\n",
130
+ "    net = Network(height=\"1200px\", width=\"100%\", directed=True,\n",
131
+ "                  notebook=False, bgcolor=\"#222222\", font_color=\"white\")\n",
132
+ "    \n",
133
+ "    nodes = graph_documents[0].nodes\n",
134
+ "    relationships = graph_documents[0].relationships\n",
135
+ "\n",
136
+ "    # Build lookup for valid nodes\n",
137
+ "    node_dict = {node.id: node for node in nodes}\n",
138
+ "    \n",
139
+ "    # Filter out invalid edges and collect valid node IDs\n",
140
+ "    valid_edges = []\n",
141
+ "    valid_node_ids = set()\n",
142
+ "    for rel in relationships:\n",
143
+ "        if rel.source.id in node_dict and rel.target.id in node_dict:\n",
144
+ "            valid_edges.append(rel)\n",
145
+ "            valid_node_ids.update([rel.source.id, rel.target.id])\n",
146
+ "\n",
147
+ "\n",
148
+ "    # Track which nodes are part of any relationship\n",
149
+ "    connected_node_ids = set()\n",
150
+ "    for rel in relationships:\n",
151
+ "        connected_node_ids.add(rel.source.id)\n",
152
+ "        connected_node_ids.add(rel.target.id)\n",
153
+ "\n",
154
+ "    # Add valid nodes\n",
155
+ "    for node_id in valid_node_ids:\n",
156
+ "        node = node_dict[node_id]\n",
157
+ "        try:\n",
158
+ "            net.add_node(node.id, label=node.id, title=node.type, group=node.type)\n",
159
+ "        except Exception:\n",
160
+ "            continue  # skip if error\n",
161
+ "\n",
162
+ "    # Add valid edges\n",
163
+ "    for rel in valid_edges:\n",
164
+ "        try:\n",
165
+ "            net.add_edge(rel.source.id, rel.target.id, label=rel.type.lower())\n",
166
+ "        except Exception:\n",
167
+ "            continue  # skip if error\n",
168
+ "\n",
169
+ "    # Configure physics\n",
170
+ "    net.set_options(\"\"\"\n",
171
+ "    {\n",
172
+ "        \"physics\": {\n",
173
+ "            \"forceAtlas2Based\": {\n",
174
+ "                \"gravitationalConstant\": -100,\n",
175
+ "                \"centralGravity\": 0.01,\n",
176
+ "                \"springLength\": 200,\n",
177
+ "                \"springConstant\": 0.08\n",
178
+ "            },\n",
179
+ "            \"minVelocity\": 0.75,\n",
180
+ "            \"solver\": \"forceAtlas2Based\"\n",
181
+ "        }\n",
182
+ "    }\n",
183
+ "    \"\"\")\n",
184
+ "    \n",
185
+ "    output_file = \"knowledge_graph.html\"\n",
186
+ "    net.save_graph(output_file)\n",
187
+ "    print(f\"Graph saved to {os.path.abspath(output_file)}\")\n",
188
+ "\n",
189
+ "    # Try to open in browser\n",
190
+ "    try:\n",
191
+ "        import webbrowser\n",
192
+ "        webbrowser.open(f\"file://{os.path.abspath(output_file)}\")\n",
193
+ "    except Exception:\n",
194
+ "        print(\"Could not open browser automatically\")\n",
195
+ "    \n",
196
+ "# Run the function\n",
197
+ "visualize_graph(graph_documents)"
198
+ ]
199
+ },
200
+ {
201
+ "cell_type": "markdown",
202
+ "metadata": {},
203
+ "source": [
204
+ "### Extract specific types of nodes"
205
+ ]
206
+ },
207
+ {
208
+ "cell_type": "code",
209
+ "execution_count": 40,
210
+ "metadata": {},
211
+ "outputs": [],
212
+ "source": [
213
+ "allowed_nodes = [\"Person\", \"Organization\", \"Location\", \"Award\", \"ResearchField\"]\n",
214
+ "graph_transformer_nodes_defined = LLMGraphTransformer(llm=llm, allowed_nodes=allowed_nodes)\n",
215
+ "graph_documents_nodes_defined = await graph_transformer_nodes_defined.aconvert_to_graph_documents(documents)"
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": 41,
221
+ "metadata": {},
222
+ "outputs": [
223
+ {
224
+ "name": "stdout",
225
+ "output_type": "stream",
226
+ "text": [
227
+ "Nodes:[Node(id='Albert Einstein', type='Person', properties={}), Node(id='German Empire', type='Location', properties={}), Node(id='Switzerland', type='Location', properties={}), Node(id='University Of Zurich', type='Organization', properties={}), Node(id='Prussian Academy Of Sciences', type='Organization', properties={}), Node(id='Humboldt University Of Berlin', type='Organization', properties={}), Node(id='Kaiser Wilhelm Institute For Physics', type='Organization', properties={}), Node(id='United States', type='Location', properties={}), Node(id='Franklin D. Roosevelt', type='Person', properties={}), Node(id='Nobel Prize In Physics', type='Award', properties={}), Node(id='Theory Of Relativity', type='Researchfield', properties={}), Node(id='Quantum Mechanics', type='Researchfield', properties={}), Node(id='Photoelectric Effect', type='Researchfield', properties={}), Node(id='Special Relativity', type='Researchfield', properties={}), Node(id='General Relativity', type='Researchfield', properties={}), Node(id='Quantum Electrodynamics', type='Researchfield', properties={}), Node(id='Quantum Optics', type='Researchfield', properties={}), Node(id='Statistical Mechanics', type='Researchfield', properties={}), Node(id='Quantum Theory', type='Researchfield', properties={}), Node(id='Unified Field Theory', type='Researchfield', properties={})]\n",
228
+ "Relationships:[Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='German Empire', type='Location', properties={}), type='BORN_IN', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Switzerland', type='Location', properties={}), type='MOVED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='University Of Zurich', type='Organization', properties={}), type='PHD_FROM', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Prussian Academy Of Sciences', type='Organization', properties={}), type='MEMBER_OF', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Humboldt University Of Berlin', type='Organization', properties={}), type='MEMBER_OF', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Kaiser Wilhelm Institute For Physics', type='Organization', properties={}), type='DIRECTOR_OF', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='United States', type='Location', properties={}), type='MOVED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Franklin D. 
Roosevelt', type='Person', properties={}), type='ENDORSED_LETTER_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Nobel Prize In Physics', type='Award', properties={}), type='AWARDED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Theory Of Relativity', type='Researchfield', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Quantum Mechanics', type='Researchfield', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Photoelectric Effect', type='Researchfield', properties={}), type='DISCOVERED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Special Relativity', type='Researchfield', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='General Relativity', type='Researchfield', properties={}), type='PROPOSED', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Quantum Electrodynamics', type='Researchfield', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Quantum Optics', type='Researchfield', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Statistical Mechanics', type='Researchfield', properties={}), type='CONTRIBUTED_TO', properties={}), Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Quantum Theory', type='Researchfield', properties={}), type='CONTRIBUTED_TO', properties={}), 
Relationship(source=Node(id='Albert Einstein', type='Person', properties={}), target=Node(id='Unified Field Theory', type='Researchfield', properties={}), type='ATTEMPTED', properties={})]\n"
229
+ ]
230
+ }
231
+ ],
232
+ "source": [
233
+ "print(f\"Nodes:{graph_documents_nodes_defined[0].nodes}\")\n",
234
+ "print(f\"Relationships:{graph_documents_nodes_defined[0].relationships}\")"
235
+ ]
236
+ },
237
+ {
238
+ "cell_type": "markdown",
239
+ "metadata": {},
240
+ "source": [
241
+ "### Extract specific types of relationships"
242
+ ]
243
+ },
244
+ {
245
+ "cell_type": "code",
246
+ "execution_count": 42,
247
+ "metadata": {},
248
+ "outputs": [],
249
+ "source": [
250
+ "allowed_nodes = [\"Person\", \"Organization\", \"Location\", \"Award\", \"ResearchField\"]\n",
251
+ "allowed_relationships = [\n",
252
+ " (\"Person\", \"WORKS_AT\", \"Organization\"),\n",
253
+ " (\"Person\", \"SPOUSE\", \"Person\"),\n",
254
+ " (\"Person\", \"AWARD\", \"Award\"),\n",
255
+ " (\"Organization\", \"IN_LOCATION\", \"Location\"),\n",
256
+ " (\"Person\", \"FIELD_OF_RESEARCH\", \"ResearchField\")\n",
257
+ "]\n",
258
+ "graph_transformer_rel_defined = LLMGraphTransformer(\n",
259
+ " llm=llm,\n",
260
+ " allowed_nodes=allowed_nodes,\n",
261
+ " allowed_relationships=allowed_relationships\n",
262
+ ")\n",
263
+ "graph_documents_rel_defined = await graph_transformer_rel_defined.aconvert_to_graph_documents(documents)"
264
+ ]
265
+ },
266
+ {
267
+ "cell_type": "code",
268
+ "execution_count": 43,
269
+ "metadata": {},
270
+ "outputs": [
271
+ {
272
+ "name": "stdout",
273
+ "output_type": "stream",
274
+ "text": [
275
+ "Graph saved to /Users/thuvu/Documents/vlogging/Research/knowledge_graph_app/knowledge_graph.html\n"
276
+ ]
277
+ }
278
+ ],
279
+ "source": [
280
+ "# Visualize graph\n",
281
+ "visualize_graph(graph_documents_rel_defined)"
282
+ ]
283
+ },
284
+ {
285
+ "cell_type": "code",
286
+ "execution_count": null,
287
+ "metadata": {},
288
+ "outputs": [],
289
+ "source": []
290
+ }
291
+ ],
292
+ "metadata": {
293
+ "kernelspec": {
294
+ "display_name": ".venv",
295
+ "language": "python",
296
+ "name": "python3"
297
+ },
298
+ "language_info": {
299
+ "codemirror_mode": {
300
+ "name": "ipython",
301
+ "version": 3
302
+ },
303
+ "file_extension": ".py",
304
+ "mimetype": "text/x-python",
305
+ "name": "python",
306
+ "nbconvert_exporter": "python",
307
+ "pygments_lexer": "ipython3",
308
+ "version": "3.12.6"
309
+ }
310
+ },
311
+ "nbformat": 4,
312
+ "nbformat_minor": 2
313
+ }
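The edge filter inside `visualize_graph` matters because the extracted relationships can reference a node that never made it into the node list. In the first output above, `Adolf Hitler CAME_TO_POWER_IN Germany` points at `Germany`, which is absent from `Nodes`. A minimal sketch of that filtering step follows. The `Node` and `Relationship` dataclasses here are simplified stand-ins written for this example, not the LangChain classes, and `filter_valid_edges` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


def filter_valid_edges(nodes, relationships):
    """Keep only relationships whose endpoints both appear in `nodes`."""
    known_ids = {node.id for node in nodes}
    valid_edges = [
        rel for rel in relationships
        if rel.source.id in known_ids and rel.target.id in known_ids
    ]
    # Only nodes that take part in at least one valid edge get drawn.
    drawn_ids = {rel.source.id for rel in valid_edges}
    drawn_ids |= {rel.target.id for rel in valid_edges}
    return valid_edges, drawn_ids


einstein = Node("Albert Einstein", "Person")
hitler = Node("Adolf Hitler", "Person")
switzerland = Node("Switzerland", "Place")
germany = Node("Germany", "Place")  # used by an edge but missing from the node list

nodes = [einstein, hitler, switzerland]
relationships = [
    Relationship(einstein, switzerland, "MOVED_TO"),
    Relationship(hitler, germany, "CAME_TO_POWER_IN"),
]

valid_edges, drawn_ids = filter_valid_edges(nodes, relationships)
print([rel.type for rel in valid_edges])
print(sorted(drawn_ids))
```

A side effect of this approach is that a node with no valid edge (here `Adolf Hitler`) is not drawn at all, which keeps the rendered graph free of isolated dots.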
requirements.txt CHANGED
@@ -1,3 +1,13 @@
1
- altair
2
- pandas
3
- streamlit
1
+ # Core LLM and LangChain packages
2
+ langchain>=0.1.0
3
+ langchain-experimental>=0.0.45
4
+ langchain-openai>=0.1.0
5
+
6
+ # Environment variable support
7
+ python-dotenv>=1.0.0
8
+
9
+ # Graph visualization
10
+ pyvis>=0.3.2
11
+
12
+ # Web UI
13
+ streamlit>=1.32.0
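Every entry in the new `requirements.txt` is a lower bound of the form `name>=version`. A rough sketch of reading those bounds into a dictionary is shown below, for example to print them in a setup check. It assumes only this simple `>=` form and ignores every other specifier; `parse_minimums` is a hypothetical helper, not part of pip or any packaging library.

```python
def parse_minimums(requirements_text):
    """Map package name -> minimum version for simple `name>=version` lines."""
    minimums = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if ">=" not in line:
            continue  # blank lines, comments, or specifiers this sketch ignores
        name, version = line.split(">=", 1)
        minimums[name.strip()] = version.strip()
    return minimums


requirements_text = """\
# Core LLM and LangChain packages
langchain>=0.1.0
langchain-experimental>=0.0.45
langchain-openai>=0.1.0

# Environment variable support
python-dotenv>=1.0.0

# Graph visualization
pyvis>=0.3.2

# Web UI
streamlit>=1.32.0
"""

minimums = parse_minimums(requirements_text)
print(minimums["pyvis"])
```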
src/streamlit_app.py DELETED
@@ -1,40 +0,0 @@
1
- import altair as alt
2
- import numpy as np
3
- import pandas as pd
4
- import streamlit as st
5
-
6
- """
7
- # Welcome to Streamlit!
8
-
9
- Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
10
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
11
- forums](https://discuss.streamlit.io).
12
-
13
- In the meantime, below is an example of what you can do with just a few lines of code:
14
- """
15
-
16
- num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
17
- num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
18
-
19
- indices = np.linspace(0, 1, num_points)
20
- theta = 2 * np.pi * num_turns * indices
21
- radius = indices
22
-
23
- x = radius * np.cos(theta)
24
- y = radius * np.sin(theta)
25
-
26
- df = pd.DataFrame({
27
- "x": x,
28
- "y": y,
29
- "idx": indices,
30
- "rand": np.random.randn(num_points),
31
- })
32
-
33
- st.altair_chart(alt.Chart(df, height=700, width=700)
34
- .mark_point(filled=True)
35
- .encode(
36
- x=alt.X("x", axis=None),
37
- y=alt.Y("y", axis=None),
38
- color=alt.Color("idx", legend=None, scale=alt.Scale()),
39
- size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
40
- ))