title: >-
Unveiling Global Narratives Through Knowledge Graphs: A Case Study Using GDELT
and Streamlit
emoji: 🔮
colorFrom: indigo
colorTo: blue
sdk: streamlit
sdk_version: 1.42.0
app_file: app.py
pinned: false
license: cc-by-4.0
short_description: using knowledge graphs for insight
Title: Unveiling Global Narratives Through Knowledge Graphs: A Case Study Using GDELT and Streamlit
Keywords: GDELT, Knowledge Graphs, Network Analysis, Sentiment Analysis, Prefect, Hugging Face datasets, DuckDB, Streamlit, Neo4j, NetworkX, st-link-analysis, streamlit-aggrid, pyvis, pandas
Abstract
The global landscape is increasingly shaped by evolving narratives driven by interconnected events and entities. To better understand these dynamics, we introduce GDELT Insight Explorer, a knowledge-graph-based platform built using Streamlit, DuckDB, and NetworkX. This paper presents a detailed case study on using the platform to analyze GDELT Global Knowledge Graph (GKG) data from March 10–22, 2020. We focus on uncovering global narratives and relationships between actors and themes during the early phase of the COVID-19 pandemic. Our findings emphasize the utility of real-time event data visualization and network analysis in tracing narrative propagation and identifying key influencers in global events.
1. Introduction
Understanding global narratives requires tools that can capture the complexity of events, their associated entities, and evolving sentiment over time. Traditional tabular analysis methods are often insufficient for capturing these relationships at scale. Knowledge graphs offer a robust solution for modeling and visualizing the interconnected nature of real-world events. This paper documents the development and application of GDELT Insight Explorer, a platform designed to leverage GDELT data for interactive exploration and insight generation.
2. Methodology
2.1 Data Source and Processing
The application is powered by the GDELT Global Knowledge Graph (GKG) dataset, restricted in this study to records from March 10–22, 2020. The dataset includes key features such as themes, locations, persons, organizations, and sentiment scores. Our ETL pipeline, implemented using Prefect and DuckDB, extracts the data and transforms it into Parquet format for efficient querying and filtering.
- Data Filtering: We prioritize events with a tone score below -6 to identify highly negative narratives.
- Data Storage: DuckDB is used for in-memory querying, enabling real-time analysis of filtered datasets.
- Graph Construction: NetworkX and Neo4j are employed for graph creation, with relationships categorized by the type of entity involved, such as persons, organizations, and locations.
2.2 Platform Architecture
The platform is built using Streamlit, with a modular architecture that supports multiple analysis modes:
- Event Navigator: Provides a tabular overview of filtered events with interactive search and filtering.
- Event Graph Explorer: Visualizes events and their associated entities in a graph format.
- Community Detection and Network Analysis: Employs NetworkX to detect communities and analyze network metrics such as centrality and density.
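To make the network metrics concrete, here is a minimal sketch that builds a small event-entity graph in NetworkX and computes density and degree centrality. The node names, the `kind` attribute, and the edges are hypothetical placeholders, not records from the dataset or the platform's data model.

```python
import networkx as nx

G = nx.Graph()

# Hypothetical nodes, tagged with an assumed "kind" attribute.
G.add_nodes_from(["event-1", "event-2", "event-3"], kind="event")
G.add_node("Org A", kind="organization")
G.add_node("Person B", kind="person")
G.add_node("Location C", kind="location")

# Each edge links an event to an entity it mentions.
G.add_edges_from([
    ("event-1", "Org A"),
    ("event-1", "Location C"),
    ("event-2", "Org A"),
    ("event-2", "Person B"),
    ("event-3", "Org A"),
])

# Density: fraction of possible edges that are present, 2m / (n(n-1)).
density = nx.density(G)

# Degree centrality: each node's degree divided by (n - 1).
centrality = nx.degree_centrality(G)
top_node = max(centrality, key=centrality.get)

print(f"density={density:.3f}, most central node={top_node}")
```

With 6 nodes and 5 edges the density is 1/3, and "Org A" is the most central node because it is mentioned by all three events.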
3. Findings
3.1 Narrative Detection and Sentiment Analysis
The negative tone filter helped identify early COVID-related narratives, revealing clusters of related events involving key global actors. By visualizing these relationships, we observed recurring themes of public health concerns and geopolitical tensions.
3.2 Community Detection
Using the Louvain method for community detection, we identified cohesive subgroups within the network. These communities often corresponded to specific geographic regions or thematic clusters, providing deeper insights into localized narratives.
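The following sketch shows what Louvain community detection looks like on a toy graph, using NetworkX's `louvain_communities`. The graph is a hypothetical stand-in (two tightly connected groups of placeholder entities joined by a single edge), and the seed value is an arbitrary choice; none of it is taken from the platform's code or data.

```python
import itertools

import networkx as nx

# Two hypothetical groups of entities, each fully connected internally.
cluster_a = ["A1", "A2", "A3", "A4", "A5"]
cluster_b = ["B1", "B2", "B3", "B4", "B5"]

G = nx.Graph()
G.add_edges_from(itertools.combinations(cluster_a, 2))
G.add_edges_from(itertools.combinations(cluster_b, 2))

# A single bridge edge connects the two groups.
G.add_edge("A1", "B1")

# Louvain greedily moves nodes between communities to increase modularity.
# A fixed seed makes the randomized node ordering reproducible.
communities = nx.community.louvain_communities(G, seed=42)

# Modularity measures how much denser the communities are than chance.
score = nx.community.modularity(G, communities)

for i, members in enumerate(communities):
    print(i, sorted(members))
print(f"modularity={score:.3f}")
```

On a graph this clearly separated, the two groups are recovered as the two communities. Real GKG-derived graphs are much noisier, so the resulting partitions need interpretation rather than being read off directly.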
3.3 Real-Time Filtering and Exploration
The integration of DuckDB allowed for seamless data filtering and exploration within the Streamlit interface. Users could drill down from high-level overviews to individual event records, facilitating rapid insight generation.
4. Conclusion and Future Work
The GDELT Insight Explorer demonstrates the potential of combining knowledge graphs and real-time data exploration for uncovering global narratives. Future work will focus on expanding the temporal range of the dataset, integrating additional data sources, and incorporating machine learning models for predictive analysis. The open-source nature of the platform encourages further development and adaptation across different domains.
References
- GDELT Project. (n.d.). https://www.gdeltproject.org
- Newman, M. E. J. (2018). Networks (2nd ed.). Oxford University Press.
- DuckDB. (n.d.). https://duckdb.org
- Prefect. (n.d.). https://www.prefect.io
Appendix: Application Architecture and Code
For implementation details, please refer to the open-source repository: https://huggingface.co/spaces/dwb2023/insight.