TL;DR: TextGraphs

#1
by pacoid - opened
Derwen, Inc. org
edited Dec 2, 2023

This space uses spaCy + SpanMarkerNER to construct a lemma graph. This is a prelude to inferring the nodes, edges, properties, and probabilities for building a knowledge graph from raw unstructured text sources. The open source library is used in production, though it also provides a playground to prototype and evaluate abstractions based on "Graph Levels Of Detail".

Analysis is intended to run on a stream of paragraphs, taking into account where/how components of spaCy pipelines tend to work more efficiently and can be augmented with LLMs, graph algorithms, graph ML, etc. The process is designed to be iterative and the results are therefore cumulative.
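To illustrate what "iterative and cumulative" can mean here, the following is a minimal sketch and not the library's implementation. It assumes each paragraph has already been reduced to a list of lemmas (in the demo that is spaCy's job), and it uses a plain `Counter` as a stand-in for the NetworkX graph so that it has no dependencies. The `add_paragraph` helper and its `window` parameter are hypothetical names introduced for this example.

```python
from collections import Counter


def add_paragraph(graph: Counter, lemmas: list[str], window: int = 3) -> None:
    """Add co-occurrence edges from one paragraph to the shared graph, in place.

    Hypothetical helper: an edge links two lemmas that appear within
    `window` positions of each other; its weight counts how often.
    """
    for i, left in enumerate(lemmas):
        for right in lemmas[i + 1 : i + window]:
            if left != right:
                # Undirected edge, so store the pair in sorted order.
                graph[tuple(sorted((left, right)))] += 1


# One graph is reused across the whole stream, so results accumulate.
graph: Counter = Counter()
paragraphs = [
    ["knowledge", "graph", "construction"],  # assumed pre-lemmatized input
    ["graph", "construction", "text"],
]
for lemmas in paragraphs:
    add_paragraph(graph, lemmas)
```

Because the same `graph` object is updated for every paragraph, an edge seen in both paragraphs (such as `construction`/`graph`) ends up with a higher weight than an edge seen once.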

This demo includes multiple steps:

  1. use spaCy to parse a document, with SpanMarkerNER LLM assist
  2. build a lemma graph in NetworkX from the parse results
  3. use OpenNRE to infer relations among entities (optional)
  4. use DBPedia Spotlight to perform entity linking and some graph inference
  5. run a modified TextRank algorithm plus graph analytics
  6. approximate a Pareto archive (hypervolume) to re-rank extracted entities
  7. visualize the lemma graph interactively in PyVis
  8. cluster communities within the lemma graph
  9. apply topological transforms to enhance embeddings (in progress)
  10. run graph representation learning on the graph of relations (in progress)
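As a rough illustration of the ranking step in the list above, here is a sketch of plain weighted PageRank, which is the core of TextRank. It is an assumption-laden stand-in: the demo runs a *modified* TextRank on a NetworkX graph, whereas this version uses only dictionaries, and the `pagerank` function, its parameters, and the toy edge list are all hypothetical.

```python
def pagerank(
    edges: dict[tuple[str, str], float],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Rank nodes of an undirected weighted graph by power iteration."""
    # Build a symmetric adjacency map from the edge list.
    neighbors: dict[str, dict[str, float]] = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            # Each neighbor passes on rank in proportion to the edge weight.
            incoming = sum(
                rank[other] * weight / sum(neighbors[other].values())
                for other, weight in neighbors[node].items()
            )
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy lemma graph (made-up data): "graph" co-occurs with every other lemma.
edges = {
    ("graph", "knowledge"): 1.0,
    ("graph", "lemma"): 1.0,
    ("graph", "text"): 1.0,
}
ranks = pagerank(edges)
top_lemma = max(ranks, key=ranks.get)
```

In this toy graph the hub lemma comes out on top, which is the intuition behind using centrality to pick key phrases.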

One important insight (based on following the textgraph research community for the past ~15 years) is that having a domain-specific knowledge graph available a priori for sampling during the parse (e.g., for semantic field random walks) provides multiple benefits:

  • faster/better convergence for extracting and ranking the key phrases in a raw text
  • entity linking as a by-product of NLP parsing
  • big steps toward semi-automated knowledge graph construction from large collections of unstructured text sources
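To make the idea of sampling a prior knowledge graph more concrete, here is a small sketch of a random walk that starts from an entity found in the text and counts which neighboring concepts it visits. This is only an assumption about what such sampling could look like, not the library's method; the `random_walk` function and the toy `adjacency` data are hypothetical.

```python
import random
from collections import Counter


def random_walk(
    adjacency: dict[str, list[str]],
    start: str,
    steps: int,
    seed: int = 0,
) -> Counter:
    """Walk the graph from `start`, counting how often each node is visited."""
    rng = random.Random(seed)  # seeded so repeated runs give the same walk
    visits = Counter([start])
    node = start
    for _ in range(steps):
        node = rng.choice(adjacency[node])
        visits[node] += 1
    return visits


# Toy domain knowledge graph (made-up data); every node has a neighbor.
adjacency = {
    "espresso": ["coffee", "caffeine"],
    "coffee": ["espresso", "caffeine", "beverage"],
    "caffeine": ["espresso", "coffee"],
    "beverage": ["coffee"],
}
visits = random_walk(adjacency, start="espresso", steps=100)
```

Frequently visited nodes approximate the "semantic field" around the starting entity, and could be used to boost related phrases during ranking or to suggest link targets.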

To these ends, this library is exploring the use of graph foundation models on the resulting lemma graph to augment approaches for graph representation learning, as a step toward providing graph levels of detail.

Overall, the outcomes for this library include ranked extracted key phrases plus a graph which can be used to construct or augment a knowledge graph.
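The re-ranking that feeds those ranked key phrases can be pictured with a small Pareto-front sketch. It is a simplification under stated assumptions: the demo approximates a Pareto archive using hypervolume, while this example only selects the non-dominated entities, and the `pareto_front` function, the two score dimensions, and the sample values are hypothetical.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, tuple[float, ...]]) -> list[str]:
    """Return the entities that no other entity dominates (higher is better)."""
    return [
        name
        for name, score in scores.items()
        if not any(dominates(other, score) for other in scores.values())
    ]


# Made-up scores per entity: (rank score, mention count).
scores = {
    "knowledge graph": (0.9, 2),
    "spaCy": (0.4, 5),
    "text": (0.3, 1),
}
front = pareto_front(scores)
```

Here `text` drops out because another entity beats it on both dimensions, while the remaining two each win on one dimension and so stay on the front.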
