Reference: textgraphs package
Package definitions for the `TextGraphs` library.
see copyright/license https://huggingface.co/spaces/DerwenAI/textgraphs/blob/main/README.md
TextGraphs class
Construct a lemma graph from the unstructured text source,
then extract ranked phrases using a textgraph algorithm.
infer_relations_async method
infer_relations_async(pipe, debug=False)
Gather triples representing inferred relations and build edges, concurrently by running an async queue. https://stackoverflow.com/questions/52582685/using-asyncio-queue-for-producer-consumer-flow
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe: textgraphs.pipe.Pipeline
  configured pipeline for this document
debug: bool
  debugging flag
returns: typing.List[textgraphs.elem.Edge]
  a list of the inferred `Edge` objects
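The producer/consumer flow referenced above can be sketched generically with an `asyncio.Queue`; the names here (`produce`, `consume`, `infer_concurrently`) and the dummy triples are illustrative only, not part of the `TextGraphs` API:

```python
import asyncio

async def produce (queue: asyncio.Queue, items: list) -> None:
    # enqueue one inference task per item
    for item in items:
        await queue.put(item)

async def consume (queue: asyncio.Queue, results: list) -> None:
    # drain the queue, emitting one dummy "inferred triple" per task
    while True:
        item = await queue.get()
        results.append(("src", f"rel_{item}", "dst"))
        queue.task_done()

async def infer_concurrently (items: list) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []

    # run several consumers concurrently against one producer
    workers = [ asyncio.create_task(consume(queue, results)) for _ in range(3) ]
    await produce(queue, items)
    await queue.join()  # block until every queued task is marked done

    for worker in workers:
        worker.cancel()

    await asyncio.gather(*workers, return_exceptions=True)
    return results

results = asyncio.run(infer_concurrently([ 1, 2, 3 ]))
```

The `queue.join()` / `task_done()` pairing is what lets the coordinator wait for all inferences without knowing how many workers are draining the queue.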
__init__ method
__init__(factory=None, iri_base="https://github.com/DerwenAI/textgraphs/ns/")
Constructor.
factory: typing.Optional[textgraphs.pipe.PipelineFactory]
  optional `PipelineFactory` used to configure components
create_pipeline method
create_pipeline(text_input)
Use the pipeline factory to create a pipeline (e.g., spaCy.Document)
for each text input, which is typically paragraph-length.
text_input: str
  raw text to be parsed by this pipeline
returns: textgraphs.pipe.Pipeline
  a configured pipeline
create_render method
create_render()
Create an object for rendering the graph in PyVis HTML+JavaScript.
returns: textgraphs.vis.RenderPyVis
  a configured `RenderPyVis` object for generating graph visualizations
collect_graph_elements method
collect_graph_elements(pipe, text_id=0, para_id=0, debug=False)
Collect the elements of a lemma graph from the results of running
the textgraph algorithm. These elements include: parse dependencies,
lemmas, entities, and noun chunks.
Make sure to call beforehand: TextGraphs.create_pipeline()
pipe: textgraphs.pipe.Pipeline
  configured pipeline for this document
text_id: int
  text (top-level document) identifier
para_id: int
  paragraph identifier
debug: bool
  debugging flag
construct_lemma_graph method
construct_lemma_graph(debug=False)
Construct the base level of the lemma graph from the collected
elements. This gets represented in NetworkX as a directed graph
with parallel edges.
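A minimal NetworkX sketch (with made-up node names) shows why parallel edges matter here: the same node pair can carry one edge per relation kind:

```python
import networkx as nx

# MultiDiGraph: directed, and the same node pair may be connected by
# multiple edges, e.g., a parse dependency plus an inferred relation
lemma_graph = nx.MultiDiGraph()

lemma_graph.add_node("werner_herzog", kind="entity")
lemma_graph.add_node("film_director", kind="lemma")

lemma_graph.add_edge("werner_herzog", "film_director", rel="dep")
lemma_graph.add_edge("werner_herzog", "film_director", rel="inferred")
```

Both edges are preserved as distinct objects, so downstream ranking can weight each relation independently.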
Make sure to call beforehand: TextGraphs.collect_graph_elements()
debug: bool
  debugging flag
perform_entity_linking method
perform_entity_linking(pipe, debug=False)
Perform entity linking based on the KnowledgeGraph object.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe: textgraphs.pipe.Pipeline
  configured pipeline for this document
debug: bool
  debugging flag
infer_relations method
infer_relations(pipe, debug=False)
Gather triples representing inferred relations and build edges.
Make sure to call beforehand: TextGraphs.collect_graph_elements()
pipe: textgraphs.pipe.Pipeline
  configured pipeline for this document
debug: bool
  debugging flag
returns: typing.List[textgraphs.elem.Edge]
  a list of the inferred `Edge` objects
calc_phrase_ranks method
calc_phrase_ranks(pr_alpha=0.85, debug=False)
Calculate the weights for each node in the lemma graph, then stack-rank the nodes so that entities have priority over lemmas.
Phrase ranks are normalized to sum to 1.0 and these now represent the ranked entities extracted from the document.
Make sure to call beforehand: TextGraphs.construct_lemma_graph()
pr_alpha: float
  optional `alpha` parameter for the PageRank algorithm
debug: bool
  debugging flag
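The normalization step can be sketched in plain Python (the raw weights below are made-up):

```python
# hypothetical raw PageRank weights for nodes in a lemma graph
raw_ranks = {
    "werner_herzog": 0.31,
    "film": 0.22,
    "bavaria": 0.11,
}

# normalize so the ranks sum to 1.0, i.e., they form a probability
# distribution over the extracted entities
total = sum(raw_ranks.values())
phrase_ranks = { node: weight / total for node, weight in raw_ranks.items() }
```

Because normalization preserves relative order, the stack-ranking of entities over lemmas happens before this step, on the raw counts and weights.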
get_phrases method
get_phrases()
Return the entities extracted from the document.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
yields:
  extracted entities
get_phrases_as_df method
get_phrases_as_df()
Return the ranked extracted entities as a dataframe.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
returns: pandas.core.frame.DataFrame
  a `pandas.DataFrame` of the extracted entities
export_rdf method
export_rdf(lang="en")
Extract the entities and relations which have IRIs as RDF triples.
lang: str
  language identifier
returns: str
  RDF triples in N3 (Turtle) format, as a string
denormalize_iri method
denormalize_iri(uri_ref)
Discern between a parsed entity and a linked entity.
returns: str
  the lemma_key for a parsed entity, or the full IRI for a linked entity
load_bootstrap_ttl method
load_bootstrap_ttl(ttl_str, debug=False)
Parse a TTL string with an RDF semantic graph representation to load bootstrap definitions for the lemma graph prior to parsing, e.g., for synonyms.
ttl_str: str
  RDF triples in TTL (Turtle/N3) format
debug: bool
  debugging flag
export_kuzu method
export_kuzu(zip_name="lemma.zip", debug=False)
Export a labeled property graph for KùzuDB (openCypher).
debug: bool
  debugging flag
returns: str
  name of the generated ZIP file
SimpleGraph class
An in-memory graph used to build a MultiDiGraph in NetworkX.
__init__ method
__init__()
Constructor.
reset method
reset()
Re-initialize the data structures, resetting all but the configuration.
make_node method
make_node(tokens, key, span, kind, text_id, para_id, sent_id, label=None, length=1, linked=True)
Look up and return a `Node` object.
By default, link matching keys into the same node.
Otherwise instantiate a new node if one does not already exist.
tokens: typing.List[textgraphs.elem.Node]
  list of parsed tokens
key: str
  lemma key (invariant)
span: spacy.tokens.token.Token
  token span for the parsed entity
kind: <enum 'NodeEnum'>
  the kind of this `Node` object
text_id: int
  text (top-level document) identifier
para_id: int
  paragraph identifier
sent_id: int
  sentence identifier
label: typing.Optional[str]
  node label (for a new object)
length: int
  length of token span
linked: bool
  flag for whether this links to an entity
returns: textgraphs.elem.Node
  the constructed `Node` object
make_edge method
make_edge(src_node, dst_node, kind, rel, prob, key=None, debug=False)
Look up an edge, creating a new one if it does not already exist, and increment its count if it does.
src_node: textgraphs.elem.Node
  source node in the triple
dst_node: textgraphs.elem.Node
  destination node in the triple
kind: <enum 'RelEnum'>
  the kind of this `Edge` object
rel: str
  relation label
prob: float
  probability of this `Edge` within the graph
key: typing.Optional[str]
  lemma key (invariant); a key is generated if this is not provided
debug: bool
  debugging flag
returns: typing.Optional[textgraphs.elem.Edge]
  the constructed `Edge` object; this may be `None` if the input parameters indicate skipping the edge
dump_lemma_graph method
dump_lemma_graph()
Dump the lemma graph as a JSON string in node-link format, suitable for serialization and subsequent use in JavaScript, Neo4j, Graphistry, etc.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
returns: str
  a JSON representation of the exported lemma graph in node-link format
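For reference, node-link format is the JSON-friendly structure `networkx` produces for serialization; a minimal sketch with a made-up edge:

```python
import json
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("werner_herzog", "film_director", rel="dep")

# node-link format: a dict with a "nodes" list plus an edge list,
# which serializes cleanly to JSON for JavaScript, Neo4j, etc.
data = nx.node_link_data(g)
json_str = json.dumps(data)
```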
load_lemma_graph method
load_lemma_graph(json_str, debug=False)
Load the lemma graph from a JSON string representation in node-link format.
debug: bool
  debugging flag
Node class
A data class representing one node, i.e., an extracted phrase.
__repr__ method
__repr__()
get_linked_label method
get_linked_label()
When this node has a linked entity, return that IRI.
Otherwise return its label value.
returns: typing.Optional[str]
  a label for the linked entity
get_name method
get_name()
Return a brief name for the graphical depiction of this Node.
returns: str
  brief label to be used in a graph
get_stacked_count method
get_stacked_count()
Return a modified count, to redact verbs and linked entities from the stack-rank partitions.
returns: int
  count, used for re-ranking extracted entities
get_pos method
get_pos()
Generate a position span for OpenNRE.
returns: typing.Tuple[int, int]
  a position span needed for `OpenNRE` relation extraction
Edge class
A data class representing an edge between two nodes.
__repr__ method
__repr__()
EnumBase class
A mixin for Enum codecs.
NodeEnum class
Enumeration for the kinds of node categories
RelEnum class
Enumeration for the kinds of edge relations
PipelineFactory class
Factory pattern for building a pipeline, which is one of the more
expensive operations with spaCy.
__init__ method
__init__(spacy_model="en_core_web_sm", ner=None, kg=KnowledgeGraph(), infer_rels=[])
Constructor which instantiates the spaCy pipelines needed for parsing and entity linking:
  tok_pipe -- regular generator for parsed tokens
  ner_pipe -- with entities merged
  aux_pipe -- spotlight entity linking
spacy_model: str
  the specific model to use in `spaCy` pipelines
ner: typing.Optional[textgraphs.pipe.Component]
  optional custom NER component
kg: textgraphs.pipe.KnowledgeGraph
  knowledge graph used for entity linking
infer_rels: typing.List[textgraphs.pipe.InferRel]
  a list of components for inferring relations
create_pipeline method
create_pipeline(text_input)
Instantiate the document pipelines needed to parse the input text.
text_input: str
  raw text to be parsed
returns: textgraphs.pipe.Pipeline
  a configured `Pipeline` object
Pipeline class
Manage parsing of a document, which is assumed to be paragraph-sized.
__init__ method
__init__(text_input, tok_pipe, ner_pipe, aux_pipe, kg, infer_rels)
Constructor.
text_input: str
  raw text to be parsed
tok_pipe: spacy.language.Language
  the `spaCy.Language` pipeline used for tallying individual tokens
ner_pipe: spacy.language.Language
  the `spaCy.Language` pipeline used for tallying named entities
aux_pipe: spacy.language.Language
  the `spaCy.Language` pipeline used for auxiliary components (e.g., `DBPedia Spotlight`)
kg: textgraphs.pipe.KnowledgeGraph
  knowledge graph used for entity linking
infer_rels: typing.List[textgraphs.pipe.InferRel]
  a list of components for inferring relations
get_lemma_key classmethod
get_lemma_key(span, placeholder=False)
Compose a unique, invariant lemma key for the given span.
span: typing.Union[spacy.tokens.span.Span, spacy.tokens.token.Token]
  span of tokens within the lemma
placeholder: bool
  flag for whether to create a placeholder
returns: str
  a composed lemma key
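As a purely hypothetical sketch, an invariant key might be composed from (lemma, part-of-speech) pairs; the actual `TextGraphs` implementation may compose its keys differently:

```python
def compose_lemma_key (tokens: list) -> str:
    # join lowercased "lemma.pos" terms into one invariant string;
    # lowercasing keeps the key stable across surface-form variants
    terms = [
        f"{lemma.strip().lower()}.{pos.lower()}"
        for lemma, pos in tokens
    ]
    return " ".join(terms)

key = compose_lemma_key([ ("Werner", "PROPN"), ("Herzog", "PROPN") ])
```

The point of an invariant key is that repeated mentions of the same lemma sequence map onto the same graph node.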
get_ent_lemma_keys method
get_ent_lemma_keys()
Iterate through the fully qualified lemma keys for an extracted entity.
yields:
  the lemma keys within an extracted entity
link_noun_chunks method
link_noun_chunks(nodes, debug=False)
Link any noun chunks which are not already subsumed by named entities.
nodes: dict
  dictionary of `Node` objects in the graph
debug: bool
  debugging flag
returns: typing.List[textgraphs.elem.NounChunk]
  a list of identified noun chunks which are novel
iter_entity_pairs method
iter_entity_pairs(pipe_graph, max_skip, debug=True)
Iterator for entity pairs for which the algorithm infers relations.
pipe_graph: networkx.classes.multigraph.MultiGraph
  a `networkx.MultiGraph` representation of the graph, reused for graph algorithms
max_skip: int
  maximum distance between entities for inferred relations
debug: bool
  debugging flag
yields:
  pairs of entities within a range, e.g., to use for relation extraction
Component class
Abstract base class for a spaCy pipeline component.
augment_pipe method
augment_pipe(factory)
Encapsulate a spaCy call to add_pipe() configuration.
factory: PipelineFactory
  a `PipelineFactory` used to configure components
NERSpanMarker class
Configures a spaCy pipeline component for `SpanMarker` NER.
__init__ method
__init__(ner_model="tomaarsen/span-marker-roberta-large-ontonotes5")
Constructor.
ner_model: str
  model to be used in `SpanMarker`
augment_pipe method
augment_pipe(factory)
Encapsulate a spaCy call to add_pipe() configuration.
factory: textgraphs.pipe.PipelineFactory
  the `PipelineFactory` used to configure this pipeline component
NounChunk class
A data class representing one noun chunk, i.e., a candidate as an extracted phrase.
__repr__ method
__repr__()
KnowledgeGraph class
Base class for a knowledge graph interface.
augment_pipe method
augment_pipe(factory)
Encapsulate a spaCy call to add_pipe() configuration.
factory: PipelineFactory
  a `PipelineFactory` used to configure components
remap_ner method
remap_ner(label)
Remap the OntoTypes4 values from NER output to more general-purpose IRIs.
label: typing.Optional[str]
  input NER label, an `OntoTypes4` value
returns: typing.Optional[str]
  an IRI for the named entity
normalize_prefix method
normalize_prefix(iri, debug=False)
Normalize the given IRI to use standard namespace prefixes.
iri: str
  input IRI, in fully-qualified domain representation
debug: bool
  debugging flag
returns: str
  the compact IRI representation, using an RDF namespace prefix
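A sketch of prefix compaction, using two of the namespace mappings documented in the `KGWikiMedia` defaults later in this reference; the real method handles the full prefix table and its edge cases (e.g., overlapping namespaces):

```python
ns_prefix = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}

def normalize_prefix (iri: str) -> str:
    # replace a known namespace with its compact prefix, if any matches
    for prefix, ns in ns_prefix.items():
        if iri.startswith(ns):
            return f"{prefix}:{iri[len(ns):]}"
    return iri

compact = normalize_prefix("http://dbpedia.org/ontology/Person")
```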
perform_entity_linking method
perform_entity_linking(graph, pipe, debug=False)
Perform entity linking based on "spotlight" and other services.
graph: textgraphs.graph.SimpleGraph
  source graph
pipe: Pipeline
  configured pipeline for the current document
debug: bool
  debugging flag
resolve_rel_iri method
resolve_rel_iri(rel, lang="en", debug=False)
Resolve a rel string from a relation extraction model which has
been trained on this knowledge graph.
rel: str
  relation label; many relation extraction projects source these labels from Wikidata
lang: str
  language identifier
debug: bool
  debugging flag
returns: typing.Optional[str]
  a resolved IRI
KGSearchHit class
A data class representing a hit from a knowledge graph search.
__repr__ method
__repr__()
KGWikiMedia class
Manage access to WikiMedia-related APIs.
__init__ method
__init__(spotlight_api="https://api.dbpedia-spotlight.org/en", dbpedia_search_api="https://lookup.dbpedia.org/api/search", dbpedia_sparql_api="https://dbpedia.org/sparql", wikidata_api="https://www.wikidata.org/w/api.php", ner_map=OrderedDict([('CARDINAL', {'iri': 'http://dbpedia.org/resource/Cardinal_number', 'definition': 'Numerals that do not fall under another type', 'label': 'cardinal number'}), ('DATE', {'iri': 'http://dbpedia.org/ontology/date', 'definition': 'Absolute or relative dates or periods', 'label': 'date'}), ('EVENT', {'iri': 'http://dbpedia.org/ontology/Event', 'definition': 'Named hurricanes, battles, wars, sports events, etc.', 'label': 'event'}), ('FAC', {'iri': 'http://dbpedia.org/ontology/Infrastructure', 'definition': 'Buildings, airports, highways, bridges, etc.', 'label': 'infrastructure'}), ('GPE', {'iri': 'http://dbpedia.org/ontology/Country', 'definition': 'Countries, cities, states', 'label': 'country'}), ('LANGUAGE', {'iri': 'http://dbpedia.org/ontology/Language', 'definition': 'Any named language', 'label': 'language'}), ('LAW', {'iri': 'http://dbpedia.org/ontology/Law', 'definition': 'Named documents made into laws', 'label': 'law'}), ('LOC', {'iri': 'http://dbpedia.org/ontology/Place', 'definition': 'Non-GPE locations, mountain ranges, bodies of water', 'label': 'place'}), ('MONEY', {'iri': 'http://dbpedia.org/resource/Money', 'definition': 'Monetary values, including unit', 'label': 'money'}), ('NORP', {'iri': 'http://dbpedia.org/ontology/nationality', 'definition': 'Nationalities or religious or political groups', 'label': 'nationality'}), ('ORDINAL', {'iri': 'http://dbpedia.org/resource/Ordinal_number', 'definition': 'Ordinal number, i.e., first, second, etc.', 'label': 'ordinal number'}), ('ORG', {'iri': 'http://dbpedia.org/ontology/Organisation', 'definition': 'Companies, agencies, institutions, etc.', 'label': 'organization'}), ('PERCENT', {'iri': 'http://dbpedia.org/resource/Percentage', 'definition': 'Percentage', 'label': 
'percentage'}), ('PERSON', {'iri': 'http://dbpedia.org/ontology/Person', 'definition': 'People, including fictional', 'label': 'person'}), ('PRODUCT', {'iri': 'http://dbpedia.org/ontology/product', 'definition': 'Vehicles, weapons, foods, etc. (Not services)', 'label': 'product'}), ('QUANTITY', {'iri': 'http://dbpedia.org/resource/Quantity', 'definition': 'Measurements, as of weight or distance', 'label': 'quantity'}), ('TIME', {'iri': 'http://dbpedia.org/ontology/time', 'definition': 'Times smaller than a day', 'label': 'time'}), ('WORK OF ART', {'iri': 'http://dbpedia.org/resource/Work_of_art', 'definition': 'Titles of books, songs, etc.', 'label': 'work of art'})]), ns_prefix=OrderedDict([('dbc', 'http://dbpedia.org/resource/Category:'), ('dbt', 'http://dbpedia.org/resource/Template:'), ('dbr', 'http://dbpedia.org/resource/'), ('yago', 'http://dbpedia.org/class/yago/'), ('dbd', 'http://dbpedia.org/datatype/'), ('dbo', 'http://dbpedia.org/ontology/'), ('dbp', 'http://dbpedia.org/property/'), ('units', 'http://dbpedia.org/units/'), ('dbpedia-commons', 'http://commons.dbpedia.org/resource/'), ('dbpedia-wikicompany', 'http://dbpedia.openlinksw.com/wikicompany/'), ('dbpedia-wikidata', 'http://wikidata.dbpedia.org/resource/'), ('wd', 'http://www.wikidata.org/'), ('wd_ent', 'http://www.wikidata.org/entity/'), ('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'), ('schema', 'https://schema.org/'), ('owl', 'http://www.w3.org/2002/07/owl#')]), min_alias=0.8, min_similarity=0.9)
Constructor.
spotlight_api: str
  `DBPedia Spotlight` API or equivalent local service
dbpedia_search_api: str
  `DBPedia Search` API or equivalent local service
dbpedia_sparql_api: str
  `DBPedia SPARQL` API or equivalent local service
wikidata_api: str
  `Wikidata Search` API or equivalent local service
ner_map: dict
  named entity map for standardizing IRIs
ns_prefix: dict
  RDF namespace prefixes
min_alias: float
  minimum alias probability threshold for accepting linked entities
min_similarity: float
  minimum label similarity threshold for accepting linked entities
augment_pipe method
augment_pipe(factory)
Encapsulate a spaCy call to add_pipe() configuration.
factory: textgraphs.pipe.PipelineFactory
  a `PipelineFactory` used to configure components
remap_ner method
remap_ner(label)
Remap the OntoTypes4 values from NER output to more general-purpose IRIs.
label: typing.Optional[str]
  input NER label, an `OntoTypes4` value
returns: typing.Optional[str]
  an IRI for the named entity
normalize_prefix method
normalize_prefix(iri, debug=False)
Normalize the given IRI using the standard DBPedia namespace prefixes.
iri: str
  input IRI, in fully-qualified domain representation
debug: bool
  debugging flag
returns: str
  the compact IRI representation, using an RDF namespace prefix
perform_entity_linking method
perform_entity_linking(graph, pipe, debug=False)
Perform entity linking based on DBPedia Spotlight and other services.
graph: textgraphs.graph.SimpleGraph
  source graph
pipe: textgraphs.pipe.Pipeline
  configured pipeline for the current document
debug: bool
  debugging flag
resolve_rel_iri method
resolve_rel_iri(rel, lang="en", debug=False)
Resolve a rel string from a relation extraction model which has
been trained on this knowledge graph, which defaults to using the
WikiMedia graphs.
rel: str
  relation label; many relation extraction projects source these labels from Wikidata
lang: str
  language identifier
debug: bool
  debugging flag
returns: typing.Optional[str]
  a resolved IRI
wikidata_search method
wikidata_search(query, lang="en", debug=False)
Query the Wikidata search API.
query: str
  query string
lang: str
  language identifier
debug: bool
  debugging flag
returns: typing.Optional[textgraphs.elem.KGSearchHit]
  search hit, if any
dbpedia_search_entity method
dbpedia_search_entity(query, lang="en", debug=False)
Perform a DBPedia API search.
query: str
  query string
lang: str
  language identifier
debug: bool
  debugging flag
returns: typing.Optional[textgraphs.elem.KGSearchHit]
  search hit, if any
dbpedia_sparql_query method
dbpedia_sparql_query(sparql, debug=False)
Perform a SPARQL query on DBPedia.
sparql: str
  SPARQL query string
debug: bool
  debugging flag
returns: dict
  dictionary of query results
dbpedia_wikidata_equiv method
dbpedia_wikidata_equiv(dbpedia_iri, debug=False)
Perform a SPARQL query on DBPedia to find an equivalent Wikidata entity.
dbpedia_iri: str
  IRI in DBPedia
debug: bool
  debugging flag
returns: typing.Optional[str]
  equivalent IRI in Wikidata
LinkedEntity class
A data class representing one linked entity.
__repr__ method
__repr__()
InferRel class
Abstract base class for a relation extraction model wrapper.
gen_triples_async method
gen_triples_async(pipe, queue, debug=False)
Infer relations concurrently, producing triples to a queue.
pipe: Pipeline
  configured pipeline for the current document
queue: asyncio.queues.Queue
  queue of inference tasks to be performed
debug: bool
  debugging flag
gen_triples method
gen_triples(pipe, debug=False)
Infer relations iteratively, yielding triples through a generator.
pipe: Pipeline
  configured pipeline for the current document
debug: bool
  debugging flag
yields:
  generated triples
InferRel_OpenNRE class
Perform relation extraction based on the OpenNRE model.
https://github.com/thunlp/OpenNRE
__init__ method
__init__(model="wiki80_cnn_softmax", max_skip=11, min_prob=0.9)
Constructor.
model: str
  the specific model to be used in `OpenNRE`
max_skip: int
  maximum distance between entities for inferred relations
min_prob: float
  minimum probability threshold for accepting an inferred relation
gen_triples method
gen_triples(pipe, debug=False)
Iterate on entity pairs to drive OpenNRE, inferring relations
represented as triples which get produced by a generator.
pipe: textgraphs.pipe.Pipeline
  configured pipeline for the current document
debug: bool
  debugging flag
yields:
  generated triples as candidates for inferred relations
InferRel_Rebel class
Perform relation extraction based on the REBEL model.
https://github.com/Babelscape/rebel
https://huggingface.co/spaces/Babelscape/mrebel-demo
__init__ method
__init__(lang="en_XX", mrebel_model="Babelscape/mrebel-large")
Constructor.
lang: str
  language identifier
mrebel_model: str
  tokenizer model to be used
tokenize_sent method
tokenize_sent(text)
Apply the tokenizer manually, since we need to extract special tokens.
text: str
  input text for the sentence to be tokenized
returns: str
  extracted tokens
extract_triplets_typed method
extract_triplets_typed(text)
Parse the generated text and extract its triplets.
text: str
  input text for the sentence to use in inference
returns: list
  a list of extracted triples
gen_triples method
gen_triples(pipe, debug=False)
Drive REBEL to infer relations for each sentence, represented as
triples which get produced by a generator.
pipe: textgraphs.pipe.Pipeline
  configured pipeline for the current document
debug: bool
  debugging flag
yields:
  generated triples as candidates for inferred relations
RenderPyVis class
Render the lemma graph as a PyVis network.
__init__ method
__init__(graph, kg)
Constructor.
graph: textgraphs.graph.SimpleGraph
  source graph to be visualized
kg: textgraphs.pipe.KnowledgeGraph
  knowledge graph used for entity linking
render_lemma_graph method
render_lemma_graph(debug=True)
Prepare the structure of the NetworkX graph to use for building
and returning a PyVis network to render.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
debug: bool
  debugging flag
returns: pyvis.network.Network
  a `pyvis.network.Network` interactive visualization
draw_communities method
draw_communities(spring_distance=1.4, debug=False)
Cluster the communities in the lemma graph, then draw a
NetworkX graph of the nodes, using a specific color for each
community.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
spring_distance: float
  `NetworkX` parameter used to separate clusters visually
debug: bool
  debugging flag
returns: typing.Dict[int, int]
  a map of the calculated communities
generate_wordcloud method
generate_wordcloud(background="black")
Generate a tag cloud from the given phrases.
Make sure to call beforehand: TextGraphs.calc_phrase_ranks()
background: str
  background color for the rendering
returns: wordcloud.wordcloud.WordCloud
  the rendering as a `wordcloud.WordCloud` object, which can be used to generate PNG images, etc.
NodeStyle class
Dataclass used for styling PyVis nodes.
__setattr__ method
__setattr__(name, value)
GraphOfRelations class
Attempt to reproduce results published in "INGRAM: Inductive Knowledge Graph Embedding via Relation Graphs" https://arxiv.org/abs/2305.19987
__init__ method
__init__(source)
Constructor.
source:textgraphs.graph.SimpleGraph
source graph to be transformed
load_ingram method
load_ingram(json_file, debug=False)
Load data for a source graph, as illustrated in lee2023ingram.
json_file: pathlib.Path
  path for the JSON dataset to load
debug: bool
  debugging flag
seeds method
seeds(debug=False)
Prep data for the topological transform illustrated in lee2023ingram.
debug: bool
  debugging flag
trace_source_graph method
trace_source_graph()
Output a "seed" representation of the source graph.
construct_gor method
construct_gor(debug=False)
Perform the topological transform described by lee2023ingram, constructing a graph of relations (GOR) and calculating affinity scores between entities in the GOR based on their definitions:
we measure the affinity between two relations by considering how many entities are shared between them and how frequently they share the same entity
debug: bool
  debugging flag
tally_frequencies classmethod
tally_frequencies(counter)
Tally the frequency of shared entities.
counter: collections.Counter
  counter data collection for the rel_b/entity pairs
returns: int
  tallied values for one relation
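The tallying itself amounts to summing a `collections.Counter`; a sketch with made-up pair counts:

```python
from collections import Counter

# hypothetical counts of how often rel_b shares each entity
counter = Counter({
    ("rel_b", "entity_1"): 3,
    ("rel_b", "entity_2"): 1,
})

# total frequency of shared entities for this one relation
tally = sum(counter.values())
```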
get_affinity_scores method
get_affinity_scores(debug=False)
Reproduce metrics based on the example published in lee2023ingram.
debug: bool
  debugging flag
returns: typing.Dict[tuple, float]
  the calculated affinity scores
trace_metrics method
trace_metrics(scores)
Compare the calculated affinity scores with results from a published example.
scores: typing.Dict[tuple, float]
  the calculated affinity scores between pairs of relations (i.e., observed values)
returns: pandas.core.frame.DataFrame
  a `pandas.DataFrame` where the rows compare expected vs. observed affinity scores
render_gor_plt method
render_gor_plt(scores)
Visualize the graph of relations using matplotlib.
scores: typing.Dict[tuple, float]
  the calculated affinity scores between pairs of relations (i.e., observed values)
render_gor_pyvis method
render_gor_pyvis(scores)
Visualize the graph of relations interactively using PyVis.
scores: typing.Dict[tuple, float]
  the calculated affinity scores between pairs of relations (i.e., observed values)
returns: pyvis.network.Network
  a `pyvis.network.Network` representation of the transformed graph
TransArc class
A data class representing one transformed rel-node-rel triple in a graph of relations.
__repr__ method
__repr__()
RelDir class
Enumeration for the directions of a relation.
SheafSeed class
A data class representing a node from the source graph plus its partial edge, based on a Sheaf Theory decomposition of a graph.
__repr__ method
__repr__()
Affinity class
A data class representing the affinity scores from one entity in the transformed graph of relations.
NB: there are much more efficient ways to calculate these affinity scores using sparse tensor algebra; this approach illustrates the process -- for research and debugging.
__repr__ method
__repr__()
module functions
calc_quantile_bins function
calc_quantile_bins(num_rows)
Calculate the bins to use for a quantile stripe, using numpy.linspace.
num_rows: int
  number of rows in the target dataframe
returns: numpy.ndarray
  calculated bins, as a `numpy.ndarray`
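A sketch consistent with this description; the granularity heuristic here (scaling with the log of the row count) is an assumption, not necessarily the library's exact choice:

```python
import numpy as np

def calc_quantile_bins (num_rows: int) -> np.ndarray:
    # evenly spaced quantile boundaries in [0, 1], with granularity
    # growing slowly (logarithmically) as the dataframe gets longer
    granularity = max(round(np.log(num_rows) * 4), 1)
    return np.linspace(0, 1, num=granularity, endpoint=True)

bins = calc_quantile_bins(100)
```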
get_repo_version function
get_repo_version()
Access the Git repository information and return items to identify the version/commit running in production.
returns: typing.Tuple[str, str]
  version tag and commit hash
root_mean_square function
root_mean_square(values)
Calculate the root mean square of the values in the given list.
values: typing.List[float]
  list of values to use in the RMS calculation
returns: float
  RMS metric as a float
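This is the standard RMS formula; a direct sketch:

```python
import math

def root_mean_square (values: list) -> float:
    # square each value, take the mean, then the square root
    sum_sq = sum(x * x for x in values)
    return math.sqrt(sum_sq / len(values))

rms = root_mean_square([ 3.0, 4.0 ])  # sqrt((9 + 16) / 2)
```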
stripe_column function
stripe_column(values, bins)
Stripe a column in a dataframe, by interpolating quantiles into a set of discrete indexes.
values: list
  list of values to stripe
bins: int
  quantile bins; see calc_quantile_bins()
returns: numpy.ndarray
  the striped column values, as a `numpy.ndarray`
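A sketch of quantile striping with `numpy`; the interpolation details here are an assumption, not necessarily the library's exact implementation:

```python
import numpy as np

def stripe_column (values: list, bins: np.ndarray) -> np.ndarray:
    # interpolate the quantiles of the given values at the bin
    # boundaries, then map each value onto a discrete stripe index
    quantiles = np.quantile(values, q=bins)
    return np.digitize(values, bins=quantiles)

# bins as produced by calc_quantile_bins(); here simply 9 even steps
bins = np.linspace(0, 1, num=9, endpoint=True)
stripes = stripe_column([ 1.0, 5.0, 9.0, 2.0 ], bins)
```

Larger values land in higher-numbered stripes, which makes the striped column usable as a discrete rank indicator.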