textgraphs / docs /hitl.md

Paco Nathan


	Rather than fully automatic KG construction, this approach emphasizes means of incorporating _domain experts_ through "human-in-the-loop" (HITL) techniques.

	Multiple techniques can be employed to construct gradients for both the generated nodes and edges, starting with the quantitative scores from model inference.

	- gradient for recommending extracted entities: _named entity recognition_, _textrank_, _probabilistic soft logic_, etc.
	- gradient for recommending extracted relations: _relation extraction_, _graph of relations_, etc.
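
One way to build such a gradient is to rescale each score source to a common range and combine the results into a single ranking. The following is a minimal sketch under stated assumptions: the entity names, the score values, and the `min_max` and `rank_entities` helpers are hypothetical, and an unweighted mean stands in for whatever combination a real pipeline would use.

```python
from statistics import mean


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one source's scores to [0, 1] so that sources are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # all scores identical: this source carries no ordering information
        return {key: 0.5 for key in scores}
    return {key: (val - lo) / (hi - lo) for key, val in scores.items()}


def rank_entities(sources: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Average the rescaled scores per entity, then sort in descending order."""
    scaled = [min_max(src) for src in sources]
    entities = set().union(*scaled)
    combined = {
        ent: mean(src[ent] for src in scaled if ent in src)
        for ent in entities
    }
    # ties are broken alphabetically so the ordering is deterministic
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# hypothetical scores, for illustration only
ner_confidence = {"Werner Herzog": 0.95, "Germany": 0.90, "film": 0.40}
textrank_score = {"Werner Herzog": 0.12, "Germany": 0.05, "film": 0.08}

gradient = rank_entities([ner_confidence, textrank_score])
```

The same pattern would apply to extracted relations, with edge identifiers as keys instead of entity labels.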

	Results extracted from _lemma graphs_ provide gradients which can be leveraged to elicit feedback from domain experts:

	- high-pass filter: accept results as valid automated inference
	- low-pass filter: reject results as errors and noise

	For the results which fall in between, a recommender system (recsys) or similar UI can elicit review from domain experts, based on _active learning_, _weak supervision_, etc.; see <https://argilla.io/>
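
The three-way split described above can be sketched with two thresholds. This is an illustration only: the `Candidate` type, the `triage` function, and the threshold values `0.8` and `0.3` are assumptions, and scores are assumed to be normalized to `[0, 1]`. Ordering the review queue by distance from the middle of the band is a simple stand-in for an active-learning style "most uncertain first" policy, not a description of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    score: float  # assumed to be normalized to [0, 1]


def triage(candidates, accept_at=0.8, reject_at=0.3):
    """Split candidates into accepted, needs-review, and rejected lists."""
    accepted, review, rejected = [], [], []
    for cand in candidates:
        if cand.score >= accept_at:
            accepted.append(cand)   # high-pass: treat as valid inference
        elif cand.score < reject_at:
            rejected.append(cand)   # low-pass: treat as errors and noise
        else:
            review.append(cand)     # in between: route to a domain expert
    # show the most ambiguous items first (closest to the middle of the band)
    midpoint = (accept_at + reject_at) / 2
    review.sort(key=lambda c: abs(c.score - midpoint))
    return accepted, review, rejected


# hypothetical candidates, for illustration only
candidates = [
    Candidate("Werner Herzog", 0.92),
    Candidate("Bavaria", 0.55),
    Candidate("documentary", 0.35),
    Candidate("the", 0.10),
]
accepted, review, rejected = triage(candidates)
```

In practice the thresholds would be tuned per project, since they trade expert time against the error rate of automatically accepted results.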

	Following the HITL validation, the more valuable results collected within a _lemma graph_ can be extracted as the primary output from this approach.

	Based on a process of iterating through a text document in chunks, the results from one iteration can be used to bootstrap the _lemma graph_ for the next iteration. This provides a natural means of accumulating (i.e., aggregating) results from the overall analysis.
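
The chunk-by-chunk bootstrap can be sketched as a loop in which each iteration receives the graph built so far and returns an extended copy. Everything here is a simplified assumption: `LemmaGraph` and `analyze_chunk` are hypothetical names, lowercased whitespace tokens stand in for real lemmas, and adjacent-token pairs stand in for real edges.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LemmaGraph:
    nodes: Counter = field(default_factory=Counter)  # lemma -> occurrence count
    edges: Counter = field(default_factory=Counter)  # (lemma, lemma) -> count


def analyze_chunk(chunk: str, graph: LemmaGraph) -> LemmaGraph:
    """Toy per-chunk analysis that starts from the previous iteration's graph."""
    # stand-in for lemmatization: strip punctuation and lowercase each token
    lemmas = [tok.strip(".,;").lower() for tok in chunk.split()]
    lemmas = [lem for lem in lemmas if lem]
    result = LemmaGraph(graph.nodes.copy(), graph.edges.copy())
    result.nodes.update(lemmas)
    # stand-in for parsed relations: link each token to its successor
    result.edges.update(zip(lemmas, lemmas[1:]))
    return result


# hypothetical chunks, for illustration only
chunks = ["Herzog directs films.", "Herzog directs operas."]

graph = LemmaGraph()
for chunk in chunks:
    graph = analyze_chunk(chunk, graph)  # output of one pass seeds the next
```

Because each pass begins with the accumulated counts, nodes and edges seen in earlier chunks gain weight as later chunks repeat them.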

	By extension, this bootstrap/accumulation process can be used in the distributed processing of a corpus of documents. The "data exhaust" of abstracted _lemma graphs_ used to bootstrap the analysis workflows then effectively becomes a _knowledge graph_, as a side effect of the analysis.
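
For the distributed case, one possible sketch is a merge operation over per-document graphs. The dictionary layout, the `merge` function, and the sample counts below are assumptions made for illustration. Summing counters is associative and commutative, so partial results from different workers could be combined in any order.

```python
from collections import Counter
from functools import reduce


def merge(left: dict[str, Counter], right: dict[str, Counter]) -> dict[str, Counter]:
    """Combine two graphs by summing their node and edge counts."""
    return {
        "nodes": left["nodes"] + right["nodes"],
        "edges": left["edges"] + right["edges"],
    }


EMPTY = {"nodes": Counter(), "edges": Counter()}

# hypothetical per-document results, e.g. one per worker
doc_graphs = [
    {
        "nodes": Counter({"herzog": 1, "film": 2}),
        "edges": Counter({("herzog", "film"): 1}),
    },
    {
        "nodes": Counter({"herzog": 1, "opera": 1}),
        "edges": Counter({("herzog", "opera"): 1}),
    },
]

knowledge_graph = reduce(merge, doc_graphs, EMPTY)
```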

	<img src="../assets/hitl.png" width="750" />