Consider three classes of composable elements needed for constructing KGs: *nodes*, *edges*, and *properties*.
Several areas of machine learning (ML) research can be leveraged to generate these elements from unstructured text sources:
- nodes: NER, node prediction
- edges: relation extraction (RE), semantic inference, link prediction
- properties: NLP parsing, entity linking, graph analytics
Weights or probabilities from the analysis can also be used to construct *gradients* for ranking each class of elements in the generated output.
This supports multiple approaches for filtering, selection, and abstraction of the generated composable elements, and helps incorporate domain expertise.
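To make the notion of gradients more concrete, here is a minimal sketch in Python: each generated element carries a weight (a probability or confidence score) from the upstream model, and a simple threshold cut-off provides one way to rank and filter the candidates. The `GraphElement` class, the `rank_elements` function, and the example entities are hypothetical illustrations, not part of the library API.

```python
from dataclasses import dataclass
import typing

@dataclass
class GraphElement:
    """A generated element (node, edge, or property) plus its weight."""
    kind: str          # "node" | "edge" | "property"
    value: typing.Any  # e.g., an entity span, a relation triple, a linked ID
    weight: float      # probability or confidence from the upstream model

def rank_elements (
    elements: typing.List[GraphElement],
    threshold: float = 0.5,
    ) -> typing.List[GraphElement]:
    """
Rank the generated elements by weight, descending, then drop anything
below the given threshold -- one simple "gradient" cut-off.
    """
    ranked = sorted(elements, key = lambda elem: elem.weight, reverse = True)
    return [ elem for elem in ranked if elem.weight >= threshold ]

if __name__ == "__main__":
    # hypothetical candidates produced by NER, RE, and entity linking
    candidates = [
        GraphElement("node", "Werner Herzog", 0.97),
        GraphElement("edge", ("Werner Herzog", "born_in", "Munich"), 0.81),
        GraphElement("property", {"dbpedia": "Werner_Herzog"}, 0.62),
        GraphElement("edge", ("Werner Herzog", "born_in", "Berlin"), 0.23),
    ]

    for elem in rank_elements(candidates, threshold = 0.5):
        print(elem.kind, elem.weight, elem.value)
```

A fixed threshold is only the simplest case; the same weights could feed top-*k* selection, per-class cut-offs, or review queues where domain experts adjudicate borderline elements.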
A set of questions follows from this line of inquiry:
**RQ1**: can workflows be defined that integrate LLM-based components and generate *composable elements* for KGs, while managing the quality of the generated results?
**RQ2**: can topological analysis and decomposition of graph data help inform better ways of generating graph elements, e.g., by leveraging patterns within graphs (network motifs) and graph abstraction layers?
**RQ3**: where might it be possible to improve data quality -- training data, benchmarks, evals, etc. -- and then iterate to train more effective LLM-based components?
**RQ4**: how can open source code related to ML research be evaluated consistently, to assess opportunities for reuse in production-quality libraries?