Towards Neuro-Symbolic Language Understanding

At Flexudy, we look for ways to unify symbolic and sub-symbolic methods to improve model interpretation and inference.

Problem

Word embeddings are awesome 🚀. However, does anyone really know what an array of 768 numbers means?
Text and token classification are also awesome ❤️. Still, classifying things into a finite set of concepts is rather limiting.
Last but not least, how do I know that the word "cat" denotes a mammal and also an animal if my neural network is only trained to predict whether something is an animal or not?

Solution

It would be cool if my neural network just knew that a cat is an animal, right? ∀x.Cat(x) ⇒ Animal(x). Or, for example, ∀x.SchöneBlumen(x) ⇒ Blumen(x), which in English means: for all x, if x is a beautiful flower, then x is still a flower.
All of a sudden, tasks like Question Answering, Summarization, Named Entity Recognition or even Intent Classification would become easier, right?

Well, one would probably still need time to build a good, robust solution that is not as large as GPT-3.

Like Peter Gärdenfors, author of Conceptual Spaces, we are trying to find ways to navigate between the symbolic and the sub-symbolic by thinking in concepts.

Should such a solution exist, one could easily leverage true logical reasoning engines on natural language.

How awesome would that be? 💡
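To make the idea of a logical reasoning engine a bit more concrete, here is a minimal sketch (not part of the Conceptor) of forward chaining: unary rules of the form ∀x.P(x) ⇒ Q(x) are applied until nothing new can be derived, which answers the cat question from the Problem section symbolically. The rule list and the forward_chain helper are hypothetical, for illustration only.

```python
# Hypothetical toy rule base: each pair (P, Q) encodes ∀x.P(x) ⇒ Q(x).
RULES = [
    ("Cat", "Mammal"),
    ("Mammal", "Animal"),
    ("SchöneBlumen", "Blumen"),
]


def forward_chain(predicates, rules):
    """Apply unary rules repeatedly until no new predicate can be derived (a fixpoint)."""
    known = set(predicates)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # If P(x) holds and Q(x) is not known yet, derive Q(x).
            if premise in known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# From the single fact Cat(x), the rules also yield Mammal(x) and Animal(x).
print(sorted(forward_chain({"Cat"}, RULES)))  # ['Animal', 'Cat', 'Mammal']
```

Nothing here was learned: the network in the Problem section would need to be trained on "mammal" separately, whereas a reasoner gets it for free once the rules exist.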

Flexudy's Conceptor

We developed a poor man's implementation of the ideal solution described above.
Though it is a poor man's model, it is still a useful one 🤗.

Usage

No library should make anyone suffer, especially not one built on top of 🤗 HF Transformers.

The code lives in the GitHub repo. Install it with:

pip install git+https://github.com/flexudy/natural-language-logic.git@v0.0.1

from flexudy.conceptor.start import FlexudyConceptInferenceMachineFactory

# Load the model only once and reuse it
concept_inference_machine = FlexudyConceptInferenceMachineFactory.get_concept_inference_machine()

# A list of terms.
terms = ["cat", "dog", "economics and sociology", "public company"]

# If you don't pass the language, a language detector will attempt to predict it for you
# If any error occurs, the language defaults to English.
language = "en"

# Predict concepts
# Optionally, you can also pass e.g. batch_size=2 and beam_size=4
concepts = concept_inference_machine.infer_concepts(terms, language=language)

Output:

{'cat': ['mammal', 'animal'], 'dog': ['hound', 'animal'], 'economics and sociology': ['both fields of study'], 'public company': ['company']}
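Because the output maps each term to a list of concepts, it translates directly into IsA rules in the notation used earlier, which is the kind of input a logical reasoner expects. A minimal sketch, assuming you want plain-text rules; to_predicate and to_rules are hypothetical helpers, not part of the flexudy package, and the dictionary below is simply the output shown above.

```python
# Output of infer_concepts as shown above (copied, so no model is needed here).
concepts = {
    "cat": ["mammal", "animal"],
    "dog": ["hound", "animal"],
    "economics and sociology": ["both fields of study"],
    "public company": ["company"],
}


def to_predicate(phrase):
    """Turn a phrase like 'public company' into a predicate name like 'PublicCompany'."""
    return "".join(word.capitalize() for word in phrase.split())


def to_rules(term_to_concepts):
    """Emit one rule ∀x.Term(x) ⇒ Concept(x) per predicted (term, concept) pair."""
    rules = []
    for term, predicted in term_to_concepts.items():
        for concept in predicted:
            rules.append(f"∀x.{to_predicate(term)}(x) ⇒ {to_predicate(concept)}(x)")
    return rules


for rule in to_rules(concepts):
    print(rule)
```

The first line printed is `∀x.Cat(x) ⇒ Mammal(x)`. Note that `str.capitalize` lowercases the rest of each word, so acronyms would need special handling.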

How was it trained?

The models were fine-tuned from Google's T5-base and T5-small. Both resulting models are released on the Hugging Face Hub.
T5-base was trained for only two epochs, while T5-small was trained for five.

Where did you get the data?

I extracted and curated a fragment of ConceptNet.
In particular, only the IsA relation was used.
Note that one term can belong to multiple concepts (which is pretty cool if you think about Fuzzy Description Logics). Multiple inheritance, however, means that some terms belong to a very large number of concepts, so I randomly discarded some of them to stay within the maximum sequence length.
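A minimal sketch of this kind of preprocessing, assuming the ConceptNet 5 assertions dump, in which each tab-separated line holds the assertion URI, relation, start node, end node and JSON metadata. It keeps IsA edges in English, German and French and then applies a random cap of 2 to 4 concepts per term, as described under Setup. The helper names, the same-language filter, the dropping of part-of-speech suffixes and the fixed seed are all assumptions, not the actual Flexudy preprocessing.

```python
import random
from collections import defaultdict

# Languages kept for the multilingual model (see "Multilingual Models").
LANGUAGES = {"en", "de", "fr"}


def parse_node(uri):
    """Split a node URI like '/c/en/public_company/n' into ('en', 'public company')."""
    parts = uri.split("/")  # ['', 'c', 'en', 'public_company', 'n']
    return parts[2], parts[3].replace("_", " ")


def extract_isa_edges(lines):
    """Yield unique (language, term, concept) IsA edges whose two ends share a kept language."""
    seen = set()
    for line in lines:
        # Assumed columns: assertion URI, relation, start node, end node, JSON metadata.
        _, relation, start, end = line.rstrip("\n").split("\t")[:4]
        if relation != "/r/IsA":
            continue
        start_lang, term = parse_node(start)
        end_lang, concept = parse_node(end)
        edge = (start_lang, term, concept)
        if start_lang == end_lang and start_lang in LANGUAGES and edge not in seen:
            seen.add(edge)
            yield edge


def cap_concepts(edges, min_k=2, max_k=4, seed=0):
    """Group edges by (language, term); keep k random concepts, k drawn from [min_k, max_k]."""
    rng = random.Random(seed)
    by_term = defaultdict(list)
    for language, term, concept in edges:
        by_term[(language, term)].append(concept)
    # Terms with fewer than k concepts simply keep all of them.
    return {
        key: rng.sample(found, min(rng.randint(min_k, max_k), len(found)))
        for key, found in by_term.items()
    }
```

The real dump is gzipped; it can be streamed with `gzip.open(path, "rt", encoding="utf-8")` and passed straight to `extract_isa_edges`.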

Setup

In the end, I kept only 2 to 4 randomly chosen concepts per term. This means there is still great potential to make the models generalise better 🚀.
I used a total of 279,884 training examples and 1,260 test examples. The edges, i.e. IsA(concept u, concept v), are disjoint between the two sets.
Training ran for 15K steps, with the learning rate starting at 0.001 and decaying linearly at every step.
I used the RAdam optimiser with weight_decay=0.01 and batch_size=36.
Source and target max length were both 64.
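The learning-rate schedule and the step count can be checked with a few lines of arithmetic. This is a minimal sketch assuming the decay reaches exactly zero at step 15,000 with no warmup; the card only says "linear decay starting at 0.001", so the end point is an assumption.

```python
# Values from the Setup section above.
BASE_LR = 1e-3
TOTAL_STEPS = 15_000
BATCH_SIZE = 36
TRAIN_EXAMPLES = 279_884


def lr_factor(step):
    """Linear decay: 1.0 at step 0, reaching 0.0 at TOTAL_STEPS (assumed end point, no warmup)."""
    return max(0.0, 1.0 - step / TOTAL_STEPS)


def learning_rate(step):
    """Absolute learning rate at a given optimiser step."""
    return BASE_LR * lr_factor(step)


print(learning_rate(0))      # 0.001
print(learning_rate(7_500))  # 0.0005

# 15,000 steps x 36 examples = 540,000 examples, i.e. about 1.9 passes over the training set.
epochs = TOTAL_STEPS * BATCH_SIZE / TRAIN_EXAMPLES
print(round(epochs, 2))  # 1.93
```

The roughly 1.9 passes match the two epochs reported for T5-base; how the 15K steps relate to the five-epoch T5-small run is not stated. In PyTorch, `lr_factor` has the shape expected by `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)`, for instance around `torch.optim.RAdam(model.parameters(), lr=1e-3, weight_decay=0.01)`, although the card does not say which RAdam implementation was actually used.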

Multilingual Models

The "conceptor" model is multilingual: English, German and French are supported.
ConceptNet supports many languages, but I chose those three because they are the ones I speak.

Metrics for flexudy-conceptor-t5-base

Metric	Score
Exact Match	36.67
F1	43.08
Loss smooth	1.214

Unfortunately, we no longer have the metrics for flexudy-conceptor-t5-small. If I recall correctly, the base model was only slightly better on the test set (ca. 2% F1).
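The card does not say how Exact Match and F1 were computed. A common choice for text-to-text outputs is the SQuAD-style definition, sketched below under that assumption: each score is computed per example, then averaged and multiplied by 100 to give numbers on the scale of the table. The function names are hypothetical.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """SQuAD-style normalisation: lowercase, strip punctuation and English articles, squeeze spaces."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalised strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall, counting shared tokens with multiplicity."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("mammal, animal", "animal, mammal"))  # 0.0 (order matters)
print(token_f1("mammal, animal", "animal, mammal"))     # 1.0
print(token_f1("mammal, animal", "mammal, pet"))        # 0.5
```

Under this definition Exact Match is sensitive to the order in which concepts are generated while token F1 is not. The article removal only covers English, so German and French outputs would be normalised less aggressively.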

Why not just use the data if you have it structured already?

ConceptNet is very large. Even a fragment with only 100K edges is still a large graph to load into RAM, especially once you consider how to store the node embeddings efficiently for querying. If you prefer this approach, Milvus can be of great help: compute an embedding for the query and search for the best-matching node. Once you have a match, you can navigate through the graph with 100% precision.
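A minimal sketch of that match-then-navigate approach. A brute-force cosine search stands in for Milvus (no Milvus API is used here), and the 3-dimensional embeddings are made up; in practice they would come from a text encoder and the edges from the ConceptNet fragment. NODE_EMBEDDINGS, ISA, best_match and ancestors are hypothetical names.

```python
import math
from collections import deque

# Hypothetical toy data: made-up 3-d node embeddings and IsA edges (child -> parents).
NODE_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "mammal": [0.7, 0.3, 0.1],
    "animal": [0.5, 0.5, 0.2],
    "company": [0.0, 0.1, 0.9],
}
ISA = {"cat": ["mammal"], "mammal": ["animal"]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(query_vector, embeddings):
    """Brute-force nearest node by cosine similarity (the step a vector database would index)."""
    return max(embeddings, key=lambda node: cosine(query_vector, embeddings[node]))


def ancestors(node, isa):
    """Breadth-first walk up the IsA edges, returning every reachable concept once."""
    found, queue = [], deque([node])
    while queue:
        for parent in isa.get(queue.popleft(), []):
            if parent not in found:
                found.append(parent)
                queue.append(parent)
    return found


query = [0.85, 0.15, 0.05]  # stands in for the embedding of a query term
match = best_match(query, NODE_EMBEDDINGS)
print(match, ancestors(match, ISA))  # cat ['mammal', 'animal']
```

Only the matching step is approximate; once a node is found, the traversal follows stored edges exactly, which is where the precision of the graph approach comes from.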