Harveenchadha/model-entailment

Multimodal entailment

Author: Sayak Paul
Date created: 2021/08/08
Last modified: 2021/08/15
Description: Training a multimodal model for predicting entailment.

What is multimodal entailment?

On social media platforms, to audit and moderate content we may want to find answers to the following questions in near real-time:

Does a given piece of information contradict the other?

Does a given piece of information imply the other?

In NLP, this task is called analyzing textual entailment. However, that term applies only when the information comes from text. In practice, the available information often comes not just from text but from a multimodal combination of text, images, audio, video, etc. Multimodal entailment is simply the extension of textual entailment to a variety of new input modalities.
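To make the task framing concrete, here is a minimal sketch of how one multimodal entailment example might be represented: a pair of posts, each with text and an optional image, plus a label answering the two questions above. The names `Post`, `EntailmentExample`, `EntailmentLabel`, and `is_multimodal` are hypothetical and chosen only for illustration. The `NEITHER` category is also an assumption, added to cover pairs that neither contradict nor imply each other.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntailmentLabel(Enum):
    """Possible relations between two pieces of information."""

    IMPLIES = "implies"          # the first piece implies the second
    CONTRADICTS = "contradicts"  # the first piece contradicts the second
    NEITHER = "neither"          # assumption: fallback when neither relation holds


@dataclass(frozen=True)
class Post:
    """One piece of content; the image is optional (hypothetical structure)."""

    text: str
    image_path: Optional[str] = None


@dataclass(frozen=True)
class EntailmentExample:
    """A labeled pair of posts."""

    first: Post
    second: Post
    label: EntailmentLabel

    def is_multimodal(self) -> bool:
        # The pair goes beyond textual entailment as soon as
        # at least one post carries a non-text modality.
        return self.first.image_path is not None or self.second.image_path is not None


# Illustrative data only; the texts and file name are made up.
example = EntailmentExample(
    first=Post(text="The bridge is open to traffic.", image_path="bridge.png"),
    second=Post(text="The bridge is closed today."),
    label=EntailmentLabel.CONTRADICTS,
)
print(example.label.value, example.is_multimodal())
```

With only the `text` fields populated, the same structure describes ordinary textual entailment; other modalities such as audio or video would be added as further optional fields.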