Jaward posted an update Mar 8
Speaking of the missing piece in today’s generative AI: Reasoning (or more appropriately, the proper use of Common-Sense)

Human intelligence hinges on the brain's ability to learn vast amounts of background knowledge about the world just by passively observing it. Such common-sense information is believed to be the enabler of intelligent behavior (planning, reasoning, and grounding).

Unusual question: how do we actually learn common-sense knowledge?

Unusual opinion: I personally believe we haven't fully understood how the brain learns, and thus cannot get machines to mimic how we learn.

So far, AI godfather Prof. Yann LeCun has quite a promising vision of how machines can learn world models the way we humans do. Excited to share his vision after giving the I-JEPA paper a read.

I-JEPA (Image-based Joint-Embedding Predictive Architecture) is a novel approach to self-supervised learning from images. It learns semantic image features without relying on hand-crafted data augmentations; instead, I-JEPA predicts the representations of multiple target blocks within a single image from a single context block.

The I-JEPA architecture consists of a context encoder, a target encoder, and a predictor. The context encoder extracts features from a context block, while the target encoder extracts features from the target blocks. The predictor then uses the context features, together with positional information about each target block, to predict the target features.
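
To make the three-module design concrete, here is a minimal toy sketch in PyTorch. This is not the paper's implementation: the real modules are Vision Transformers, while here each is a hypothetical small MLP over patch embeddings, and the predictor is simplified to pool the context (the paper instead conditions it on positional mask tokens per target patch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 64          # embedding dimension (toy value)
N_PATCHES = 49  # e.g. a 7x7 grid of image patches

def mlp(dim):
    # stand-in for the ViT encoders/predictor used in the paper
    return nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

context_encoder = mlp(D)  # encodes the visible context block
target_encoder  = mlp(D)  # encodes the target blocks (no gradients)
predictor       = mlp(D)  # maps context features to predicted target features

patches = torch.randn(1, N_PATCHES, D)  # toy patch embeddings for one image
ctx_idx = torch.arange(0, 25)           # indices of the context block
tgt_idx = torch.arange(30, 38)          # indices of one target block

ctx_feats = context_encoder(patches[:, ctx_idx])      # context representations
with torch.no_grad():
    tgt_feats = target_encoder(patches[:, tgt_idx])   # target representations

# Simplified prediction: pool the context and predict each target feature.
pred = predictor(ctx_feats.mean(dim=1, keepdim=True)).expand_as(tgt_feats)
loss = F.mse_loss(pred, tgt_feats)  # L2 loss in representation space
loss.backward()
```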

One of the main advantages of I-JEPA is that it is non-generative: it predicts in abstract representation space rather than reconstructing pixels, so it does not need hand-crafted data augmentations. It also uses a multi-block masking strategy, where a single large context block predicts several target blocks, which helps it learn more semantic representations.
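
One detail worth spelling out: in the paper, the target encoder is not trained by gradient descent but is kept as an exponential moving average (EMA) of the context encoder's weights, which helps prevent the representations from collapsing. A hedged sketch, reusing the toy modules above (the momentum value is illustrative; the paper anneals it over training):

```python
@torch.no_grad()
def ema_update(target, context, momentum=0.996):
    # Target weights track context weights via an exponential moving average.
    for t_p, c_p in zip(target.parameters(), context.parameters()):
        t_p.mul_(momentum).add_(c_p, alpha=1.0 - momentum)

# Called after each optimizer step on the context encoder and predictor:
ema_update(target_encoder, context_encoder)
```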

This is very promising; hopefully we can look back on this one day and be amused at how we got it right.

Paper: https://arxiv.org/abs/2301.08243
Code: https://github.com/facebookresearch/ijepa