VictorSanh 
posted an update Mar 12
Can you beat an AI at Raven puzzles?

HuggingFaceM4/ai_raven

The most powerful vision+language AI systems, such as Gemini or GPT4V, struggle with this problem when used out of the box (How Far Are We from Intelligent Visual Deductive Reasoning? (2403.04732)).

But when properly trained, a small ~8B model can be very accurate at these IQ tests, solely based on visual inputs!

Raven's Progressive Matrices are visual intelligence tests, invented in the 1930s, that are designed to measure abstract reasoning and problem-solving ability. Each test consists of a series of matrices or patterns with one part missing. The task for the test-taker is to identify the missing piece from a set of options.

Such puzzles can be procedurally generated at scale. HuggingFaceM4/RAVEN is one example. The complexity of the puzzles is then controlled by the complexity of the generation procedure.
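To make "procedurally generated" concrete, here is a toy sketch. It is not the procedure behind HuggingFaceM4/RAVEN; the function name `make_puzzle`, the integer "attributes", and the single progression rule are all assumptions made for illustration. Each cell of a 3x3 grid is a tuple of integers, each attribute grows by a fixed step along a row, and the bottom-right cell is hidden among distractors. Raising `num_attributes` is one simple way the generator's complexity drives the puzzle's complexity.

```python
import random


def make_puzzle(rng, num_attributes=2, num_choices=8):
    """Build a toy Raven-style puzzle (illustrative only, not the RAVEN generator).

    Returns (grid, choices, answer_index), where grid[2][2] is None.
    """
    # One step per attribute, shared by all rows: this is the hidden "rule".
    steps = [rng.choice([1, 2]) for _ in range(num_attributes)]
    grid = []
    for _ in range(3):
        starts = [rng.randint(0, 3) for _ in range(num_attributes)]
        row = [tuple(s + k * col for s, k in zip(starts, steps)) for col in range(3)]
        grid.append(row)

    answer = grid[2][2]
    grid[2][2] = None  # the missing piece the solver must find

    # Distractors differ from the answer in exactly one attribute.
    deltas = [d for d in range(-4, 5) if d != 0]
    choices = {answer}
    while len(choices) < num_choices:
        candidate = list(answer)
        candidate[rng.randrange(num_attributes)] += rng.choice(deltas)
        choices.add(tuple(candidate))

    choices = sorted(choices)
    rng.shuffle(choices)
    return grid, choices, choices.index(answer)


grid, choices, answer_index = make_puzzle(random.Random(0))
```

A real generator would render each cell as shapes in an image and combine several rule types; the sketch only shows the shape of the idea.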

We fine-tuned an early checkpoint of our upcoming vision-and-language model idefics2 on that dataset. The resulting checkpoint yields ~91% accuracy! No chain of thought, no pre-processing of the image, no additional inputs or metadata, just the RAVEN problem fed to the model as a standalone image (plus a short instruction to the model: "Which figure should complete the logical sequence?"), with the training objective being the standard cross-entropy.
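For readers who want the two quantities spelled out, here is a minimal sketch of cross-entropy and accuracy in plain Python. It is not the idefics2 training code: the real objective is computed over the model's output tokens, whereas this example assumes, purely for illustration, a single score per candidate figure. The names `cross_entropy` and `accuracy` and the example scores are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of index `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical scores for 8 candidate figures; index 3 is assumed correct.
scores = [0.1, -1.2, 0.3, 2.5, 0.0, -0.7, 0.4, 0.2]
loss = cross_entropy(scores, target=3)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

Training pushes the loss down by raising the score of the correct option relative to the others; the reported accuracy then simply counts how often the top-scoring option is the right one.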

Just another piece of evidence that in a lot of cases, for a given well-scoped problem, you will be better off paying to collect & annotate data and fine-tuning a model on that data (i.e. building your own AI) than wastefully trying to solve that problem with a gigantic general-purpose model you call through a paid API!

This is nice