PANCAKE-Qwen3-VL-8B
PANCAKE: Purpose And Context Activate Knowledge Efficiently
A vision-language model for geometric problem-solving, fine-tuned from Qwen3-VL-8B with a structured Chain-of-Thought (CoT) methodology.
Overview
PANCAKE introduces a geometry-specific, stage-wise reasoning framework that organizes problem-solving into three structured stages:
- Purpose — Explicitly identifies the core objective of the problem in a single sentence, guiding the model to establish a clear solution trajectory.
- Description — Extracts and articulates essential visual information from the image (numerical values, geometric properties, coordinates).
- Think — Performs logical derivation based on Purpose and Description, executing sequential reasoning steps to arrive at the final answer.
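One way to elicit the three stages at inference time is to request them explicitly in the prompt. The card does not publish its prompt, so the wording below and the `build_pancake_prompt` helper are hypothetical. The sketch only illustrates the Purpose → Description → Think → Answer ordering used in the training data format under Method.

```python
def build_pancake_prompt(question: str) -> str:
    """Hypothetical prompt asking for the three PANCAKE stages plus a final answer.

    The exact instruction text used to train PANCAKE is not given in this card.
    """
    stage_instructions = [
        "Purpose: state the core objective of the problem in one sentence.",
        "Description: list the numerical values, geometric properties, and coordinates shown in the image.",
        "Think: derive the answer step by step using the Purpose and Description.",
        "Answer: give only the final answer.",
    ]
    lines = [
        "Solve the geometry problem in the image.",
        f"Question: {question}",
        "",
        "Respond in exactly this format:",
    ]
    lines.extend(stage_instructions)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pancake_prompt("Find m∠3."))
```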
This structured CoT approach is trained via a two-stage pipeline:
- Stage 1: Supervised Fine-Tuning (SFT) on high-quality PANCAKE-format data generated by Gemini-2.5-Pro
- Stage 2: Direct Preference Optimization (DPO), using the SFT model's incorrect outputs as rejected samples
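For readers new to DPO, the sketch below shows the standard DPO objective applied to pairs built as described: the PANCAKE-format response is "chosen" and an incorrect SFT output is "rejected". It is not the authors' training code. The card does not state β or the trainer used, so `beta=0.1` is an assumed default, and `build_pairs` / `is_correct` are hypothetical helpers. Log-probabilities are assumed to be summed over the response tokens.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # PANCAKE-format reference response (generated by Gemini-2.5-Pro)
    rejected: str  # an incorrect response sampled from the SFT model


def build_pairs(prompt: str, reference: str, sft_outputs: List[str],
                is_correct: Callable[[str], bool]) -> List[PreferencePair]:
    """Hypothetical helper: pair the reference with every incorrect SFT output."""
    return [PreferencePair(prompt, reference, out)
            for out in sft_outputs if not is_correct(out)]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair from sequence log-probs under the policy and a frozen
    reference model (typically the SFT model):

        loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)  # -log(sigmoid(m)) == log(1 + exp(-m))


if __name__ == "__main__":
    pairs = build_pairs("Find m∠3.", "Purpose: ... Answer: 16",
                        ["... Answer: 16", "... Answer: 20"],
                        is_correct=lambda out: out.endswith("16"))
    print(len(pairs), "pair(s)")
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))  # policy prefers chosen -> loss below log 2
```

The loss equals log 2 when the policy matches the reference and falls as the policy raises the chosen response's likelihood relative to the rejected one.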
Performance
Geometry3K Benchmark
| Method | Accuracy (%) |
|---|---|
| PANCAKE (DPO) — Ours | 70.0 |
| Inter-GPS | 57.5 |
| Intern-S1 | 52.3 |
| Qwen3-VL-8B (Think-only baseline) | 53.6 |
Ablation: PANCAKE Component Contribution (Geometry3K)
| Configuration | Accuracy (%) | Improvement |
|---|---|---|
| Baseline (Think only) | 53.6 | — |
| + Description | 62.5 | +8.9 pp |
| + Purpose + Description (PANCAKE) | 66.7 | +13.1 pp |
| PANCAKE (DPO) | 70.0 | +16.4 pp |
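The Improvement column is the percentage-point difference between each configuration and the Think-only baseline. A small sketch that recomputes it from the Accuracy column:

```python
# Accuracies (%) copied from the Geometry3K ablation table.
ABLATION = {
    "Baseline (Think only)": 53.6,
    "+ Description": 62.5,
    "+ Purpose + Description (PANCAKE)": 66.7,
    "PANCAKE (DPO)": 70.0,
}


def improvements_pp(results: dict, baseline: str) -> dict:
    """Percentage-point gain of each configuration over the baseline, rounded to 0.1."""
    base = results[baseline]
    return {name: round(acc - base, 1)
            for name, acc in results.items() if name != baseline}


if __name__ == "__main__":
    for name, gain in improvements_pp(ABLATION, "Baseline (Think only)").items():
        print(f"{name}: +{gain} pp")
```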
Structured Reasoning vs. Token Length (Geometry3K)
| Method | Avg Tokens | Accuracy (%) |
|---|---|---|
| PANCAKE | ~490 | 66.7 |
| Long-Think (token-matched) | ~490 | 57.5 |
PANCAKE outperforms a token-matched unstructured baseline by 9.2 percentage points, indicating that the gains come from the structured reasoning design rather than from output length alone.
UniGeo Generalization
| Method | Accuracy (%) |
|---|---|
| PANCAKE (DPO) — Ours | 79.0 |
| GOLD | 75.2 |
| GAPS | 67.8 |
| PANCAKE (SFT) | 78.1 |
Method
PANCAKE Data Format
Each training sample consists of three components generated by Gemini-2.5-Pro:
Purpose: This problem is designed to test the ability to identify ...
Description: The image shows a large triangle ... The vertical side is segmented ...
Think: The goal is to find m∠3 ... subtracting 164° from 180° gives m∠3 = 16°.
Answer: 16
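A response in this format can be split back into its labeled parts, for example to extract the Answer line for scoring. A minimal sketch, assuming each label starts its own line exactly as in the example above; `parse_pancake` is a hypothetical helper, not part of any released code.

```python
import re

# Assumes each section label appears at the start of a line, followed by a colon.
_SECTION_RE = re.compile(r"^(Purpose|Description|Think|Answer):\s*", re.MULTILINE)


def parse_pancake(text: str) -> dict:
    """Split a PANCAKE-format response into {label: content}, in order of appearance."""
    matches = list(_SECTION_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # Each section runs until the next label (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end].strip()
    return sections


if __name__ == "__main__":
    sample = ("Purpose: Find the measure of angle 3.\n"
              "Description: A triangle with two known angles.\n"
              "Think: 180 - 164 = 16.\n"
              "Answer: 16")
    print(parse_pancake(sample)["Answer"])
```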
Training Pipeline
- Data Synthesis: Gemini-2.5-Pro generates Purpose → Description → Think responses for Geometry3K problems. For each problem, responses are regenerated until the predicted answer matches the ground truth.
- SFT: Qwen3-VL-8B is fine-tuned on PANCAKE data using LoRA on an RTX A6000 GPU.
- DPO: Preference pairs use the PANCAKE data as the chosen response and the SFT model's incorrect responses as the rejected response. DPO is then applied to reinforce correct logical pathways.
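The data-synthesis step amounts to rejection sampling against the ground-truth answer. A minimal sketch, assuming a hypothetical `generate` callable that wraps the Gemini-2.5-Pro request and a hypothetical `extract_answer` function. The attempt cap (`max_attempts`) and the answer normalization are assumptions, since the card only says generation repeats until the answers match.

```python
from typing import Callable, Optional


def normalize(answer: str) -> str:
    """Assumed normalization: trim whitespace, lowercase, drop a trailing degree sign."""
    return answer.strip().lower().rstrip("°").strip()


def synthesize_sample(problem: str, ground_truth: str,
                      generate: Callable[[str], str],
                      extract_answer: Callable[[str], str],
                      max_attempts: int = 5) -> Optional[str]:
    """Resample until the extracted answer matches the ground truth.

    Returns the first matching response, or None if every attempt fails
    (the cap is an assumption; the card does not give one).
    """
    for _ in range(max_attempts):
        response = generate(problem)  # hypothetical wrapper around a Gemini-2.5-Pro call
        if normalize(extract_answer(response)) == normalize(ground_truth):
            return response
    return None


if __name__ == "__main__":
    # Stand-in generator that is wrong once, then right.
    canned = iter(["Think: ... Answer: 15", "Think: ... Answer: 16°"])
    kept = synthesize_sample("Find m∠3.", "16",
                             generate=lambda _p: next(canned),
                             extract_answer=lambda r: r.rsplit("Answer:", 1)[-1])
    print(kept)
```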
Base Model
- Architecture: Qwen3-VL (8B parameters)
- Fine-tuning method: LoRA (Low-Rank Adaptation)
- Training hardware: NVIDIA RTX A6000
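LoRA keeps the pretrained weight W frozen and learns a low-rank update. The card does not report the rank, scaling factor, or which layers received adapters, so the toy sketch below uses hypothetical values. It shows the LoRA forward pass y = Wx + (α/r)·BAx in plain Python and the trainable-parameter count for one 4096×4096 projection (4096 being the hidden size listed under Model Details).

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) are trained.
    B is usually initialized to zeros, so training starts from the base model.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters added by one adapter: r * (d_in + d_out)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    d = 4096  # hidden size from the Model Details table
    r = 16    # hypothetical rank; not reported in this card
    full, lora = d * d, lora_param_count(d, d, r)
    print(f"full: {full:,}  LoRA: {lora:,}  ({100 * lora / full:.2f}%)")
```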
Datasets
- Geometry3K: 3,002 geometry problems from American high school math textbooks (grades 9–12). Split: 2,101 train / 300 validation / 601 test.
- UniGeo (generalization eval): Large-scale high school geometry benchmark; calculation subset used (3,499 train / 745 val / 754 test).
Model Details
| Property | Value |
|---|---|
| Base model | Qwen3-VL-8B |
| Architecture | Qwen3VLForConditionalGeneration |
| Parameters | ~8B |
| dtype | float16 |
| Hidden size | 4096 |
| Attention heads | 32 |
| KV heads | 8 |
| Hidden layers | 36 |
| Max position embeddings | 262,144 |
| Vision encoder hidden size | 1,152 |
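The KV-heads row indicates grouped-query attention: 32 query heads share 8 key/value heads. As a rough sketch, assuming head_dim = hidden size / attention heads = 128 (not listed in the table) and 2-byte float16 cache entries, the KV-cache footprint per token of context can be estimated as below. Actual memory use also depends on the inference framework and on how many vision tokens the image produces.

```python
def kv_cache_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    # Factor 2: one key and one value vector per KV head in every layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


HIDDEN, HEADS, KV_HEADS, LAYERS = 4096, 32, 8, 36  # from the Model Details table
HEAD_DIM = HIDDEN // HEADS      # assumed; not listed in the table
GROUP_SIZE = HEADS // KV_HEADS  # query heads sharing each KV head

if __name__ == "__main__":
    per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM)
    print(f"query heads per KV head: {GROUP_SIZE}")
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    for n_tokens in (8_192, 32_768):
        print(f"{n_tokens} tokens: {per_token * n_tokens / 2**30:.2f} GiB")
```

Under these assumptions the cache is 144 KiB per token, about 36 GiB at the full 262,144-token window, and it would be 4× larger if every query head had its own KV head.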
Citation
@article{pancake2025,
title={PANCAKE: Purpose And Context Activate Knowledge Efficiently},
author={Chae-Yun Jung and Yi Seung},
year={2025},
institution={St. Johnsbury Academy, Jeju, Korea; Asia Pacific International School, Seoul, Korea}
}
Authors
- Chae-Yun Jung — St. Johnsbury Academy, Jeju, Korea
- Yi Seung — Asia Pacific International School, Seoul, Korea