PANCAKE-Qwen3-VL-8B

PANCAKE: Purpose And Context Activate Knowledge Efficiently

A fine-tuned vision-language model for geometric problem-solving, trained with a structured Chain-of-Thought methodology on top of Qwen3-VL-8B.


Overview

PANCAKE introduces a geometry-specific, stage-wise reasoning framework that organizes problem-solving into three structured stages:

  1. Purpose — Explicitly identifies the core objective of the problem in a single sentence, guiding the model to establish a clear solution trajectory.
  2. Description — Extracts and articulates essential visual information from the image (numerical values, geometric properties, coordinates).
  3. Think — Performs logical derivation based on Purpose and Description, executing sequential reasoning steps to arrive at the final answer.

The model learns this structured CoT format through a two-stage training pipeline:

  • Stage 1: Supervised Fine-Tuning (SFT) on high-quality PANCAKE-format data generated by Gemini-2.5-Pro
  • Stage 2: Direct Preference Optimization (DPO) using the SFT model's incorrect outputs as rejected samples

Performance

Geometry3K Benchmark

| Method | Accuracy (%) |
|---|---|
| PANCAKE (DPO, ours) | 70.0 |
| Inter-GPS | 57.5 |
| Intern-S1 | 52.3 |
| Qwen3-VL-8B (Think-only baseline) | 53.6 |

Ablation: PANCAKE Component Contribution (Geometry3K)

| Configuration | Accuracy (%) | Improvement over baseline |
|---|---|---|
| Baseline (Think only) | 53.6 | - |
| + Description | 62.5 | +8.9 pp |
| + Purpose + Description (PANCAKE) | 66.7 | +13.1 pp |
| PANCAKE (DPO) | 70.0 | +16.4 pp |

Structured Reasoning vs. Token Length (Geometry3K)

| Method | Avg. tokens | Accuracy (%) |
|---|---|---|
| PANCAKE | ~490 | 66.7 |
| Long-Think (token-matched) | ~490 | 57.5 |

PANCAKE outperforms a token-matched unstructured baseline by 9.2 percentage points at roughly the same output length (~490 tokens), indicating that the gains stem from the structured reasoning design rather than from token count alone.
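The percentage-point gains quoted in this section can be recomputed from the accuracy columns. The snippet below is a small sanity-check sketch; the function name `pp_gain` and the dictionary layout are illustrative choices, not part of the original work.

```python
# Accuracy values (%) copied from the Geometry3K tables above.
BASELINE = 53.6      # Think-only baseline
LONG_THINK = 57.5    # token-matched unstructured baseline

ACCURACY = {
    "+ Description": 62.5,
    "+ Purpose + Description (PANCAKE)": 66.7,
    "PANCAKE (DPO)": 70.0,
}


def pp_gain(accuracy: float, reference: float) -> float:
    """Return the gain in percentage points, rounded to one decimal."""
    return round(accuracy - reference, 1)


# Gain of each configuration over the Think-only baseline.
gains = {name: pp_gain(acc, BASELINE) for name, acc in ACCURACY.items()}

# Gain of SFT-stage PANCAKE over the token-matched Long-Think baseline.
token_matched_gain = pp_gain(ACCURACY["+ Purpose + Description (PANCAKE)"], LONG_THINK)

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: +{gain} pp")
    print(f"PANCAKE vs. Long-Think: +{token_matched_gain} pp")
```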

UniGeo Generalization

| Method | Accuracy (%) |
|---|---|
| PANCAKE (DPO, ours) | 79.0 |
| GOLD | 75.2 |
| GAPS | 67.8 |
| PANCAKE (SFT) | 78.1 |

Method

PANCAKE Data Format

Each training sample consists of three reasoning components generated by Gemini-2.5-Pro (Purpose, Description, Think), followed by the final answer:

Purpose: This problem is designed to test the ability to identify ...
Description: The image shows a large triangle ... The vertical side is segmented ...
Think: The goal is to find m∠3 ... subtracting 164° from 180° gives m∠3 = 16°.
Answer: 16
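To consume responses in this layout programmatically, the labeled fields can be split with a regular expression. This is a minimal sketch that assumes each label (`Purpose:`, `Description:`, `Think:`, `Answer:`) starts its own line, as in the sample above; the function name `parse_pancake` and the error handling are illustrative and not taken from the authors' code.

```python
import re
from typing import Dict

# Field labels as they appear in the sample; the parsing rules are assumptions.
FIELDS = ("Purpose", "Description", "Think", "Answer")
_LABEL = re.compile(r"^(Purpose|Description|Think|Answer):[ \t]*", re.MULTILINE)


def parse_pancake(text: str) -> Dict[str, str]:
    """Split a PANCAKE-style response into its four labeled fields."""
    matches = list(_LABEL.finditer(text))
    parsed: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A field runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        parsed[match.group(1)] = text[match.end():end].strip()
    missing = [name for name in FIELDS if name not in parsed]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return parsed
```

A response that lacks any of the four labels raises `ValueError`, which makes malformed generations easy to filter out before scoring.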

Training Pipeline

  1. Data Synthesis: Gemini-2.5-Pro generates Purpose → Description → Think responses for Geometry3K problems. For each problem, generation is repeated until the predicted answer matches the ground truth.
  2. SFT: Qwen3-VL-8B is fine-tuned on PANCAKE data using LoRA on an RTX A6000 GPU.
  3. DPO: Preference pairs are constructed with the PANCAKE data as the chosen response and the SFT model's incorrect responses as the rejected response. DPO is then applied to reinforce correct logical pathways.
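Steps 1 and 3 of the pipeline above can be sketched in a few lines. Everything here is an illustrative assumption rather than the authors' implementation: the `generate` callable stands in for a Gemini-2.5-Pro call, `max_attempts` is a hypothetical retry cap (the source does not state one), answers are compared by exact string match, and the `prompt`/`chosen`/`rejected` keys follow a common preference-dataset layout.

```python
from typing import Callable, Dict, List, Optional


def synthesize_sample(
    generate: Callable[[str], Dict[str, str]],
    problem: str,
    ground_truth: str,
    max_attempts: int = 5,  # hypothetical cap; not specified in the source
) -> Optional[Dict[str, str]]:
    """Step 1: regenerate until the predicted answer matches the ground truth."""
    for _ in range(max_attempts):
        candidate = generate(problem)  # expected keys: "text", "answer"
        if candidate["answer"].strip() == ground_truth.strip():
            return candidate
    return None  # no matching sample within the attempt budget


def build_dpo_pairs(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Step 3: pair each PANCAKE response with an incorrect SFT response."""
    pairs = []
    for record in records:
        # Only problems the SFT model got wrong yield a preference pair.
        if record["sft_answer"].strip() != record["ground_truth"].strip():
            pairs.append(
                {
                    "prompt": record["prompt"],
                    "chosen": record["pancake_response"],
                    "rejected": record["sft_response"],
                }
            )
    return pairs
```

Problems that the SFT model already answers correctly contribute no pair in this sketch, since there is no incorrect response to reject.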

Base Model

  • Architecture: Qwen3-VL (8B parameters)
  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • Training hardware: NVIDIA RTX A6000

Datasets

  • Geometry3K: 3,002 geometry problems from American high school math textbooks (grades 9–12). Split: 2,101 train / 300 validation / 601 test.
  • UniGeo (generalization eval): Large-scale high school geometry benchmark; calculation subset used (3,499 train / 745 val / 754 test).

Model Details

| Property | Value |
|---|---|
| Base model | Qwen3-VL-8B |
| Architecture | Qwen3VLForConditionalGeneration |
| Parameters | ~8B |
| dtype | float16 |
| Hidden size | 4096 |
| Attention heads | 32 |
| KV heads | 8 |
| Hidden layers | 36 |
| Max position embeddings | 262,144 |
| Vision encoder hidden size | 1,152 |
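The configuration values above are enough for a rough estimate of the language model's KV-cache footprint. The sketch below assumes the per-head dimension equals hidden size divided by attention heads (the model card does not state it explicitly) and ignores the vision encoder and any framework overhead.

```python
# Values copied from the Model Details table.
HIDDEN_SIZE = 4096
NUM_HEADS = 32
NUM_KV_HEADS = 8
NUM_LAYERS = 36
BYTES_PER_VALUE = 2  # float16

# Assumption: head_dim = hidden_size / attention heads.
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS

# Grouped-query attention: number of query heads sharing one KV head.
QUERIES_PER_KV_HEAD = NUM_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_tokens: int) -> int:
    """Estimate KV-cache size: keys and values, for every layer and KV head."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


if __name__ == "__main__":
    print(f"head_dim: {HEAD_DIM}, queries per KV head: {QUERIES_PER_KV_HEAD}")
    print(f"per token: {kv_cache_bytes(1) / 1024:.0f} KiB")
    print(f"at 262,144 tokens: {kv_cache_bytes(262_144) / 1024**3:.0f} GiB")
```

Under these assumptions the cache costs about 144 KiB per token, so a full 262,144-token context would need roughly 36 GiB for the cache alone; a ~490-token PANCAKE response stays well under 100 MiB.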

Citation

@article{pancake2025,
  title={PANCAKE: Purpose And Context Activate Knowledge Efficiently},
  author={Chae-Yun Jung and Yi Seung},
  year={2025},
  institution={St. Johnsbury Academy, Jeju, Korea; Asia Pacific International School, Seoul, Korea}
}

Authors

  • Chae-Yun Jung — St. Johnsbury Academy, Jeju, Korea
  • Yi Seung — Asia Pacific International School, Seoul, Korea