FloorplanVLM Training

Fine-tune Qwen2.5-VL-3B to extract wall, door, and window geometry from floor plan images as structured JSON.

Based on FloorplanVLM (arXiv:2602.06507). Training has two stages:

1. SFT on CubiCasa5K (5,000 real floor plans)
2. GRPO with geometric reward functions (wall IoU, room IoU, JSON validity)

Quick Start

# Install dependencies
pip install torch torchvision transformers trl peft datasets accelerate shapely Pillow lxml numpy tqdm huggingface_hub

# Optional (faster attention on GPU)
pip install flash-attn

# Log in to Hugging Face
huggingface-cli login

# Stage 1: SFT Training
python train_floorplan_vlm.py

# Stage 2: GRPO Training (after SFT completes)
python train_floorplan_grpo.py

What it does

Downloads the CubiCasa5K dataset (~5 GB) from Zenodo automatically
Converts SVG floor plan annotations → structured JSON (walls with coordinates, doors, windows, rooms)
Trains Qwen2.5-VL-3B with LoRA to predict this JSON from floor plan images
Pushes the model to HuggingFace Hub
Auto-detects GPU vs CPU (GPU recommended for full training)
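
The SVG-to-JSON conversion step can be illustrated with a minimal sketch. It is not the code in the training scripts. It assumes that walls appear in the annotation SVG as `<g>` groups whose `class` starts with `Wall` and that each group holds one `<polygon>`; it also handles only axis-aligned rectangular walls and ignores doors, windows, and rooms. The function names and the inline sample SVG are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical one-wall annotation, used only to exercise the sketch.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g id="Wall" class="Wall External">
    <polygon points="120,72.5 520,72.5 520,87.5 120,87.5"/>
  </g>
</svg>"""


def parse_points(points_attr):
    """Turn 'x1,y1 x2,y2 ...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in points_attr.split()]


def polygon_to_wall(points, wall_id):
    """Reduce an axis-aligned rectangle to a centerline plus thickness."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    if width >= height:  # horizontal wall: centerline runs along x
        mid_y = (min(ys) + max(ys)) / 2
        start, end, thickness = [min(xs), mid_y], [max(xs), mid_y], height
    else:  # vertical wall: centerline runs along y
        mid_x = (min(xs) + max(xs)) / 2
        start, end, thickness = [mid_x, min(ys)], [mid_x, max(ys)], width
    return {"id": wall_id, "start": start, "end": end,
            "thickness": thickness, "curvature": 0, "openings": []}


def svg_to_walls(svg_text):
    """Collect every wall group in the SVG into the target JSON layout."""
    root = ET.fromstring(svg_text)
    walls = []
    for group in root.iter(f"{SVG_NS}g"):
        # Assumption: wall groups are identified by a class starting with "Wall".
        if not group.get("class", "").startswith("Wall"):
            continue
        polygon = group.find(f"{SVG_NS}polygon")
        if polygon is None:
            continue
        points = parse_points(polygon.get("points"))
        walls.append(polygon_to_wall(points, f"wall_{len(walls) + 1}"))
    return {"walls": walls, "rooms": []}


result = svg_to_walls(SAMPLE_SVG)
```

For the sample polygon, the sketch yields a single wall running from `[120.0, 80.0]` to `[520.0, 80.0]` with thickness `15.0`.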

Configuration

Edit the top of each script:

| Setting | Default | Description |
|---|---|---|
| `MAX_SAMPLES` | `None` (all) | Set to `100` for a quick test run |
| `NUM_EPOCHS` | `2` | Training epochs |
| `PUSH_TO_HUB` | `True` | Push model to HF Hub |
| `HUB_MODEL_ID` | `manitocross/floorplan-vlm-sft` | Your model repo |
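
As an illustration, a quick local smoke test might set the constants like this. The setting names come from the table above. The values are only a suggestion, and `your-username/floorplan-vlm-sft` is a placeholder, not a real repo.

```python
# Hypothetical values for a short test run; edit at the top of the script.
MAX_SAMPLES = 100        # train on a small subset instead of the full dataset
NUM_EPOCHS = 2           # default number of training epochs
PUSH_TO_HUB = False      # skip the upload while testing
HUB_MODEL_ID = "your-username/floorplan-vlm-sft"  # placeholder repo ID
```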

Hardware Requirements

| Mode | Memory | Time (full dataset) |
|---|---|---|
| GPU (A100 80GB) | ~20 GB VRAM | ~4-6 hours |
| GPU (RTX 3090/4090) | ~20 GB VRAM | ~8-12 hours |
| CPU | ~14 GB RAM | Days (for testing only) |

Output JSON Schema

{
  "walls": [
    {
      "id": "wall_1",
      "start": [120, 80],
      "end": [520, 80],
      "thickness": 15,
      "curvature": 0,
      "openings": [
        {"type": "door", "center": 320, "width": 90},
        {"type": "window", "center": 450, "width": 60}
      ]
    }
  ],
  "rooms": [
    {"label": "bedroom", "walls": ["wall_1", "wall_2", "wall_3", "wall_4"]}
  ]
}
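
A model output in this layout can be sanity-checked before any geometry is computed. The sketch below is one possible check, not the validation used by the scripts. The required-key set is inferred from the example above, and `validate_floorplan` is a hypothetical name.

```python
import json

# Assumption: every wall carries the keys shown in the example above.
REQUIRED_WALL_KEYS = {"id", "start", "end", "thickness", "curvature", "openings"}


def is_point(value):
    """True for a two-element list of numbers such as [120, 80]."""
    return (isinstance(value, list) and len(value) == 2
            and all(isinstance(v, (int, float)) for v in value))


def validate_floorplan(text):
    """Return a list of problems; an empty list means these checks passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict) or not isinstance(data.get("walls"), list):
        return ["top level must be an object with a 'walls' list"]

    problems = []
    wall_ids = set()
    for i, wall in enumerate(data["walls"]):
        if not isinstance(wall, dict):
            problems.append(f"wall {i}: not an object")
            continue
        missing = REQUIRED_WALL_KEYS - wall.keys()
        if missing:
            problems.append(f"wall {i}: missing keys {sorted(missing)}")
            continue
        if not (is_point(wall["start"]) and is_point(wall["end"])):
            problems.append(f"wall {i}: start/end must be [x, y] pairs")
        if isinstance(wall["id"], str):
            wall_ids.add(wall["id"])

    # Rooms may only reference walls that exist in the same output.
    rooms = data.get("rooms", [])
    for room in rooms if isinstance(rooms, list) else []:
        if not isinstance(room, dict):
            problems.append("room: not an object")
            continue
        unknown = [w for w in room.get("walls", []) if w not in wall_ids]
        if unknown:
            problems.append(f"room {room.get('label')!r}: unknown walls {unknown}")
    return problems
```

Note that the abbreviated example above would fail the cross-reference check, because the room lists `wall_2` through `wall_4` while only `wall_1` is defined.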

GRPO Reward Functions

Stage 2 uses geometric rewards from the FloorplanVLM paper:

R_val (0.1 weight): JSON validity + schema compliance
R_ext (0.5 weight): External wall boundary IoU (Shapely polygon comparison)
R_int (0.4 weight): Room IoU, gated by α when external walls are wrong
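
The weighting above can be sketched as a single function. This is an illustrative reading, not the reward code from the paper or the training script. It assumes the IoU values are computed elsewhere (for example with Shapely), that an invalid output earns zero reward, and that the gate multiplies R_int by α whenever the external IoU falls below a threshold. The values of `alpha` and `ext_threshold` are hypothetical.

```python
# Weights taken from the list above.
W_VAL, W_EXT, W_INT = 0.1, 0.5, 0.4


def combined_reward(is_valid, ext_iou, room_iou, alpha=0.5, ext_threshold=0.5):
    """Weighted sum of the three reward terms.

    is_valid: whether the output parsed and matched the schema (R_val).
    ext_iou:  IoU of the external wall boundary, in [0, 1] (R_ext).
    room_iou: IoU of the rooms, in [0, 1] (R_int before gating).
    alpha, ext_threshold: assumed gate parameters, not from the source.
    """
    if not is_valid:
        # Assumption: geometry cannot be scored without valid JSON.
        return 0.0
    # Assumption: a poor external boundary scales down the room term.
    gate = 1.0 if ext_iou >= ext_threshold else alpha
    return W_VAL * 1.0 + W_EXT * ext_iou + W_INT * gate * room_iou
```

Under these assumptions, a valid output with perfect external and room IoU scores 1.0. With an external IoU of 0.2 and a room IoU of 0.8, the room term is scaled by α and the total is 0.1 + 0.5 × 0.2 + 0.4 × 0.5 × 0.8 = 0.36.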
