manitocross's picture
Upload README.md
6ea7ea3 verified

FloorplanVLM Training

Fine-tune Qwen2.5-VL-3B to extract wall, door, and window geometry from floor plan images as structured JSON.

Based on FloorplanVLM (arxiv:2602.06507) — two-stage training:

  1. SFT on CubiCasa5K (5000 real floor plans)
  2. GRPO with geometric reward functions (wall IoU, room IoU, JSON validity)

Quick Start

# Install dependencies
pip install torch torchvision transformers trl peft datasets accelerate shapely Pillow lxml numpy tqdm huggingface_hub

# Optional (faster attention on GPU)
pip install flash-attn

# Login to HuggingFace
huggingface-cli login

# Stage 1: SFT Training
python train_floorplan_vlm.py

# Stage 2: GRPO Training (after SFT completes)
python train_floorplan_grpo.py

What it does

  • Downloads CubiCasa5K dataset (~5GB) from Zenodo automatically
  • Converts SVG floor plan annotations → structured JSON (walls with coordinates, doors, windows, rooms)
  • Trains Qwen2.5-VL-3B with LoRA to predict this JSON from floor plan images
  • Pushes the model to HuggingFace Hub
  • Auto-detects GPU vs CPU (GPU recommended for full training)

Configuration

Edit the top of each script:

Setting Default Description
MAX_SAMPLES None (all) Set to 100 for a quick test run
NUM_EPOCHS 2 Training epochs
PUSH_TO_HUB True Push model to HF Hub
HUB_MODEL_ID manitocross/floorplan-vlm-sft Your model repo

Hardware Requirements

Mode VRAM Time (full dataset)
GPU (A100 80GB) ~20GB ~4-6 hours
GPU (RTX 3090/4090) ~20GB ~8-12 hours
CPU ~14GB RAM ~days (for testing only)

Output JSON Schema

{
  "walls": [
    {
      "id": "wall_1",
      "start": [120, 80],
      "end": [520, 80],
      "thickness": 15,
      "curvature": 0,
      "openings": [
        {"type": "door", "center": 320, "width": 90},
        {"type": "window", "center": 450, "width": 60}
      ]
    }
  ],
  "rooms": [
    {"label": "bedroom", "walls": ["wall_1", "wall_2", "wall_3", "wall_4"]}
  ]
}

GRPO Reward Functions

Stage 2 uses geometric rewards from the FloorplanVLM paper:

  • R_val (0.1 weight): JSON validity + schema compliance
  • R_ext (0.5 weight): External wall boundary IoU (Shapely polygon comparison)
  • R_int (0.4 weight): Room IoU, gated by α when external walls are wrong

References