Fix mask dimension mismatch error with bounds checking
72260ee

Fix: Mask Dimension Mismatch Error

Problem

Error during video generation:

ValueError: could not broadcast input array from shape (1012,1024) into shape (1000,1024)

Root Cause

The mask dimensions (1012×1024) exceeded the canvas bounds (1000×1024) at line 1081:

mask_full[h_min:h_min + mask.shape[0], w_min:w_min + mask.shape[1]] = mask

This happened when:

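  1. Template bounding box (bbox) calculation positioned the mask at or near the canvas edges
  2. Mask size plus offset exceeded the canvas dimensions
  3. NumPy silently clamped the out-of-range destination slice to the canvas, so the target region was smaller than the mask and the assignment could not broadcast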
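
The failure can be reproduced in isolation. The sketch below is not taken from app_hf_spaces.py; it only reuses the shapes from the error message and assumes 2-D `uint8` arrays, with `canvas` standing in for `mask_full`.

```python
import numpy as np

canvas = np.zeros((1000, 1024), dtype=np.uint8)  # stands in for mask_full
mask = np.ones((1012, 1024), dtype=np.uint8)     # 12 rows taller than the canvas
h_min, w_min = 0, 0

# Slicing past the end of an array does not raise: NumPy clamps the slice.
target = canvas[h_min:h_min + mask.shape[0], w_min:w_min + mask.shape[1]]
target_shape = target.shape  # (1000, 1024), not (1012, 1024)

# Assigning the full mask into that clamped region is what fails.
try:
    canvas[h_min:h_min + mask.shape[0], w_min:w_min + mask.shape[1]] = mask
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(target_shape)
print(error_message)
```

The slice itself succeeds, which is why the error only appears at assignment time rather than when the region is computed.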

Solution

Added bounds checking and clipping before mask assignment:

# Before (BROKEN):
mask_full[h_min:h_min + mask.shape[0], w_min:w_min + mask.shape[1]] = mask

# After (FIXED):
# Clip mask to fit within canvas bounds
canvas_h, canvas_w = mask_full.shape
mask_h, mask_w = mask.shape

# Calculate actual region that fits
h_end = min(h_min + mask_h, canvas_h)
w_end = min(w_min + mask_w, canvas_w)

# Clip mask if it exceeds bounds
actual_h = h_end - h_min
actual_w = w_end - w_min

mask_full[h_min:h_end, w_min:w_end] = mask[:actual_h, :actual_w]
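
The fix above clips only the bottom and right sides, which is enough when `h_min` and `w_min` already lie inside the canvas. Whether negative or fully out-of-range offsets can occur in app_hf_spaces.py is not stated here. If they can, a four-sided variant might look like the following sketch; `paste_clipped` is a hypothetical helper name, not a function from the app.

```python
import numpy as np

def paste_clipped(canvas, mask, h_min, w_min):
    """Copy the part of `mask` that overlaps `canvas` when placed at (h_min, w_min)."""
    canvas_h, canvas_w = canvas.shape
    mask_h, mask_w = mask.shape

    # Destination window, clamped to the canvas on all four sides.
    dst_h0, dst_w0 = max(h_min, 0), max(w_min, 0)
    dst_h1 = min(h_min + mask_h, canvas_h)
    dst_w1 = min(w_min + mask_w, canvas_w)

    # No overlap at all: leave the canvas untouched.
    if dst_h1 <= dst_h0 or dst_w1 <= dst_w0:
        return canvas

    # Matching source window inside the mask.
    src_h0, src_w0 = dst_h0 - h_min, dst_w0 - w_min
    src_h1 = src_h0 + (dst_h1 - dst_h0)
    src_w1 = src_w0 + (dst_w1 - dst_w0)

    canvas[dst_h0:dst_h1, dst_w0:dst_w1] = mask[src_h0:src_h1, src_w0:src_w1]
    return canvas

# Same shapes as the reported error: the bottom 12 mask rows are dropped.
full = paste_clipped(np.zeros((1000, 1024), np.uint8), np.ones((1012, 1024), np.uint8), 0, 0)
# Hypothetical negative offset: only the overlapping 6x7 corner is written.
corner = paste_clipped(np.zeros((20, 20), np.uint8), np.ones((10, 10), np.uint8), -4, -3)
# Hypothetical offset past the canvas: nothing is written.
outside = paste_clipped(np.zeros((20, 20), np.uint8), np.ones((10, 10), np.uint8), 25, 0)
```

For offsets inside the canvas this behaves the same as the fix; the extra branches only matter for the cases the fix does not cover.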

How It Works

Example: Mask Exceeds Bottom Bound

Canvas: 1000×1024 (h×w)
Mask: 1012×1024
Position: h_min=0, w_min=0

Before Fix:
  Tries to assign mask[0:1012, 0:1024] → canvas[0:1012, 0:1024]
  ERROR: canvas only has 1000 rows!

After Fix:
  h_end = min(0 + 1012, 1000) = 1000
  w_end = min(0 + 1024, 1024) = 1024
  actual_h = 1000 - 0 = 1000
  actual_w = 1024 - 0 = 1024

  Assigns mask[0:1000, 0:1024] → canvas[0:1000, 0:1024]
  ✅ SUCCESS: Clips bottom 12 rows of mask to fit

Example: Mask Exceeds Bottom and Right Bounds

Canvas: 1000×1024
Mask: 520×530
Position: h_min=500, w_min=500

Before Fix:
  Tries: canvas[500:1020, 500:1030] = mask
  ERROR: Canvas ends at row 1000, column 1024!

After Fix:
  h_end = min(500 + 520, 1000) = 1000
  w_end = min(500 + 530, 1024) = 1024
  actual_h = 1000 - 500 = 500
  actual_w = 1024 - 500 = 524

  Assigns: canvas[500:1000, 500:1024] = mask[0:500, 0:524]
  ✅ SUCCESS: Clips mask to fit remaining canvas space
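
The arithmetic in both examples can be checked with a few lines of plain Python. This is a sketch: `clipped_extent` is a hypothetical helper that mirrors the `h_end` / `actual_h` computation for a single axis, and it assumes the start offset lies inside the canvas.

```python
def clipped_extent(start, size, limit):
    """Return (end, actual) for one axis, mirroring h_end and actual_h in the fix."""
    end = min(start + size, limit)
    return end, end - start

# Example 1: canvas 1000x1024, mask 1012x1024 at (0, 0)
ex1_rows = clipped_extent(0, 1012, 1000)   # (1000, 1000)
ex1_cols = clipped_extent(0, 1024, 1024)   # (1024, 1024)

# Example 2: canvas 1000x1024, mask 520x530 at (500, 500)
ex2_rows = clipped_extent(500, 520, 1000)  # (1000, 500)
ex2_cols = clipped_extent(500, 530, 1024)  # (1024, 524)

# Rows and columns of the mask that are discarded by clipping.
ex1_dropped = (1012 - ex1_rows[1], 1024 - ex1_cols[1])  # (12, 0)
ex2_dropped = (520 - ex2_rows[1], 530 - ex2_cols[1])    # (20, 6)

print(ex1_rows, ex1_cols, ex1_dropped)
print(ex2_rows, ex2_cols, ex2_dropped)
```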

Changed Files

  • app_hf_spaces.py (lines ~1077-1094)

Testing

This fix handles:

  • ✅ Masks larger than canvas
  • ✅ Masks positioned near edges
  • ✅ Masks that exceed multiple bounds
  • ✅ Normal cases (no clipping needed)
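
The four cases listed above could be exercised with a small script along these lines. It is a sketch rather than the project's test suite: `apply_mask` is a hypothetical wrapper around the clipped assignment, the mask is all ones so the canvas sum equals the number of pixels written, and the offsets for the edge and normal cases are made up for illustration.

```python
import numpy as np

def apply_mask(canvas_shape, mask_shape, h_min, w_min):
    """Run the clipped assignment from the fix on an all-ones mask; return the canvas."""
    mask_full = np.zeros(canvas_shape, dtype=np.uint8)
    mask = np.ones(mask_shape, dtype=np.uint8)
    canvas_h, canvas_w = mask_full.shape
    mask_h, mask_w = mask.shape
    h_end = min(h_min + mask_h, canvas_h)
    w_end = min(w_min + mask_w, canvas_w)
    mask_full[h_min:h_end, w_min:w_end] = mask[:h_end - h_min, :w_end - w_min]
    return mask_full

# (canvas shape, mask shape, h_min, w_min) for each listed case.
cases = {
    "larger than canvas": ((1000, 1024), (1012, 1024), 0, 0),
    "near an edge": ((1000, 1024), (100, 100), 950, 0),
    "exceeds two bounds": ((1000, 1024), (520, 530), 500, 500),
    "no clipping needed": ((1000, 1024), (200, 300), 10, 20),
}

# Number of canvas pixels written in each case.
covered = {name: int(apply_mask(*args).sum()) for name, args in cases.items()}
for name, pixels in covered.items():
    print(f"{name}: {pixels} pixels written")
```

Each case runs without raising, and the pixel counts match the clipped region sizes (for example 500 × 524 for the two-bound case).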

Impact

  • ✅ Prevents crash during video generation
  • ✅ Gracefully clips oversized masks
  • ✅ No visual quality loss (excess mask area is outside canvas anyway)
  • ✅ Works with any template size and aspect ratio, provided the mask offset (h_min, w_min) lies inside the canvas

Deploy

# Commit the fix
git add app_hf_spaces.py
git commit -m "Fix mask dimension mismatch error with bounds checking"

# Push to HuggingFace Space
git push hf deploy-clean-v3:main

# Wait for Space to rebuild (~2 minutes)

Expected Result

Video generation should complete successfully without the broadcast error, even when masks extend beyond canvas bounds.