MAOAM (Mask Any Object And Material) is a unified selection framework that enables precise object- and material-level selection across both text- and click-based interactions. It leverages a Vision-Language Model (VLM) with a segmentation head to produce pixel-accurate masks from user prompts.
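This card does not describe how MAOAM's segmentation head works internally. The sketch below is therefore only a generic illustration of one common way a VLM-driven segmentation head can turn a prompt into a mask: each pixel's feature vector is scored against a single prompt embedding, and the scores are thresholded. The NumPy dependency and the names `decode_mask`, `pixel_features`, and `prompt_embedding` are hypothetical. This is not the official implementation.

```python
import numpy as np


def decode_mask(pixel_features: np.ndarray,
                prompt_embedding: np.ndarray,
                threshold: float = 0.5) -> np.ndarray:
    """Hypothetical segmentation head (illustration only, not MAOAM's).

    pixel_features:   (H, W, D) per-pixel features from a vision encoder (assumed).
    prompt_embedding: (D,) embedding produced for a text or click prompt (assumed).
    Returns a boolean (H, W) mask.
    """
    if pixel_features.shape[-1] != prompt_embedding.shape[0]:
        raise ValueError("feature dimension must match prompt embedding size")
    # Dot-product similarity between every pixel and the prompt -> (H, W) logits.
    logits = pixel_features @ prompt_embedding
    # Sigmoid maps logits to per-pixel foreground probabilities.
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold


# Toy usage: only the top-left 2x2 pixels carry features aligned with the prompt.
H, W, D = 4, 4, 8
prompt = np.ones(D)
features = np.zeros((H, W, D))
features[:2, :2] = prompt
mask = decode_mask(features, prompt)
print(mask.astype(int))
```

Pixels whose features align with the prompt get positive logits and land in the mask. Zero-feature pixels sit exactly at probability 0.5 and are excluded by the strict `>` threshold.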
Project Page | Paper (arXiv) | Code
The official implementation supports two backbones:
For detailed instructions on setup, downloading weights, and running the Gradio demo, please refer to the official GitHub repository.
@inproceedings{park2026maoam,
title = {MAOAM: Unified Object and Material Selection with Vision-Language Models},
author = {Park, Jaden and Deschaintre, Valentin and Kuen, Jason and
Liu, Kangning and Georgiev, Iliyan and Singh, Krishna Kumar and
Lee, Yong Jae and Fischer, Michael},
booktitle = {ACM SIGGRAPH 2026 Conference Papers},
year = {2026},
publisher = {ACM},
doi = {10.1145/3799902.3811186},
}