SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding

This directory contains the released SceneParser checkpoint for inference and evaluation. The checkpoint is packaged in HuggingFace Transformers format and can be used as MODEL_PATH with the SceneParser evaluation script.

Model Description

SceneParser is a hierarchical parser for scene understanding built on a vision-language model (VLM). Given an RGB image and an object-level or scene-level query, it predicts a JSON-style hierarchy that binds objects, parts, and affordance points into explicit scene -> object -> part -> affordance chains.
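
To make the chain structure concrete, the following is a minimal sketch of how a downstream consumer might flatten such a prediction into scene -> object -> part -> affordance chains. The key names (scene, objects, parts, affordances, name, point) and the sample values are assumptions made for illustration only; the actual output schema of the released model is not specified in this card and may differ.

```python
import json

# Hypothetical example output. Key names and values are illustrative
# assumptions, not the documented SceneParser schema.
SAMPLE_PREDICTION = """
{
  "scene": "kitchen",
  "objects": [
    {
      "name": "mug",
      "parts": [
        {
          "name": "handle",
          "affordances": [{"name": "grasp", "point": [412, 236]}]
        }
      ]
    }
  ]
}
"""


def flatten_chains(prediction):
    """Return one (scene, object, part, affordance, point) tuple per affordance."""
    chains = []
    scene = prediction.get("scene")
    for obj in prediction.get("objects", []):
        for part in obj.get("parts", []):
            for aff in part.get("affordances", []):
                chains.append(
                    (
                        scene,
                        obj.get("name"),
                        part.get("name"),
                        aff.get("name"),
                        tuple(aff.get("point", ())),
                    )
                )
    return chains


if __name__ == "__main__":
    # Print each chain on its own line, joined with arrows.
    for chain in flatten_chains(json.loads(SAMPLE_PREDICTION)):
        print(" -> ".join(str(item) for item in chain))
```

Missing keys are treated as empty lists, so a prediction with no parts or affordances simply yields no chains rather than raising an error.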

Files

config.json
generation_config.json
preprocessor_config.json
tokenizer_config.json
special_tokens_map.json
added_tokens.json
chat_template.json
vocab.json
merges.txt
model.safetensors.index.json
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
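
Before pointing MODEL_PATH at a local copy, it can help to confirm that the download is complete. The sketch below checks for the files listed above and reads the shard names referenced by model.safetensors.index.json. It assumes the index follows the usual Transformers sharded-checkpoint layout with a weight_map object; the function names are hypothetical helpers, not part of the SceneParser code.

```python
import json
import sys
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "added_tokens.json",
    "chat_template.json",
    "vocab.json",
    "merges.txt",
    "model.safetensors.index.json",
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def shards_in_index(model_dir):
    """Return the sorted shard file names referenced by the safetensors index."""
    index_path = Path(model_dir) / "model.safetensors.index.json"
    with index_path.open(encoding="utf-8") as handle:
        index = json.load(handle)
    # Assumption: the index maps each tensor name to its shard under "weight_map".
    return sorted(set(index["weight_map"].values()))


if __name__ == "__main__":
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_files(model_dir)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files present.")
        print("Shards referenced by index:", ", ".join(shards_in_index(model_dir)))
```

Run it with the checkpoint directory as the only argument; with no argument it checks the current directory.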

Safetensors

Model size: 4B params
Tensor type: BF16
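
As a rough guide to the memory needed just to hold the weights, the figures above can be multiplied out. This is a back-of-the-envelope sketch that takes "4B" as exactly 4e9 parameters, which is an assumption; the real count is rounded, and activations, caches, and framework overhead are not included.

```python
# Rough weight footprint from the figures listed above.
NUM_PARAMS = 4_000_000_000  # assumption: "4B params" taken as exactly 4e9
BYTES_PER_BF16 = 2          # bfloat16 stores each value in 16 bits


def weight_bytes(num_params, bytes_per_param):
    """Return the total number of bytes needed to store the weights."""
    return num_params * bytes_per_param


total = weight_bytes(NUM_PARAMS, BYTES_PER_BF16)
print(f"{total / 1e9:.1f} GB ({total / 2**30:.2f} GiB)")
```

Under these assumptions the weights alone come to roughly 8 GB, so the device used for inference needs more than that in practice.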