FC-CLIP — Open-Vocabulary Panoptic Segmentation

FC-CLIP is an open-vocabulary panoptic segmentation model that pairs a frozen ConvNeXt-Large CLIP backbone with a lightweight Mask2Former decoder. A single model segments both things and stuff, with no separate specialist models, and it can be prompted zero-shot with class names supplied at inference time.

This repository hosts the COCO Panoptic checkpoint, uploaded to the Hugging Face Hub by Claude for testing with the FiftyOne Model Zoo.

Attribution

Paper: "Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP"
Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen.
NeurIPS 2023 · arXiv 2308.02487

Original code: bytedance/fc-clip — MIT License

Usage

Standalone (trust_remote_code)

import torch
from PIL import Image
from transformers import AutoModel

model = AutoModel.from_pretrained("neerajaabhyankar/fc-clip", trust_remote_code=True)
model.eval()

# Load any RGB image; "example.jpg" is a placeholder path
image = Image.open("example.jpg").convert("RGB")

# Preprocess: RGB uint8 numpy/PIL → normalised tensor
pixel_values = model.preprocess_image(image)  # [1, 3, H, W]

with torch.no_grad():
    results = model(pixel_values)

panoptic_seg, segments_info = results[0]
# panoptic_seg: int32 tensor [H, W], mapping each pixel to a segment id
# segments_info: list of dicts with keys "id", "category_id", "isthing"
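
Downstream code often needs one binary mask per segment rather than a single id map. The sketch below shows one way to split the output described above, using a small synthetic array in place of a real model result. The helper name `segments_to_masks` is hypothetical and not part of the model's API; it only assumes the `panoptic_seg` / `segments_info` layout documented in the comments above.

```python
import numpy as np

def segments_to_masks(panoptic_seg, segments_info):
    """Return {segment_id: boolean mask [H, W]} for every listed segment."""
    seg = np.asarray(panoptic_seg)
    return {info["id"]: seg == info["id"] for info in segments_info}

# Synthetic stand-in for a model output (not real predictions)
panoptic_seg = np.array([[1, 1, 2],
                         [1, 2, 2]], dtype=np.int32)
segments_info = [
    {"id": 1, "category_id": 0, "isthing": True},
    {"id": 2, "category_id": 2, "isthing": False},
]

masks = segments_to_masks(panoptic_seg, segments_info)
# Pixel count per segment id
areas = {sid: int(mask.sum()) for sid, mask in masks.items()}
print(areas)
```

With a real output, convert the tensor first, for example `panoptic_seg.cpu().numpy()`, before passing it to the helper.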

Open-vocabulary (custom classes)

# Reuses model and pixel_values from the standalone example
with torch.no_grad():
    results = model(pixel_values, class_names=["cat", "dog", "sky", "grass"])
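
With custom classes, it is usually convenient to attach a readable label to each segment. The sketch below assumes that `category_id` is a zero-based index into the `class_names` list passed to the model; this README does not state that, so verify it against real outputs before relying on it. The helper `label_segments` is hypothetical, and the `segments_info` values are synthetic.

```python
def label_segments(segments_info, class_names):
    """Attach a "label" key to each segment dict.

    ASSUMPTION: category_id indexes class_names (zero-based).
    Out-of-range ids are labelled "unknown" instead of raising.
    """
    labeled = []
    for info in segments_info:
        idx = info["category_id"]
        name = class_names[idx] if 0 <= idx < len(class_names) else "unknown"
        labeled.append({**info, "label": name})
    return labeled

class_names = ["cat", "dog", "sky", "grass"]
# Synthetic segments_info, shaped like the documented output
segments_info = [
    {"id": 1, "category_id": 0, "isthing": True},
    {"id": 2, "category_id": 3, "isthing": False},
    {"id": 3, "category_id": 9, "isthing": False},
]

for seg in label_segments(segments_info, class_names):
    print(seg["id"], seg["label"])
```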

Architecture

Component	Detail
Backbone	OpenCLIP ConvNeXt-Large (`convnext_large_d_320`, `laion2b_s29b_b131k_ft_soup`), frozen
Pixel decoder	6-layer Multi-Scale Deformable Attention encoder + 1-level FPN
Transformer decoder	5-layer Mask2Former cross-attention decoder, 250 queries
Text classification	ViLD 14-template ensemble + geometric in-vocab/out-vocab blending
Classes	133 COCO panoptic (80 things + 53 stuff)
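
The "geometric in-vocab/out-vocab blending" row can be illustrated with a short sketch. It combines two per-class probability vectors with a weighted geometric mean, using one weight for classes seen during training and another for unseen classes. The function name `geometric_blend` and the default weights `alpha=0.4` and `beta=0.8` are assumptions for illustration; they are not read from this checkpoint's configuration, and the model's own implementation may differ in detail.

```python
import numpy as np

def geometric_blend(p_in, p_out, is_seen, alpha=0.4, beta=0.8):
    """Weighted geometric mean of in-vocab and out-vocab class probabilities.

    p_in    : probabilities from the trained (in-vocabulary) classifier
    p_out   : probabilities from the frozen CLIP (out-of-vocabulary) classifier
    is_seen : boolean per class, True if the class was seen during training
    alpha   : weight on p_out for seen classes   (ASSUMED default)
    beta    : weight on p_out for unseen classes (ASSUMED default)

    The result is not renormalised, so it need not sum to 1.
    """
    p_in = np.asarray(p_in, dtype=np.float64)
    p_out = np.asarray(p_out, dtype=np.float64)
    w = np.where(np.asarray(is_seen, dtype=bool), alpha, beta)
    return p_in ** (1.0 - w) * p_out ** w

# Toy scores for three classes; the last class is treated as unseen
p_in = [0.70, 0.25, 0.05]
p_out = [0.50, 0.20, 0.30]
is_seen = [True, True, False]

blended = geometric_blend(p_in, p_out, is_seen)
print(blended)
```

A larger weight on the frozen CLIP scores for unseen classes reflects the idea that the trained classifier has no supervision for them, while seen classes can lean more on the trained head.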

Requirements

torch torchvision transformers open_clip_torch safetensors

No detectron2 required — the model is self-contained.
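
Before loading the model, it can help to confirm that the listed packages are importable. The sketch below is a minimal check using only the standard library. Note that the `open_clip_torch` package is imported as `open_clip`; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names for the packages listed above
# (open_clip_torch installs the module "open_clip")
REQUIRED_MODULES = ["torch", "torchvision", "transformers", "open_clip", "safetensors"]

def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```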

License

MIT (same as original bytedance/fc-clip)


