Changed Checkpoints?

#263
by Ankit4241 - opened

SAM3 text-prompted segmentation broken after recent checkpoint update
Last week I was testing SAM3's feasibility for an object detection task on drone imagery. Using the Transformers integration with the text prompt "cars", the model returned detections (see image 1). This week, running the exact same code on the exact same images returns 0 detections (see image 2). I made no changes to my code or environment.
After investigating, I noticed the model load report shows a key naming mismatch in the text encoder:

The checkpoint stores text encoder weights under detector_model.text_encoder.*
The Sam3Model architecture expects them at text_encoder.text_model.*

This means the text encoder is being initialized with random weights instead of the pretrained ones, which would explain why the model no longer responds to text prompts at all. I suspect the checkpoint was updated recently and the key names changed.
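To make the mismatch concrete, here is a minimal pure-Python sketch of how a loader that matches weights by exact key name would report it, and how renaming the prefix would line the keys back up. `diff_keys` and `remap_text_encoder_keys` are hypothetical helpers, and the key names are illustrative. The sketch assumes that, apart from the prefix, the checkpoint and model keys match 1:1, which has not been verified against the real checkpoint.

```python
# Prefixes taken from the load report above. Everything after the prefix is
# ASSUMED to be identical between checkpoint and model (not verified).
CKPT_PREFIX = "detector_model.text_encoder."
MODEL_PREFIX = "text_encoder.text_model."


def diff_keys(checkpoint_keys, model_keys):
    """Return (missing, unexpected) as an exact-name key matcher would see them."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    missing = sorted(model - ckpt)      # the model expects these; the checkpoint lacks them
    unexpected = sorted(ckpt - model)   # the checkpoint has these; the model has no slot for them
    return missing, unexpected


def remap_text_encoder_keys(state_dict):
    """Return a copy of state_dict with checkpoint text-encoder keys renamed to the model prefix."""
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(CKPT_PREFIX):
            key = MODEL_PREFIX + key[len(CKPT_PREFIX):]
        remapped[key] = value
    return remapped
```

With a single text-encoder weight, `diff_keys` reports the same layer once as missing and once as unexpected, which is the pattern in the load report. After `remap_text_encoder_keys`, both lists come back empty, provided the suffix assumption holds.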

To reproduce:

```python
import torch
from PIL import Image
from transformers import Sam3Model, Sam3Processor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor from the Hub (latest revision by default)
model = Sam3Model.from_pretrained("facebook/sam3").to(device)
processor = Sam3Processor.from_pretrained("facebook/sam3")

image = Image.open("your_image.jpg").convert("RGB")
# Text-prompted segmentation: ask for all instances of "cars"
inputs = processor(images=image, text="cars", return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

# target_sizes expects (height, width); PIL's image.size is (width, height)
results = processor.post_process_instance_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]

print(f"Found {len(results['masks'])} instance(s)")
```

When the model is initialized, the load report lists all 24 text encoder layers both as UNEXPECTED (present in the checkpoint but unused by the model) and as MISSING (expected by the model but not found in the checkpoint), confirming that the text encoder is not loading correctly.
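A sketch for checking this programmatically rather than reading the log. It assumes `Sam3Model.from_pretrained(..., output_loading_info=True)` behaves as it does for other Transformers models and returns the model plus a dict with "missing_keys" and "unexpected_keys". The `text_encoder` substring and the `layers.N.` pattern are assumptions about the key names. `text_encoder_layer_report` and `load_with_report` are hypothetical helpers.

```python
import re

# Assumed key pattern for transformer blocks, e.g. "...encoder.layers.3.mlp.fc1.weight"
LAYER_RE = re.compile(r"layers\.(\d+)\.")


def text_encoder_layer_report(loading_info, marker="text_encoder"):
    """Count affected keys and distinct layer indices among text-encoder keys
    in the missing/unexpected lists of a from_pretrained loading_info dict."""
    report = {}
    for kind in ("missing_keys", "unexpected_keys"):
        keys = [k for k in loading_info.get(kind, []) if marker in k]
        layers = {int(m.group(1)) for k in keys if (m := LAYER_RE.search(k))}
        report[kind] = {"num_keys": len(keys), "num_layers": len(layers)}
    return report


def load_with_report(repo_id="facebook/sam3", revision=None):
    # Imported lazily so the report helper above works without transformers installed.
    from transformers import Sam3Model

    model, loading_info = Sam3Model.from_pretrained(
        repo_id, revision=revision, output_loading_info=True
    )
    return model, text_encoder_layer_report(loading_info)
```

Calling `model, report = load_with_report()` and printing `report` should show `num_layers` of 0 for both lists when the text encoder loads correctly. The symptom described above would appear as 24 in each.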
Note: in hindsight, the image 1 result I attached from last week also appears to have been wrong. It was detecting construction materials as "cars", which suggests the text encoder may already have been partially broken then, just failing differently. The current behavior (0 detections on images with clearly visible cars) is a complete failure.
Has the checkpoint been updated recently? Is there a pinnable revision that is known to work?
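If the repo history does show a change, one way to pin is to take the last commit before the date results were last good and pass its hash as `revision`. Below is a sketch that uses `HfApi.list_repo_commits` from `huggingface_hub`, whose returned commits carry `commit_id` and `created_at`. `latest_commit_before` and `find_pinned_revision` are hypothetical helpers, and whether an older revision actually works with the current Transformers version is untested.

```python
from datetime import datetime, timezone


def _as_utc(moment: datetime) -> datetime:
    """Treat naive datetimes as UTC so they compare with timezone-aware Hub timestamps."""
    return moment if moment.tzinfo is not None else moment.replace(tzinfo=timezone.utc)


def latest_commit_before(commits, cutoff: datetime):
    """Return the commit_id of the newest commit created before `cutoff`, or None."""
    cutoff = _as_utc(cutoff)
    older = [c for c in commits if _as_utc(c.created_at) < cutoff]
    if not older:
        return None
    return max(older, key=lambda c: _as_utc(c.created_at)).commit_id


def find_pinned_revision(repo_id: str, cutoff: datetime, token=None):
    # Network call; gated repos may require a Hugging Face access token.
    from huggingface_hub import HfApi

    commits = HfApi().list_repo_commits(repo_id, token=token)
    return latest_commit_before(commits, cutoff)
```

Pass the date of the last working run as `cutoff`, then load both pieces from the same hash, e.g. `Sam3Model.from_pretrained("facebook/sam3", revision=sha)` and `Sam3Processor.from_pretrained("facebook/sam3", revision=sha)`, so the model and processor stay in sync.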

