bug: model output logits have detached gradient
Hi all! Heroic job getting this model out and integrated into HF. I'm already playing with it and running into an issue. Here's a simple way to reproduce:
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).to("cuda")

# Create a simple random input
inputs = {
    "input_ids": torch.randint(0, 1000, (1, 10)).cuda(),
    "attention_mask": torch.ones(1, 10).cuda(),
}

# Set to train mode and confirm every parameter has requires_grad = True
model.train()
for name, param in model.named_parameters():
    print(f"{name}: requires_grad = {param.requires_grad}")

# Do a forward pass and inspect the logits
outputs = model(**inputs)
print("\nOutput logits requires_grad:", outputs.logits.requires_grad)
print("Output logits grad_fn:", outputs.logits.grad_fn)
When I do this, the output is:
Output logits requires_grad: False
Output logits grad_fn: None
This is despite every parameter reporting requires_grad = True in the loop above, so the gradient must be getting detached somewhere inside the forward pass rather than at the parameter level.
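One quick probe (an untested sketch reusing the model and inputs from above) is to force gradient mode around the forward pass, which would rule out a stray global no-grad context as the culprit:

# Untested sketch: if the logits still have no grad_fn inside an explicit
# enable_grad() block, something in the forward pass itself is detaching
# (e.g. a no_grad()/inference_mode() context or an explicit .detach()).
with torch.enable_grad():
    probe = model(**inputs)
print("requires_grad inside enable_grad():", probe.logits.requires_grad)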
As a sanity check, I ran the same code with model_id = "bert-base-uncased" and got:
Output logits requires_grad: True
Output logits grad_fn: <ViewBackward0 object at 0x7f0ca6abf370>
So it's definitely a ModernBERT-specific problem!
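Since my image builds flash_attn, another variable worth isolating is the attention backend. This is just a sketch (I haven't confirmed exactly which backends ModernBERT exposes), but the attn_implementation kwarg is standard transformers API for forcing one at load time:

# Sketch: compare gradient flow across attention implementations.
# Whether ModernBERT supports all three backends is an assumption here.
for impl in ("eager", "sdpa", "flash_attention_2"):
    m = AutoModelForMaskedLM.from_pretrained(
        model_id, attn_implementation=impl
    ).to("cuda")
    m.train()
    logits = m(**inputs).logits
    print(impl, "-> requires_grad:", logits.requires_grad)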
Here's the image I'm using (Modal Labs H100):
from modal import Image

image = Image.from_registry(
    "nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu20.04",
    add_python="3.10",
).pip_install(
    "torch>=2.5", "ninja", "packaging"
).pip_install("wheel").apt_install("git").run_commands([
    # flash_attn needs torch/ninja/packaging present at build time
    "pip install flash_attn --no-build-isolation"
]).pip_install(
    "transformers@git+https://github.com/huggingface/transformers.git",
    "datasets",
).run_function(download_model).run_function(download_data)
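For completeness, the exact library versions inside the container can be dumped with a snippet like this (nothing ModernBERT-specific, just the standard version attributes):

# Print the versions that end up in the image, for the bug report.
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
try:
    import flash_attn
    print("flash_attn:", flash_attn.__version__)
except ImportError:
    print("flash_attn: not installed")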