bug: model output logits have detached gradient

#4 opened by andersonbcdefg

hi all! heroic job getting this model out and integrated into HF. i'm already trying to play with it and running into issues. here is a simple way to reproduce:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).to("cuda")

# Create a simple input
inputs = {
    "input_ids": torch.randint(0, 1000, (1, 10)).cuda(),
    "attention_mask": torch.ones(1, 10).cuda(),
}

# Set to train mode and check all parameters
model.train()
for name, param in model.named_parameters():
    print(f"{name}: requires_grad = {param.requires_grad}")

# Do forward pass
outputs = model(**inputs)
print("\nOutput logits requires_grad:", outputs.logits.requires_grad)
print("Output logits grad_fn:", outputs.logits.grad_fn)

When I do this, the output is:
Output logits requires_grad: False
Output logits grad_fn: None

So the logits come back detached from the graph even though every parameter has requires_grad = True, as the printout above confirms.
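To make the impact concrete, here's a minimal sketch (my own code, not from the model) showing that the detached logits break training: any loss built from them can't backprop.

import torch
import torch.nn.functional as F

# With detached logits, backward() raises
# "element 0 of tensors does not require grad and does not have a grad_fn"
outputs = model(**inputs)
labels = inputs["input_ids"].clone()
loss = F.cross_entropy(outputs.logits.view(-1, outputs.logits.size(-1)), labels.view(-1))
try:
    loss.backward()
    print("backward succeeded, gradients flow")
except RuntimeError as e:
    print("backward failed:", e)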

Just to sanity check, I ran the same code but set model_id = "bert-base-uncased", and got:
Output logits requires_grad: True
Output logits grad_fn: <ViewBackward0 object at 0x7f0ca6abf370>

So it's definitely a ModernBERT-specific problem!
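In case it helps narrow things down, one thing that might be worth ruling out (just a guess on my end) is an ambient no_grad / inference_mode context being active somewhere around the ModernBERT forward; something like this would show it:

import torch

# Check whether grad tracking is globally disabled, and whether
# forcing it back on changes anything (purely diagnostic)
print("grad enabled:", torch.is_grad_enabled())
print("inference mode:", torch.is_inference_mode_enabled())
with torch.enable_grad():
    outputs = model(**inputs)
    print("grad_fn inside enable_grad():", outputs.logits.grad_fn)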
Here's the image I'm using (Modal Labs H100):

from modal import Image

image = Image.from_registry(
    "nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu20.04",
    add_python="3.10",
).pip_install(
    "torch>=2.5", "ninja", "packaging"
).pip_install(
    "wheel"
).apt_install(
    "git"
).run_commands([
    "pip install flash_attn --no-build-isolation"
]).pip_install(
    "transformers@git+https://github.com/huggingface/transformers.git",
    "datasets",
).run_function(download_model).run_function(download_data)
