Enabling Autoregressive Models to Fill In Masked Tokens
Paper: arXiv:2502.06901
How to use dmisrael/maria-olmo-7b with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="dmisrael/maria-olmo-7b", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("dmisrael/maria-olmo-7b", trust_remote_code=True, dtype="auto")
```

MARIA ("Masked Autoregressive Infilling Adapter") is a small trained head that lets a frozen autoregressive (AR) language model fill in masked tokens. This checkpoint bundles:
| Component | Source | Frozen? |
|---|---|---|
| AR backbone | allenai/OLMo-7B-0724-hf | ✓ |
| MLM backbone | answerdotai/ModernBERT-large | ✓ |
| Fusion head | trained (this repo) | ✗ |
Only the fusion head (≈257M params) was trained. Both backbones are frozen and shipped here unchanged.
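As a rough sanity check on the roughly 257M parameter count quoted above, the sketch below counts the weights of a single bias-free linear layer that maps the concatenated hidden states of both backbones to vocabulary logits. That layout follows the paper's description of a linear decoder over concatenated hidden states. The hidden sizes (4096 and 1024) and the output size (50304) are assumptions about the public backbone configs, not values stated in this card, so treat the result as illustrative only.

```python
# Back-of-the-envelope parameter count for a linear fusion head.
# All three sizes below are ASSUMPTIONS, not values read from this repo.
AR_HIDDEN = 4096     # assumed hidden size of the OLMo-7B backbone
MLM_HIDDEN = 1024    # assumed hidden size of ModernBERT-large
VOCAB_SIZE = 50304   # assumed output (logit) dimension


def linear_head_params(in_features: int, out_features: int, bias: bool = False) -> int:
    """Number of weights in one linear layer, plus an optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


fused_width = AR_HIDDEN + MLM_HIDDEN  # width of the concatenated hidden states
n_params = linear_head_params(fused_width, VOCAB_SIZE)
print(f"{n_params / 1e6:.1f}M parameters")
```

Under these assumptions the count comes to about 257.6M, which is consistent with the figure the card reports for the trained head.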
Paper: Enabling Autoregressive Models to Fill In Masked Tokens (Israel et al., 2025)

Code & training scripts: https://github.com/danielmisrael/maria
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# trust_remote_code=True lets Transformers load the custom model code from this repo.
model = AutoModelForMaskedLM.from_pretrained(
    "dmisrael/maria-olmo-7b", trust_remote_code=True, torch_dtype=torch.bfloat16
).to("cuda").eval()
tok = AutoTokenizer.from_pretrained("dmisrael/maria-olmo-7b")

text = f"The capital of France is {tok.mask_token}."
ids = tok(text, return_tensors="pt").input_ids.cuda()

# Fill the masked position and decode the completed sequence.
out = model.infill(ids, mask_token_id=tok.mask_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
# -> "The capital of France is Paris."
```
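This card describes `infill` as filling every masked position from left to right. The toy sketch below illustrates only that control flow in plain Python, with no model involved. `toy_infill`, `predict_token`, `previous_plus_one`, and the mask id `-1` are hypothetical names and values chosen for illustration. It is not the repository's implementation.

```python
from typing import Callable, List

MASK_ID = -1  # hypothetical mask id for this toy example


def toy_infill(
    input_ids: List[int],
    mask_token_id: int,
    predict_token: Callable[[List[int], int], int],
) -> List[int]:
    """Fill masked positions one at a time, left to right.

    `predict_token(ids, pos)` is a hypothetical callback that returns a token
    id for position `pos`, given the partially filled sequence `ids`.
    """
    ids = list(input_ids)  # copy so the caller's list is untouched
    for pos, tok in enumerate(input_ids):
        if tok == mask_token_id:
            # Earlier masks are already filled, so the predictor can see them.
            ids[pos] = predict_token(ids, pos)
    return ids


def previous_plus_one(ids: List[int], pos: int) -> int:
    """Stand-in predictor: previous token + 1 (0 at the start of the sequence)."""
    return ids[pos - 1] + 1 if pos > 0 else 0


filled = toy_infill([10, MASK_ID, MASK_ID, 40], MASK_ID, previous_plus_one)
print(filled)  # [10, 11, 12, 40]
```

The second mask resolves to 12 only because the first mask was already filled with 11, which is the point of processing positions in order.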
The model class exposes two inference methods on top of the standard HF interface (see the GitHub README for full docs):
- `model.infill(input_ids, mask_token_id, greedy=True)`: fills every position where `input_ids == mask_token_id`, left to right.
- `model.compute_nll(input_ids, labels, reduction='mean')`: returns the negative log-likelihood at positions where `labels != -100`.

```bibtex
@article{israel2025maria,
  title   = {Enabling Autoregressive Models to Fill In Masked Tokens},
  author  = {Israel, Daniel and Grover, Aditya and Van den Broeck, Guy},
  journal = {arXiv preprint arXiv:2502.06901},
  year    = {2025}
}
```
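The `compute_nll` description above scores only positions whose label is not -100. The sketch below shows that masking rule and a `mean` reduction on hand-written log-probabilities, using only the standard library. `toy_masked_nll` is a hypothetical helper, and its `'sum'` option and its empty-input behavior are assumptions (the card only shows `reduction='mean'`). The repository's method works on model outputs, not on a precomputed table.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100  # label value that marks positions to skip


def toy_masked_nll(
    log_probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    reduction: str = "mean",
) -> float:
    """Negative log-likelihood over positions where labels != -100.

    `log_probs[t][v]` is the log-probability of token `v` at position `t`.
    """
    losses: List[float] = [
        -log_probs[t][label]
        for t, label in enumerate(labels)
        if label != IGNORE_INDEX
    ]
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # Average over scored positions only, not the full sequence length.
        # Returning 0.0 when nothing is scored is a choice made for this toy.
        return sum(losses) / len(losses) if losses else 0.0
    raise ValueError(f"unsupported reduction: {reduction!r}")


# Three positions over a two-token vocabulary; the middle one is ignored.
log_probs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.9), math.log(0.1)],
    [math.log(0.25), math.log(0.75)],
]
labels = [0, IGNORE_INDEX, 0]

nll = toy_masked_nll(log_probs, labels)
print(round(nll, 4))  # 1.0397
```

Only the first and third positions contribute here, so the mean is taken over two losses (ln 2 and 2 ln 2) and not over the sequence length of three.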
License: Apache 2.0, inherited from the underlying allenai/OLMo-7B-0724-hf and answerdotai/ModernBERT-large backbones.