Sumi-7B

Sumi is a native uniform diffusion language model trained from scratch: it runs full bidirectional attention and denoises a canvas of randomly corrupted tokens. Because Sumi ships as a custom model class, you must pass `trust_remote_code=True` when loading it with transformers.
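
To make "a canvas of randomly corrupted tokens" concrete, here is a toy sketch of uniform corruption and of a predict-then-re-corrupt refinement loop. It is not Sumi's implementation: `corrupt_uniform`, `toy_denoiser`, `refine`, the linear noise schedule, and the 100-token vocabulary are all assumptions made for illustration, and the denoiser is an oracle stub standing in for the network.

```python
import random

def corrupt_uniform(token_ids, noise_level, vocab_size, rng):
    """Independently replace each token, with probability `noise_level`,
    by an id drawn uniformly from the vocabulary (no [MASK] token involved)."""
    return [
        rng.randrange(vocab_size) if rng.random() < noise_level else tok
        for tok in token_ids
    ]

def toy_denoiser(canvas, target):
    """Hypothetical stand-in for the network: a real model would predict the
    clean tokens from the noisy canvas using bidirectional attention."""
    return list(target)

def refine(target, vocab_size, num_steps, rng):
    """Start from pure noise, then alternate 'predict clean tokens' and
    're-corrupt at a lower noise level' until the noise level reaches 0."""
    canvas = corrupt_uniform(target, 1.0, vocab_size, rng)
    for step in range(num_steps):
        predicted = toy_denoiser(canvas, target)
        next_noise = 1.0 - (step + 1) / num_steps  # assumed linear schedule
        canvas = corrupt_uniform(predicted, next_noise, vocab_size, rng)
    return canvas

rng = random.Random(0)
clean = [11, 42, 7, 99, 3, 58, 21, 64]

# Each position is resampled with probability 0.5.
print(corrupt_uniform(clean, 0.5, vocab_size=100, rng=rng))

# The last step re-corrupts at noise level 0, so the oracle's prediction survives.
print(refine(clean, vocab_size=100, num_steps=4, rng=rng))
```

In this sketch `num_steps` plays the role that `num_denoising_steps` plays in the Quickstart: more steps mean smaller drops in noise level between consecutive predictions.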

We recommend transformers==5.8.1.

For more details, please refer to our project page and technical report.

Quickstart

import torch
from transformers import AutoModelForMaskGeneration, AutoTokenizer

model_id = "tohoku-nlp/sumi-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskGeneration.from_pretrained(
    model_id, trust_remote_code=True, dtype=torch.bfloat16
).to("cuda").eval()

prompt = "Our journey into exploring diffusion language model begins,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=256,       # content budget; the EOS/BOS delimiter is anchored here
    num_denoising_steps=64,   # refinement iterations — the main quality/compute dial
    sampler="ancestral",      # "ancestral" (default) or "adaptive" (sharper, for code/math)
    temperature=0.7,
)
print(tokenizer.decode(out.sequences[0], skip_special_tokens=True))

`generate()` returns the trimmed completion in `out.sequences` and the full untrimmed canvas in `out.canvas`.
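
As a rough illustration of the difference between a full canvas and a trimmed completion, the sketch below cuts a toy canvas at the first end-of-sequence id after the prompt. The helper `trim_canvas`, the token ids, and the trimming rule itself are assumptions for illustration only; the model's own `generate()` may trim differently, for example by keeping the prompt.

```python
def trim_canvas(canvas, prompt_len, eos_id):
    """Hypothetical trimming rule: drop the prompt, then keep tokens up to
    (but not including) the first EOS id. Without an EOS, keep everything."""
    completion = canvas[prompt_len:]
    if eos_id in completion:
        completion = completion[:completion.index(eos_id)]
    return completion

EOS_ID = 2                                  # made-up id for this toy example
canvas = [5, 6, 7, 20, 21, 22, EOS_ID, 9, 9]  # 3 prompt tokens, 3 content tokens, EOS, filler
trimmed = trim_canvas(canvas, prompt_len=3, eos_id=EOS_ID)
print(trimmed)       # the content tokens between the prompt and the EOS
print(len(canvas))   # the canvas keeps its full length
```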

Citation

@misc{ye2026sumi,
      title={Sumi: Open Uniform Diffusion Language Model from Scratch}, 
      author={Mengyu Ye and Keito Kudo and Wataru Ikeda and Ryosuke Matsuda and Keisuke Sakaguchi and Jun Suzuki},
      year={2026},
      eprint={2606.19005},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.19005}, 
}
