Diffu — Swedish historical handwriting line generator

Renders a line of text in a target writer's hand. Given a text string and a single-line crop of the target handwriting, it generates a matching handwritten line. The model is trained on Swedish archive material. Read back through the actual Riksarkivet HTR (handwritten text recognition) pipeline, generated lines score roughly 0–5% character error rate (CER).
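
For readers unfamiliar with the metric, CER is usually defined as the character-level edit distance between the recognised text and the reference, divided by the reference length. The sketch below is a plain-Python illustration of that standard definition. It is not the Riksarkivet evaluation code, and the function names are made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, a 22-character line read back with one wrong character scores 1/22, or about 4.5%, which sits at the upper end of the range quoted above.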

Trained from scratch as a diffusion model: a diffusers SD3 MMDiT backbone with custom 2D-RoPE joint attention, a frozen Qwen-Image VAE (f8, i.e. 8x spatial downsampling), glyph-line content conditioning (GNU Unifont), pooled DINOv3 style embeddings, and rectified flow.
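
As a rough illustration of what rectified-flow sampling involves, the sketch below integrates a velocity field from noise (t = 1) to data (t = 0) with fixed Euler steps. It assumes the common convention x_t = (1 - t) * data + t * noise. The step count, the function names and the toy "oracle" velocity are all hypothetical; in the real pipeline the velocity would come from the MMDiT backbone and the vectors would be VAE latents.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 noise: Vector,
                 num_steps: int = 28) -> Vector:
    """Integrate dx/dt = v(x, t) from t=1 (noise) down to t=0 (data)."""
    x = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = 1.0 - k * dt                        # current time, from 1.0 down to dt
        v = velocity(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]  # step towards t=0
    return x


def make_oracle_velocity(data: Vector) -> Callable[[Vector, float], Vector]:
    """Toy stand-in for a trained model: the exact velocity of the straight
    path from `data` to the noise sample, recovered as (x_t - data) / t."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(xi - di) / t for xi, di in zip(x, data)]
    return velocity
```

With the oracle velocity the path is a straight line, so the sampler lands on `data` regardless of the step count. A learned velocity field is only approximately straight, which is why the number of steps matters in practice.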

Usage

from diffu.pipeline import DiffuPipeline          # pip install git+https://github.com/Borg93/diffu

pipe = DiffuPipeline.from_pretrained("Gabriel/test_sd35")
img = pipe("Göteborgs poliskammare", style="a_line_crop_of_the_target_hand.png", cfg_scale=5.0)
img.save("out.png")
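
The `cfg_scale` argument suggests classifier-free guidance. A minimal sketch of the usual combination rule follows, assuming the pipeline uses the standard formulation. Which conditioning is dropped for the unconditional branch (text, style or both) is not stated here, and `apply_cfg` is a hypothetical helper, not part of the diffu API.

```python
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], cfg_scale: float) -> List[float]:
    """Standard classifier-free guidance: push the prediction away from the
    unconditional output, in the direction of the conditional one."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]
```

A scale of 1.0 returns the conditional prediction unchanged and 0.0 returns the unconditional one. Values above 1.0, such as the 5.0 used above, extrapolate past the conditional prediction, which typically trades diversity for closer adherence to the conditioning.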

Performance

# one-off lines: default (eager) is fine — ~1 s/line on a modern GPU
pipe = DiffuPipeline.from_pretrained("Gabriel/test_sd35")

# many lines / serving: compile the backbone once (regional torch.compile, dynamic —
# one compile covers variable line widths). One-time cold start, then ~0.4 s/line.
pipe = DiffuPipeline.from_pretrained("Gabriel/test_sd35", compile_backbone=True)

# bf16: ~half the VRAM, same legibility
import torch
pipe = DiffuPipeline.from_pretrained("Gabriel/test_sd35", dtype=torch.bfloat16, compile_backbone=True)

Don't quantize by default. Measured on this model with torchao int8 and fp8, gated on CER: legibility is unaffected, but inference gets slower. The GEMMs are small and not memory-bound, so the overhead of weight quantization dominates. bf16 + compile is the recommended serving config.
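
To check figures like these on your own hardware, a small timing helper is enough. The sketch below is one way to do it and is not part of diffu; `seconds_per_call` and its defaults are hypothetical. Warm-up calls absorb one-time costs such as compilation, and the median damps outliers.

```python
import time
from statistics import median
from typing import Callable


def seconds_per_call(fn: Callable[[], object], warmup: int = 2, runs: int = 5) -> float:
    """Median wall-clock seconds per call, measured after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs cold-start costs such as torch.compile
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


# Example (assumes `pipe` was built as shown above):
# seconds_per_call(lambda: pipe("Göteborgs poliskammare",
#                               style="a_line_crop_of_the_target_hand.png"))
```

CUDA kernels run asynchronously, so the timed callable should only return once the result is on the host. Saving or converting the returned image is one way to force that. Whether the pipeline call already synchronises is an assumption worth verifying.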

Requirements

The diffu package (the modeling code this pipeline wraps).
DINOv3 access (gated): trained with facebook/dinov3-vitl16-pretrain-lvd1689m. Accept its licence on the Hub (free). Don't substitute DINOv2: it produces a different style embedding from the one the backbone was trained on.
Qwen-Image VAE (Qwen/Qwen-Image, Apache-2.0), pulled automatically.
CUDA GPU recommended (about 6 GB VRAM per line in float32, less in bf16).
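
As a back-of-the-envelope check on the VRAM figure, the weights alone can be estimated from the parameter count and the dtype width. The sketch below does only that arithmetic. It takes the 1B parameter count listed for this model as given and ignores activations, the VAE and the DINOv3 encoder, which presumably account for the rest.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_gib(num_params: int, dtype: str) -> float:
    """Memory taken by the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30
```

For 1,000,000,000 parameters this gives about 3.7 GiB in float32 and about 1.9 GiB in bfloat16, consistent with bf16 roughly halving the weight footprint.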

Files

model.safetensors (trained weights, single non-strict state_dict) · config.json (arch: line_height=64, glyph_line=True, style_in_context=False, qk_norm='rms_norm').
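
Assuming "non-strict" refers to PyTorch's `load_state_dict(strict=False)`, loading tolerates keys that are present on only one side and reports them as `missing_keys` and `unexpected_keys` instead of raising. The sketch below mimics that bookkeeping in plain Python to show what to look for. The helper name and the key names in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(model_keys: Iterable[str],
                         checkpoint_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected): keys the model expects but the checkpoint
    lacks, and keys the checkpoint has but the model does not use."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected
```

For example, `diff_state_dict_keys(["a.weight", "b.weight"], ["b.weight", "c.weight"])` reports `a.weight` as missing and `c.weight` as unexpected. After a non-strict load it is worth confirming that any missing keys belong to modules that are loaded separately rather than to the trained backbone.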

Safetensors: model size 1B params · tensor type F32