Diffu — Swedish historical handwriting line generator
Renders a line of text in a target writer's hand. Given a text string and a single-line crop of the target handwriting, it generates a matching handwritten line (trained on Swedish archive material). Generated lines read back at roughly 0–5% character error rate (CER) through the real Riksarkivet HTR pipeline.
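For readers who want to reproduce a CER number on their own read-back text, here is a minimal sketch of the standard definition: Levenshtein edit distance divided by the reference length. The helper names `levenshtein` and `cer` are illustrative and not part of diffu, and the NFC normalization step is an assumption; the exact normalization used in the Riksarkivet evaluation is not stated here.

```python
import unicodedata


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute, or match at cost 0
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    # Assumption: compare NFC-normalized text so "ö" counts as one character.
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

With this definition, a single wrong character in a 22-character line such as "Göteborgs poliskammare" gives a CER of 1/22, or about 4.5%.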
From-scratch diffusion: diffusers SD3 MMDiT backbone + custom 2D-RoPE joint-attention, frozen Qwen-Image VAE (f8), glyph-line content conditioning (GNU-Unifont), DINOv3 pooled style, rectified flow.
Usage
from diffu.pipeline import DiffuPipeline # pip install git+https://github.com/Borg93/diffu
pipe = DiffuPipeline.from_pretrained("Gabriel/test_sd35")
img = pipe("Göteborgs poliskammare", style="a_line_crop_of_the_target_hand.png", cfg_scale=5.0)
img.save("out.png")
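To render several lines in the same hand, the call above can be wrapped in a loop. This is a sketch that uses only the call shape shown in the usage snippet (`pipe(text, style=..., cfg_scale=...)` returning an object with `.save`); `render_lines` and the `line_0000.png` naming scheme are hypothetical, and whether the pipeline offers native batching is not assumed.

```python
from pathlib import Path


def render_lines(pipe, texts, style, out_dir, cfg_scale=5.0):
    """Render each string in `texts` in the hand shown in `style`; return saved paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(texts):
        # Same call shape as the single-line usage example, one line at a time.
        img = pipe(text, style=style, cfg_scale=cfg_scale)
        path = out_dir / f"line_{i:04d}.png"
        img.save(str(path))
        paths.append(path)
    return paths
```

For example, `render_lines(pipe, ["Göteborgs poliskammare", "Stockholms rådhusrätt"], "a_line_crop_of_the_target_hand.png", "out")` would write `out/line_0000.png` and `out/line_0001.png`.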
Performance
# one-off lines: default (eager) is fine — ~1 s/line on a modern GPU
pipe = DiffuPipeline.from_pretrained("Gabriel/test_sd35")
# many lines / serving: compile the backbone once (regional torch.compile, dynamic —
# one compile covers variable line widths). One-time cold start, then ~0.4 s/line.
pipe = DiffuPipeline.from_pretrained("Gabriel/test_sd35", compile_backbone=True)
# bf16: ~half the VRAM, same legibility
import torch
pipe = DiffuPipeline.from_pretrained("Gabriel/test_sd35", dtype=torch.bfloat16, compile_backbone=True)
Don't quantize by default. Measured on this model (torchao int8/fp8, CER-gated): legibility is unaffected but inference gets slower — the GEMMs are small and not memory-bound, so weight-quant overhead dominates. bf16 + compile is the recommended serving config.
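The per-line timings above depend on hardware, so it is worth measuring on your own GPU before choosing a config. The following is a minimal timing sketch, not the benchmark used for the figures quoted here: `seconds_per_line` is a hypothetical helper, `generate` is any callable that renders one line, and `sync` is an optional hook (for example `torch.cuda.synchronize`) so that queued GPU work is included in the measurement.

```python
import statistics
import time


def seconds_per_line(generate, texts, warmup=2, sync=None):
    """Median wall-clock seconds per `generate(text)` call, after untimed warmup calls."""
    texts = list(texts)
    if not texts:
        raise ValueError("texts must be non-empty")
    for text in texts[:warmup]:
        generate(text)  # untimed: absorbs compile and other cold-start cost
    timings = []
    for text in texts:
        if sync is not None:
            sync()  # make sure earlier GPU work has finished before starting the clock
        start = time.perf_counter()
        generate(text)
        if sync is not None:
            sync()  # wait for this call's GPU work before stopping the clock
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

With the pipeline loaded, a call could look like `seconds_per_line(lambda t: pipe(t, style="a_line_crop_of_the_target_hand.png", cfg_scale=5.0), texts, sync=torch.cuda.synchronize)`, run once per configuration (eager, compiled, bf16) on the same `texts`.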
Requirements
- The diffu package (the modeling code this pipeline wraps).
- DINOv3 access (gated): trained with facebook/dinov3-vitl16-pretrain-lvd1689m; accept its licence on the Hub (free). Don't substitute DINOv2: it produces a different style embedding than the one the backbone was trained on.
- Qwen-Image VAE (Qwen/Qwen-Image, Apache-2.0), pulled automatically.
- CUDA GPU recommended (~6 GB VRAM / line in float32, less in bf16).
Files
model.safetensors (trained weights, single non-strict state_dict) · config.json (arch: line_height=64, glyph_line=True, style_in_context=False, qk_norm='rms_norm').