dinovol: v2 backbone, patch size 8 (paris4 run, step 352500)

A 3D, DINOv2/DINOv3-style self-supervised representation model trained on volumetric micro-CT scans of carbonized Herculaneum scrolls. This repository publishes the EMA teacher backbone from the paris4 training run at training step 352500.

Training code: dinovol. This is a representation (feature-extraction) model: there is no task-specific head, so you take its dense patch embeddings and use them downstream.

Model details

| Property | Value |
|---|---|
| Backbone family | DINOv2/EVA ViT, 3D, with 3D RoPE (DINOv3-style) |
| model_type | v2 |
| Embedding dim | 864 |
| Depth | 24 blocks |
| Attention heads | 16 |
| MLP | SwiGLU, mlp_ratio 8/3 |
| Register tokens | 4 |
| Patch size | 8 × 8 × 8 |
| Global crop size (train) | 128 × 128 × 128 |
| Input channels | 1 (grayscale CT) |
| Positional encoding | RoPE mixed (base 100, normalize_coords=separate, rescale=2.0, shift=0.05, jitter=1.05); no absolute positional embedding |
| Backbone parameters | 215.9 M |
| Training step | 352500 |
| W&B run | model_v2__shift005_jitter105__r342500__paris4__20260416 |
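The RoPE settings listed above are easiest to read as operations on patch-center coordinates. The sketch below assumes (this is not confirmed by the card) that dinovol follows the DINOv3 RoPE coordinate scheme extended from two to three axes: with normalize_coords=separate each axis is normalized independently to [-1, 1], and during training only, a per-axis random shift in [-shift, shift], a per-axis log-uniform jitter in [1/jitter, jitter], and one shared log-uniform rescale in [1/rescale, rescale] are applied. The function name rope_coords_3d is hypothetical, and the frequency computation of the "mixed" RoPE variant is not shown.

```python
import numpy as np


def rope_coords_3d(d, h, w, training=False, shift=0.05, jitter=1.05,
                   rescale=2.0, rng=None):
    """Hypothetical sketch: patch-center coordinates for a d x h x w patch grid.

    ASSUMPTION: DINOv3-style coordinate handling extended to 3D; this is not
    the dinovol implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # "separate" normalization: each axis is divided by its own length,
    # so patch centers land in (0, 1) and are then mapped to (-1, 1).
    axes = [(np.arange(n) + 0.5) / n for n in (d, h, w)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (d, h, w, 3)
    coords = 2.0 * grid.reshape(-1, 3) - 1.0                     # (d*h*w, 3)

    if training:
        # Per-axis translation in [-shift, shift].
        coords = coords + rng.uniform(-shift, shift, size=3)
        # Per-axis log-uniform scaling in [1/jitter, jitter].
        coords = coords * np.exp(rng.uniform(-np.log(jitter), np.log(jitter), size=3))
        # One shared log-uniform scaling in [1/rescale, rescale].
        coords = coords * np.exp(rng.uniform(-np.log(rescale), np.log(rescale)))
    return coords
```

For a 128 × 128 × 128 global crop (a 16 × 16 × 16 patch grid), rope_coords_3d(16, 16, 16) returns 4096 coordinate triples. With training=False the coordinates are deterministic, which is the setting that matters at inference.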

Pretraining objective: DINO + iBOT + KoLeo, with Gram anchoring (DINOv3-style) applied late in training to refine dense features; trained with automatic mixed precision (AMP).
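For readers unfamiliar with the two auxiliary terms, here is a minimal numpy sketch. It follows the formulations described in the DINOv2 (KoLeo) and DINOv3 (Gram anchoring) papers as we understand them, not code from the dinovol repo. The function names are hypothetical, and the epsilons, the layers whose features are used, and the loss weights are assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Project each row onto the unit hypersphere.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def koleo_loss(features, eps=1e-8):
    """KoLeo regularizer: -mean_i log(distance from x_i to its nearest neighbour).

    `features` is (n, dim); rows are L2-normalized first, as in DINOv2.
    """
    x = l2_normalize(features)
    # Pairwise Euclidean distances; mask the diagonal so a point is not its own neighbour.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nearest = dists.min(axis=1)
    return -np.mean(np.log(nearest + eps))


def gram_anchoring_loss(student_patches, gram_teacher_patches):
    """Squared Frobenius distance between patch-similarity (Gram) matrices.

    Both inputs are (num_patches, dim). Each patch feature is L2-normalized, so the
    Gram entries are cosine similarities; only pairwise structure is matched,
    not the features themselves.
    """
    s = l2_normalize(student_patches)
    g = l2_normalize(gram_teacher_patches)
    diff = s @ s.T - g @ g.T
    return np.sum(diff ** 2)
```

KoLeo grows when embeddings in a batch collapse toward each other, which pushes them to spread over the hypersphere. Gram anchoring lets individual patch features move while keeping their patch-to-patch similarity pattern close to that of an earlier "Gram teacher" checkpoint; in DINOv3 this is what keeps dense features from degrading late in training.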

Training data

Self-supervised on 11 open-data Herculaneum volumes (scale 0, i.e. full resolution) from s3://vesuvius-challenge-open-data/:

PHerc0009B, PHerc0500P2, PHerc0814, PHerc1299, PHerc0343P, PHerc0332, PHerc0139, PHercMAN5, PHerc1451, PHercMANB, PHercParis4.

Files

| File | Size | Use |
|---|---|---|
| dinovol_v2_ps8_paris4_step352500_teacher_backbone.pt | ~0.86 GB | Recommended for inference. Slim file: {step, config, teacher backbone weights}. Loads directly with the repo's loader. |
| checkpoint_step_352500_paris4.pt | ~5.0 GB | Full training checkpoint (student, EMA teacher, optimizer, scaler, loss buffers, Gram teacher, RNG state). Use it to resume training or for full reproducibility. |
| config.json | - | The model config block, for quick inspection. |

Both .pt files embed the full training config, so the architecture is rebuilt automatically and no separate config is needed at load time.

Usage

Clone the training repo and use its loader (the config travels inside the weights):

```python
import torch
from huggingface_hub import hf_hub_download
from dinovol_2.eval import embedding_utils as eu

# Download the slim teacher-backbone checkpoint (~0.86 GB) from the Hub.
path = hf_hub_download(
    "scrollprize/dinovol_v2_ps8_with_paris4_352500",
    "dinovol_v2_ps8_paris4_step352500_teacher_backbone.pt",
)

# The config stored inside the checkpoint is used to rebuild the architecture.
loaded = eu.load_backbone_from_checkpoint(path, device="cuda")
backbone = loaded.backbone.eval()

# Dense patch embeddings for a 1-channel volume; D, H, W must be multiples of patch_size (8).
vol = torch.randn(1, 1, 128, 128, 128, device="cuda")
with torch.no_grad():
    out = backbone.forward_features(vol, masks=None, view_kind="global")
patch_tokens = out["x_norm_patchtokens"]   # (B, num_patches, 864)
```
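The patch tokens come back flattened. To use them as a dense 3D feature volume, an input whose sides are not multiples of 8 first has to be padded, and the (B, num_patches, 864) output reshaped onto the patch grid. The numpy sketch below assumes the tokens are ordered row-major over (depth, height, width) patches, as a 3D patch embedding followed by a flatten would produce; verify this against the repo before relying on it. Zero padding is also an assumption (padding in normalized intensity space, or reflect padding, may suit CT better). Both helper names are hypothetical; with torch tensors the equivalents are torch.nn.functional.pad and .reshape(...).permute(0, 4, 1, 2, 3).

```python
import numpy as np

PATCH = 8   # patch size from the model card
DIM = 864   # embedding dim from the model card


def pad_to_patch_multiple(vol, patch=PATCH):
    """Zero-pad the last three axes of a (B, C, D, H, W) array up to multiples of `patch`."""
    pads = [(0, 0)] * (vol.ndim - 3)
    pads += [(0, (-n) % patch) for n in vol.shape[-3:]]
    return np.pad(vol, pads)


def tokens_to_grid(patch_tokens, spatial_shape, patch=PATCH):
    """(B, N, C) patch tokens -> (B, C, D/patch, H/patch, W/patch) feature volume.

    ASSUMPTION: tokens are in row-major (depth, height, width) patch order.
    """
    b, n, c = patch_tokens.shape
    d, h, w = (s // patch for s in spatial_shape)
    if n != d * h * w:
        raise ValueError(f"expected {d * h * w} patch tokens, got {n}")
    return patch_tokens.reshape(b, d, h, w, c).transpose(0, 4, 1, 2, 3)
```

For the 128 × 128 × 128 example, 128 / 8 = 16 patches per axis, so num_patches = 16³ = 4096 and tokens_to_grid(tokens, (128, 128, 128)) yields a (1, 864, 16, 16, 16) array. A 100 × 130 × 64 volume would be padded to 104 × 136 × 64.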

For a normalized, windowed embedding grid over a real OME-Zarr volume, use eu.compute_patch_embedding_grid(...), which applies the checkpoint's normalization scheme and tiles the volume. The repo also ships a napari inspector at dinovol_2/eval/napari_visualizer.py.
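compute_patch_embedding_grid handles tiling internally, and its exact strategy is not documented on this card. As a rough illustration of what windowed embedding over a large volume involves, here is a hypothetical sketch that only computes window origins: fixed 128³ windows (the training crop size), a stride that is a multiple of the patch size so each window's patch grid aligns with the global patch grid, and a final window placed flush with the far edge so nothing is left uncovered. It assumes the volume has already been padded to multiples of 8; the 96-voxel default stride is an arbitrary choice, and merging overlapping windows is left out.

```python
from itertools import product


def window_starts(length, window=128, stride=96, patch=8):
    """Start offsets along one axis so windows of size `window` cover [0, length).

    Hypothetical helper; assumes the axis is already padded to a multiple of `patch`.
    """
    if length % patch or window % patch or stride % patch:
        raise ValueError("length, window and stride must be multiples of the patch size")
    if length < window:
        raise ValueError("volume axis is shorter than one window; pad it first")
    starts = list(range(0, length - window + 1, stride))
    last = length - window  # final window flush with the far edge
    if starts[-1] != last:
        starts.append(last)
    return starts


def window_origins(shape, window=128, stride=96, patch=8):
    """All (z, y, x) window origins for a 3D volume of the given shape."""
    per_axis = [window_starts(n, window, stride, patch) for n in shape]
    return list(product(*per_axis))
```

Each window yields a 16 × 16 × 16 patch grid that sits at origin // 8 in the global patch grid; where windows overlap, the overlapping patch embeddings have to be merged, for example by averaging.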

License

MIT. See LICENSE.

Caveats

  • Trained on single-channel Herculaneum micro-CT; behavior on other modalities is untested.
  • Pretraining only: no fine-tuning or segmentation head is included.