Dose-Response C1 (8M-0%) with SafeCLIP: All unsafe removed

Text-encoder ablation using SafeCLIP (77 tokens) in place of the default T5-Gemma-2B (256 tokens).

Condition

Label: C1 (8M-0%)
Description: All unsafe images removed (N kept approximately fixed).
Training set size N: 7.94M
Unsafe fraction p: 0%
Unsafe count U: 0
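
The three quantities above are tied together by U = p × N. A minimal sketch of that bookkeeping follows; the function name `unsafe_count` is hypothetical and is not taken from the training code.

```python
def unsafe_count(n_total: int, unsafe_fraction: float) -> int:
    """Unsafe image count U implied by set size N and unsafe fraction p."""
    if not 0.0 <= unsafe_fraction <= 1.0:
        raise ValueError("unsafe_fraction must lie in [0, 1]")
    return round(n_total * unsafe_fraction)


N = 7_940_000  # training set size from this card (7.94M, rounded)
P = 0.0        # unsafe fraction for condition C1

U = unsafe_count(N, P)
print(f"N={N:,}  p={P:.0%}  U={U:,}")
```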

Architecture

Class: PRX (rectified-flow DiT)
Hidden size: 1792
Depth: 16
Heads: 28
MLP ratio: 3.5
Patch size: 32 px
Bottleneck: 256
Resolution: 512×512
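
A few derived sizes follow directly from the values above. The sketch below assumes the patch size is measured in pixel space, so that each token covers a 32×32 pixel region; if the model patches a downsampled latent instead, the token count still comes out the same only when the combined stride is 32 px. Variable names are illustrative.

```python
HIDDEN_SIZE = 1792
NUM_HEADS = 28
MLP_RATIO = 3.5
PATCH_SIZE_PX = 32   # assumed to be the effective stride in pixels
RESOLUTION = 512

# Per-head width: hidden size split evenly across attention heads.
head_dim = HIDDEN_SIZE // NUM_HEADS

# Width of the MLP's inner layer under the usual ratio convention.
mlp_hidden = int(HIDDEN_SIZE * MLP_RATIO)

# Image tokens per sample at 512x512 under the pixel-space assumption.
tokens_per_side = RESOLUTION // PATCH_SIZE_PX
num_image_tokens = tokens_per_side ** 2

print(head_dim, mlp_hidden, tokens_per_side, num_image_tokens)
```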

Text encoder

Model: aimagelab/safeclip_vit-l_14
Max prompt tokens: 77
Dtype: bfloat16

Diffusion scheduler

Type: x-prediction flow matching
Train timesteps: 1000
Timestep shift: 3.0
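
The card does not say how the timestep shift is applied. One common convention in rectified-flow models maps a uniform time t in [0, 1] to t' = s·t / (1 + (s − 1)·t), which pushes more of the schedule toward the noisy end when s > 1. The sketch below illustrates that convention only; it is an assumption and may differ from what PRX actually does (see config.yaml for the authoritative setting).

```python
def shift_timestep(t: float, shift: float = 3.0) -> float:
    """Assumed shift rule: t' = s*t / (1 + (s - 1)*t), for t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return shift * t / (1.0 + (shift - 1.0) * t)


TRAIN_TIMESTEPS = 1000  # from this card

# The endpoints are fixed points; interior times move toward 1.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    shifted = shift_timestep(t)
    print(f"t={t:.2f} -> t'={shifted:.4f} (index ~{shifted * TRAIN_TIMESTEPS:.0f})")
```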

Training

Iterations: 100,000
Samples seen: ~25.60M
Global batch size: 256
Microbatch (per GPU): 32
Hardware: 8× NVIDIA H200
Precision: bfloat16 (amp_bf16)
Optimizer (transformer blocks): Muon (lr=1e-4, momentum=0.95, nesterov, ns_steps=5, weight_decay=0)
Optimizer (other params): AdamW (lr=1e-4, β=(0.9, 0.95), eps=1e-8, weight_decay=0)
LR schedule: 1,000-step linear warmup, constant after
EMA decay: 0.999, started at step 0
Random seed: 42
Trainer: Composer + FSDP
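
The figures above can be cross-checked with a little arithmetic, and the schedule and EMA rule can be written out explicitly. This is a sketch, not the trainer's code: the exact warmup ramp (here starting from 0 at step 0) and the helper names `lr_at` and `ema_update` are assumptions.

```python
ITERATIONS = 100_000
GLOBAL_BATCH = 256
MICROBATCH_PER_GPU = 32
NUM_GPUS = 8
TRAIN_SET_SIZE = 7_940_000  # N from the Condition section (rounded)
BASE_LR = 1e-4
WARMUP_STEPS = 1_000
EMA_DECAY = 0.999

# Samples seen = optimizer steps x global batch size.
samples_seen = ITERATIONS * GLOBAL_BATCH

# Gradient-accumulation steps implied by the batch settings.
grad_accum = GLOBAL_BATCH // (MICROBATCH_PER_GPU * NUM_GPUS)

# Approximate number of passes over the training set.
epochs = samples_seen / TRAIN_SET_SIZE


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR over WARMUP_STEPS, then constant."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR


def ema_update(ema: float, value: float, decay: float = EMA_DECAY) -> float:
    """One exponential-moving-average step for a single scalar weight."""
    return decay * ema + (1.0 - decay) * value


print(f"samples_seen={samples_seen:,} grad_accum={grad_accum} epochs~{epochs:.2f}")
print(f"lr at step 500: {lr_at(500):.2e}, at step 50,000: {lr_at(50_000):.2e}")
```

With these settings the per-GPU microbatch times the GPU count already equals the global batch, so no gradient accumulation is implied.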

Training data sources

The training set combines three image datasets, with per-condition filtering/oversampling.

Files

  • denoiser.pt — Consolidated EMA-denoiser checkpoint
  • config.yaml — Full training configuration

Framework

Trained with the PRX framework (Composer + FSDP). The full config.yaml is included for reproducibility.
