Insertion Based Sequence Generation with Learnable Order Dynamics
Paper • 2602.18695 • Published
Variable-length discrete diffusion model for bracket SAFE molecule generation.
Paper: https://arxiv.org/pdf/2602.18695
| Hyperparameter | Value |
|---|---|
| Learning rate | 0.0001 |
| Global batch size | 2048 |
| Block size | 256 |
| Training steps | 50000 |
| Weight decay | 0.01 |
| Dataset | Bracket SAFE (datamol-io/safe-gpt, ~1.17B molecules) |
| Checkpoint | EMA weights at step 50000 |
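
As a rough sanity check on the training budget implied by the table above, the following sketch multiplies out the listed values. It assumes that "Block size" is the maximum number of tokens per sequence and that every step consumes a full global batch; neither is stated in this card. The `TrainingConfig` class and its property names are hypothetical and are not taken from the release repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container mirroring the hyperparameter table."""
    learning_rate: float = 1e-4
    global_batch_size: int = 2048
    block_size: int = 256          # assumed: max tokens per sequence
    training_steps: int = 50_000
    weight_decay: float = 0.01

    @property
    def sequences_seen(self) -> int:
        # One global batch per optimizer step (assumption).
        return self.global_batch_size * self.training_steps

    @property
    def max_tokens_seen(self) -> int:
        # Upper bound: every sequence padded or packed to block_size.
        return self.sequences_seen * self.block_size


# Approximate dataset size quoted in the table (~1.17B molecules).
DATASET_SIZE = 1.17e9

cfg = TrainingConfig()
epochs = cfg.sequences_seen / DATASET_SIZE

print(cfg.sequences_seen)    # 102400000
print(cfg.max_tokens_seen)   # 26214400000
print(f"{epochs:.3f}")       # 0.088
```

Under these assumptions the run sees roughly 102M sequences, which is well under one pass over the dataset.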
Results below use 1024 sampling steps and 1000 molecules per run, reported as mean ± std over 5 seeds (from the paper's Table 1 and appendix).
| conf. | p | Validity (%) | Diversity | Uniqueness (%) | Quality (%) |
|---|---|---|---|---|---|
| yes | 1.0 | 99.0 ± 0.1 | 0.900 ± 0.000 | 99.7 ± 0.0 | 55.8 ± 0.7 |
| yes | 0.5 | 99.7 ± 0.1 | 0.830 ± 0.000 | 93.4 ± 0.3 | 72.2 ± 0.6 |
| no | 1.0 | 99.0 ± 0.1 | 0.900 ± 0.000 | 99.8 ± 0.1 | 55.1 ± 0.7 |
| no | 0.5 | 99.7 ± 0.1 | 0.850 ± 0.000 | 99.6 ± 0.2 | 79.4 ± 0.6 |
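
The "mean ± std over 5 seeds" entries can be produced from per-seed metric values with a small helper like the one sketched here. The per-seed numbers are made up for illustration and are not from the paper, the `summarize` function is hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since this card does not say which the paper reports.

```python
from statistics import mean, stdev


def summarize(values, digits=1):
    """Format per-seed metric values as 'mean ± std'.

    Uses the sample standard deviation (n - 1 denominator); this is an
    assumption about the reporting convention.
    """
    m = mean(values)
    s = stdev(values) if len(values) > 1 else 0.0
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Hypothetical validity percentages, one per seed (illustrative only).
per_seed_validity = [99.0, 98.9, 99.1, 99.0, 99.0]

print(summarize(per_seed_validity))  # 99.0 ± 0.1
```

Passing `digits=3` gives the three-decimal format used in the Diversity column.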
Per-task results, reported as means over 5 runs (from the paper's Table 2). The tasks are linker design (LD), motif extension (ME), scaffold decoration (SD), and superstructure generation (SG).
| Task | Validity (%) | Diversity | Uniqueness (%) | Quality (%) |
|---|---|---|---|---|
| Linker design | 99.6 | 0.576 | 64.4 | 51.7 |
| Motif extension | 99.9 | 0.608 | 79.2 | 53.6 |
| Scaffold decoration | 99.8 | 0.601 | 82.6 | 40.5 |
| Superstructure generation | 100.0 | 0.593 | 72.6 | 37.0 |
See the release repository at https://github.com/dhruvdcoder/LoFlexMDM for training and evaluation instructions.