SGJM: Speculative Graph JEPA Model (family)
This repository is the family hub for SGJM. It carries the model card, license,
and an index of the model variants. Each trained variant lives in its own
repository under the CoastalDigitalResearch organization so that
from_pretrained(...) resolves a single model per repo.
Status: architecture preview. Variant repositories are reserved and gated while baseline training on open data completes. No trained weights are published in this hub. Source code lives on GitHub, not here; see Source & reproduction below.
What SGJM is
A research prototype that combines speculative decoding with a Joint Embedding Predictive Architecture (JEPA) to generate, score, and verify speculative token branches in parallel, within a single trainable system. A shared backbone feeds three lightweight heads:
- Drafter: projects the backbone hidden state into a smaller space and emits k speculative token blocks in one forward pass.
- JEPA Judge: predicts the backbone's future latent at the end of a draft block (MSE loss against the real future state, with stop-gradient on that target) and scores branches by latent confidence rather than by token probability alone.
- Verifier: a binary accept/reject classifier over concatenated parent/child hidden states, trained with contrastive pairs.
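The toy sketch below illustrates the roles of the JEPA Judge and the Verifier on plain Python lists. It is not the SGJM implementation: the single logistic unit standing in for the verifier, the 0.5 threshold, and the rule that the verifier gates branches while the judge ranks them (lower latent MSE read as higher confidence) are all assumptions, since this card does not say how the two scores are combined. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def jepa_mse(predicted: Vector, target: Vector) -> float:
    """JEPA Judge objective: mean squared error between the predicted and
    the actual future latent. In an autograd framework the target would be
    wrapped in a stop-gradient; plain floats carry no gradient here."""
    if len(predicted) != len(target) or not target:
        raise ValueError("latents must be non-empty and the same size")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def verifier_accept_prob(parent: Vector, child: Vector,
                         weights: Vector, bias: float) -> float:
    """Accept probability from one logistic unit over the concatenated
    [parent; child] hidden state (hypothetical stand-in for the real head)."""
    features = list(parent) + list(child)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated state size")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    # Numerically stable sigmoid.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def pick_branch(judge_errors: List[float], accept_probs: List[float],
                threshold: float = 0.5) -> Optional[int]:
    """Hypothetical selection rule: the verifier gates, the judge ranks.
    Returns the accepted branch with the lowest latent error, or None."""
    accepted = [i for i, p in enumerate(accept_probs) if p >= threshold]
    if not accepted:
        return None
    return min(accepted, key=lambda i: judge_errors[i])
```

For example, pick_branch([0.3, 0.1, 0.2], [0.9, 0.2, 0.7]) skips branch 1, which the verifier rejects despite its lowest judge error, and returns branch 2.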
The backbone is byte-level (vocab = 256) and configurable either as a pure
transformer or as a hybrid Mamba-2 / attention stack. In the hybrid stack,
attn_every_n = N makes every Nth layer full attention and the rest Mamba-2 SSD blocks.
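A minimal sketch of the two backbone settings above. It assumes byte-level token ids are simply the UTF-8 bytes of the text, and that attn_every_n counts layers from 1, so attn_every_n = 3 puts attention at layers 3, 6, and 9 in 1-based numbering. The helper names are hypothetical; the actual layer layout is defined by each variant's configuration_sgjm.py / modeling_sgjm.py.

```python
from typing import List

VOCAB_SIZE = 256  # byte-level: every token id is one byte value


def encode_bytes(text: str) -> List[int]:
    """Map text to byte-level token ids in [0, 255] via UTF-8."""
    return list(text.encode("utf-8"))


def decode_bytes(ids: List[int]) -> str:
    """Inverse of encode_bytes; invalid byte sequences are replaced."""
    return bytes(ids).decode("utf-8", errors="replace")


def layer_schedule(n_layers: int, attn_every_n: int) -> List[str]:
    """Assumed reading of attn_every_n: layer i (0-based) is full attention
    when (i + 1) % attn_every_n == 0, otherwise a Mamba-2 SSD block."""
    if attn_every_n < 1:
        raise ValueError("attn_every_n must be >= 1")
    return [
        "attention" if (i + 1) % attn_every_n == 0 else "mamba2_ssd"
        for i in range(n_layers)
    ]
```

Under this convention, attn_every_n = 1 makes every layer attention, which coincides with the pure-transformer configuration. The value the hybrid variants actually use is not stated in this card, so 3 is only an illustration.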
Variants
| Repository | Backbone | d_model | Layers | Params (approx) | Status |
|---|---|---|---|---|---|
| SGJM-25M | transformer | 384 | 10 | ~25M | reserved (gated) |
| SGJM-250M | transformer | 1024 | 14 | ~250M | reserved (gated) |
| SGJM-25M-hybrid | Mamba-2 + attention | 384 | 10 | ~25M | reserved (gated) |
| SGJM-250M-hybrid | Mamba-2 + attention | 1024 | 14 | ~250M | reserved (gated) |
| SGJM-100M | transformer | 768 | 9 | ~100M | planned |
| SGJM-1B | transformer | TBD | TBD | ~1B | planned |
All variants share the same four-component architecture, byte-level vocabulary, and four-term training objective (token + drafter + JEPA + verifier). The hybrid variants differ only in the backbone (Mamba-2 SSD blocks with periodic attention).
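The four-term objective can be sketched as a weighted sum of the per-head losses. The weights here are hypothetical (all 1.0 by default); this card does not publish the coefficients SGJM uses, or whether the terms are weighted at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical coefficients; SGJM's actual weighting is not published here.
    token: float = 1.0
    drafter: float = 1.0
    jepa: float = 1.0
    verifier: float = 1.0


def total_loss(token_loss: float, drafter_loss: float, jepa_loss: float,
               verifier_loss: float,
               weights: Optional[LossWeights] = None) -> float:
    """Combine the token, drafter, JEPA, and verifier terms into one scalar."""
    w = weights if weights is not None else LossWeights()
    return (w.token * token_loss
            + w.drafter * drafter_loss
            + w.jepa * jepa_loss
            + w.verifier * verifier_loss)
```

In an autograd framework each argument would typically be a scalar tensor, so a single backward pass through the sum updates the backbone and all three heads together.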
Intended use
Research into speculative decoding, JEPA-style latent prediction, and hybrid SSM/attention backbones. These are small, byte-level, experimental models: not instruction-tuned assistants and not intended for production text generation.
Source & reproduction
The training, evaluation, and research code is not distributed through this
model repository. It is published as source on GitHub:
https://github.com/AdamPippert/SGJM. The custom architecture code needed for
inference (configuration_sgjm.py and modeling_sgjm.py) ships inside each variant
repository and is loaded with trust_remote_code=True.
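Once weights are published, a variant would presumably load through the standard transformers Auto classes with trust_remote_code=True. This sketch assumes the Hub id is the organization name plus the variant name from the table, and that each variant's config maps AutoModel to its modeling_sgjm.py class; both are assumptions, and the helper names are hypothetical. Gated repositories also require requesting access on the Hub and authenticating with a Hugging Face access token.

```python
from typing import Tuple

ORG = "CoastalDigitalResearch"
# Variants the table lists as reserved; planned ones are left out.
RESERVED_VARIANTS = ("SGJM-25M", "SGJM-250M", "SGJM-25M-hybrid", "SGJM-250M-hybrid")


def repo_id(variant: str) -> str:
    """Assumed Hub id for a variant: '<org>/<variant name>'."""
    if variant not in RESERVED_VARIANTS:
        raise ValueError(f"unknown or unreserved variant: {variant!r}")
    return f"{ORG}/{variant}"


def load_variant(variant: str) -> Tuple[object, object]:
    """Load config and model, executing the repo's custom SGJM code."""
    # Imported lazily so repo_id() works without transformers installed.
    from transformers import AutoConfig, AutoModel

    rid = repo_id(variant)
    config = AutoConfig.from_pretrained(rid, trust_remote_code=True)
    model = AutoModel.from_pretrained(rid, trust_remote_code=True)
    return config, model
```

Usage would be config, model = load_variant("SGJM-25M"). Because trust_remote_code=True executes Python from the repository, it is worth reading configuration_sgjm.py and modeling_sgjm.py first; passing a fixed revision= to from_pretrained keeps that code pinned.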
License
Released under the MIT License. See LICENSE and NOTICE.