SGJM — Speculative Graph JEPA Model (family)

This repository is the family hub for SGJM. It carries the model card, license, and an index of the released variants. Each trained variant lives in its own repository under the CoastalDigitalResearch organization so that from_pretrained(...) resolves a single model per repo.

Status: architecture preview. Variant repositories are reserved and gated while baseline training on open data completes. No trained weights are published in this hub. Source code lives on GitHub, not here — see Source & reproduction below.

What SGJM is

SGJM is a research prototype that combines speculative decoding with a Joint Embedding Predictive Architecture (JEPA) to generate, score, and verify speculative token branches in parallel within a single trainable system. A shared backbone feeds three lightweight heads:

Drafter — projects the backbone hidden state into a smaller space and emits k speculative token blocks in one forward pass.
JEPA Judge — predicts the backbone's future latent at the end of a draft block (MSE against the real future state, stop-gradient) and scores branches by latent confidence rather than token probability alone.
Verifier — a binary accept/reject classifier over concatenated parent/child hidden states, trained with contrastive pairs.
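
To make the shapes of the three heads concrete, here is a minimal NumPy sketch. It is not the code from modeling_sgjm.py: the single-linear-layer heads, the tanh nonlinearity, and the values of D_DRAFT, K, and BLOCK_LEN are illustrative assumptions, as is the reading of "k speculative token blocks" as K blocks of BLOCK_LEN tokens each. Only d_model = 384 and vocab = 256 come from this card.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, VOCAB = 384, 256            # from the SGJM-25M row and the byte-level vocab
D_DRAFT, K, BLOCK_LEN = 96, 4, 8     # hypothetical sizes, chosen for illustration


def init(shape):
    """Small random weights standing in for trained parameters."""
    return rng.normal(0.0, 0.02, size=shape)


# Drafter: project to a smaller space, then emit all K blocks at once.
W_down = init((D_MODEL, D_DRAFT))
W_out = init((D_DRAFT, K * BLOCK_LEN * VOCAB))


def drafter(h):
    """h: (D_MODEL,) hidden state -> logits of shape (K, BLOCK_LEN, VOCAB)."""
    z = np.tanh(h @ W_down)
    return (z @ W_out).reshape(K, BLOCK_LEN, VOCAB)


# JEPA Judge: predict the backbone latent at the end of the draft block.
W_judge = init((D_MODEL, D_MODEL))


def judge_loss(h_now, h_future):
    """MSE between the predicted and the real future latent.

    In training the real future state would be a stop-gradient target;
    NumPy has no autograd, so that is only noted here.
    """
    pred = h_now @ W_judge
    return float(np.mean((pred - h_future) ** 2))


# Verifier: binary accept/reject over concatenated parent/child states.
w_ver = init((2 * D_MODEL,))


def verifier_prob(h_parent, h_child):
    """Probability that the child branch is accepted."""
    x = np.concatenate([h_parent, h_child])
    return float(1.0 / (1.0 + np.exp(-(x @ w_ver))))


h_parent = rng.normal(size=D_MODEL)
h_child = rng.normal(size=D_MODEL)
draft_logits = drafter(h_parent)
draft_tokens = draft_logits.argmax(axis=-1)   # (K, BLOCK_LEN) byte ids
```

With these assumed sizes, draft_tokens holds K candidate blocks of byte ids, which the judge and verifier would then score branch by branch.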

The backbone is byte-level (vocab = 256) and configurable as a pure transformer or as a hybrid Mamba-2 / attention stack (attn_every_n: every Nth layer is full-attention, the rest are Mamba-2 SSD blocks).
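
The two backbone properties above can be illustrated in plain Python. This is a sketch under stated assumptions, not the repository's implementation: the helper names are hypothetical, the indexing convention (layers counted from 1, so layer N, 2N, ... are attention) is one possible reading of "every Nth layer", and the tokenizer is assumed to be raw UTF-8 bytes with no special tokens.

```python
def layer_types(n_layers, attn_every_n=None):
    """Return the block type of each layer.

    attn_every_n=None models the pure-transformer configuration.
    Otherwise every Nth layer (1-based, an assumed convention) is
    full attention and the rest are Mamba-2 blocks.
    """
    if attn_every_n is None:
        return ["attention"] * n_layers
    if attn_every_n < 1:
        raise ValueError("attn_every_n must be a positive integer")
    return [
        "attention" if (i + 1) % attn_every_n == 0 else "mamba2"
        for i in range(n_layers)
    ]


def encode(text):
    """Byte-level encoding: each UTF-8 byte is one token id in 0..255."""
    return list(text.encode("utf-8"))


def decode(ids):
    """Inverse of encode; invalid byte sequences are replaced, not raised."""
    return bytes(ids).decode("utf-8", errors="replace")


hybrid_layout = layer_types(10, attn_every_n=4)
token_ids = encode("hé")
```

Note that a non-ASCII character such as "é" costs two tokens, since a byte-level vocabulary of 256 works on UTF-8 bytes rather than characters.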

Variants

Repository	Backbone	d_model	Layers	Params (approx)	Status
`SGJM-25M`	transformer	384	10	~25M	reserved (gated)
`SGJM-250M`	transformer	1024	14	~250M	reserved (gated)
`SGJM-25M-hybrid`	Mamba-2 + attention	384	10	~25M	reserved (gated)
`SGJM-250M-hybrid`	Mamba-2 + attention	1024	14	~250M	reserved (gated)
`SGJM-100M`	transformer	768	9	~100M	planned
`SGJM-1B`	transformer	—	—	~1B	planned

All variants share the same four-component architecture, byte-level vocabulary, and four-term training objective (token + drafter + JEPA + verifier). The hybrid variants differ only in the backbone (Mamba-2 SSD blocks with periodic attention).
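
One plausible way to write the four-term objective is as a weighted sum. The weights \lambda_d, \lambda_j, \lambda_v, the predictor symbol g, and the placement of the stop-gradient \operatorname{sg} on the target are assumptions made for illustration; this card only states that the JEPA term is an MSE against the real future state with a stop-gradient.

```latex
\mathcal{L}
  = \mathcal{L}_{\text{token}}
  + \lambda_d \, \mathcal{L}_{\text{drafter}}
  + \lambda_j \, \mathcal{L}_{\text{JEPA}}
  + \lambda_v \, \mathcal{L}_{\text{verifier}},
\qquad
\mathcal{L}_{\text{JEPA}}
  = \frac{1}{d_{\text{model}}}
    \bigl\lVert g(h_t) - \operatorname{sg}(h_{t'}) \bigr\rVert_2^2
```

Here h_t is the backbone hidden state at the current position and h_{t'} is the backbone state at the end of the draft block.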

Intended use

Research into speculative decoding, JEPA-style latent prediction, and hybrid SSM/attention backbones. These are small, byte-level, experimental models — not instruction-tuned assistants and not intended for production text generation.

Source & reproduction

The training/eval/research code is not distributed through this model repository. It is published as source on GitHub: https://github.com/AdamPippert/SGJM. Custom architecture code needed for inference (configuration_sgjm.py / modeling_sgjm.py) ships inside each variant repository for trust_remote_code=True.
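
Once a variant repository publishes weights, loading it would presumably look like the sketch below. Several things here are assumptions: the repository ID pattern (organization name plus variant name from the table), the use of AutoModel as the entry point (which Auto class resolves depends on each repository's config), and the helper names. Because the variants are gated, an access token for an account that has accepted the conditions is likely required.

```python
ORG = "CoastalDigitalResearch"  # organization named in this card


def variant_repo_id(variant):
    """Build a hub repository ID such as 'CoastalDigitalResearch/SGJM-25M'."""
    return f"{ORG}/{variant}"


def load_variant(variant, token=None):
    """Load one SGJM variant with its custom architecture code.

    transformers is imported lazily so this module can be imported
    without it. trust_remote_code=True executes the
    configuration_sgjm.py / modeling_sgjm.py shipped in the repository,
    so review that code before enabling it.
    """
    from transformers import AutoModel  # assumed entry point

    return AutoModel.from_pretrained(
        variant_repo_id(variant),
        trust_remote_code=True,
        token=token,  # older transformers releases used use_auth_token
    )


repo_id = variant_repo_id("SGJM-25M")
```

The call to load_variant is left out on purpose: no trained weights are published yet, so it would fail until a variant is released.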

License

Released under the MIT License. See LICENSE and NOTICE.
