SGJM: Speculative Graph JEPA Model (family)

This repository is the family hub for SGJM. It carries the model card, license, and an index of the released variants. Each trained variant lives in its own repository under the CoastalDigitalResearch organization so that from_pretrained(...) resolves a single model per repo.

Status: architecture preview. Variant repositories are reserved and gated while baseline training on open data completes. No trained weights are published in this hub. Source code lives on GitHub, not here; see Source & reproduction below.

What SGJM is

SGJM is a research prototype that combines speculative decoding with a Joint Embedding Predictive Architecture (JEPA) to generate, score, and verify speculative token branches in parallel, all within a single trainable system. A shared backbone feeds three lightweight heads:

  • Drafter: projects the backbone hidden state into a smaller space and emits k speculative token blocks in one forward pass.
  • JEPA Judge: predicts the backbone's future latent at the end of a draft block (MSE against the real future state, with stop-gradient on the target) and scores branches by latent confidence rather than token probability alone.
  • Verifier: a binary accept/reject classifier over concatenated parent/child hidden states, trained with contrastive pairs.
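The three heads can be illustrated with a minimal NumPy sketch of their input and output shapes. This is not the SGJM implementation: the single linear layers, the tanh nonlinearity, the random weights, and the names D_DRAFT, K, and BLOCK are all assumptions chosen for illustration. Only d_model = 384 and vocab = 256 come from this card.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 384  # backbone width, as in the SGJM-25M row of this card
VOCAB = 256    # byte-level vocabulary, as stated in this card
D_DRAFT = 128  # hypothetical width of the drafter's smaller space
K = 4          # hypothetical number of speculative blocks per pass
BLOCK = 4      # hypothetical number of tokens per block


def init(*shape):
    """Small random weights; a stand-in for trained parameters."""
    return rng.normal(0.0, 0.02, size=shape)


W_down = init(D_MODEL, D_DRAFT)
W_draft = init(D_DRAFT, K * BLOCK * VOCAB)
W_jepa = init(D_MODEL, D_MODEL)
W_verify = init(2 * D_MODEL, 1)


def drafter(h):
    """Project h into the smaller space, then emit logits for K blocks at once."""
    z = np.tanh(h @ W_down)
    return (z @ W_draft).reshape(K, BLOCK, VOCAB)


def jepa_judge(h):
    """Predict the backbone latent expected at the end of a draft block."""
    return h @ W_jepa


def jepa_loss(pred, target):
    """MSE against the real future latent. In a trained system the target is
    wrapped in stop-gradient; NumPy has no autograd, so it is just a constant."""
    return float(np.mean((pred - target) ** 2))


def verifier(h_parent, h_child):
    """Accept probability from concatenated parent/child hidden states."""
    logit = np.concatenate([h_parent, h_child]) @ W_verify
    return float(1.0 / (1.0 + np.exp(-logit[0])))


h = rng.normal(size=D_MODEL)       # stand-in for a backbone hidden state
print(drafter(h).shape)            # (4, 4, 256)
print(jepa_judge(h).shape)         # (384,)
```

All three heads read the same backbone state, so one backbone forward pass can serve drafting, scoring, and verification.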

The backbone is byte-level (vocab = 256) and configurable as a pure transformer or as a hybrid Mamba-2 / attention stack (attn_every_n: every Nth layer is full-attention, the rest are Mamba-2 SSD blocks).
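As a rough illustration of the two backbone settings above, the sketch below maps text to byte ids and lays out a hybrid layer stack. It makes two assumptions that this card does not confirm: token ids are raw UTF-8 byte values with no special tokens, and "every Nth layer" counts from 1, so layers N, 2N, ... are attention. The function names are hypothetical.

```python
def encode_bytes(text: str) -> list[int]:
    """Map text to ids in 0..255 (assumes ids are raw UTF-8 byte values)."""
    return list(text.encode("utf-8"))


def decode_bytes(ids: list[int]) -> str:
    """Inverse of encode_bytes for valid UTF-8 sequences."""
    return bytes(ids).decode("utf-8")


def layer_pattern(num_layers: int, attn_every_n: int) -> list[str]:
    """Return the block type of each layer in a hybrid stack.

    Assumes 1-based counting: layers N, 2N, ... use full attention and
    every other layer is a Mamba-2 SSD block.
    """
    if attn_every_n < 1:
        raise ValueError("attn_every_n must be >= 1")
    return [
        "attention" if (i + 1) % attn_every_n == 0 else "mamba2"
        for i in range(num_layers)
    ]


print(encode_bytes("SGJM"))   # [83, 71, 74, 77]
print(layer_pattern(10, 5))   # attention at layers 5 and 10, mamba2 elsewhere
```

Under these assumptions, attn_every_n = 1 gives an all-attention stack, and non-ASCII characters simply occupy more than one id.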

Variants

| Repository | Backbone | d_model | Layers | Params (approx) | Status |
|---|---|---|---|---|---|
| SGJM-25M | transformer | 384 | 10 | ~25M | reserved (gated) |
| SGJM-250M | transformer | 1024 | 14 | ~250M | reserved (gated) |
| SGJM-25M-hybrid | Mamba-2 + attention | 384 | 10 | ~25M | reserved (gated) |
| SGJM-250M-hybrid | Mamba-2 + attention | 1024 | 14 | ~250M | reserved (gated) |
| SGJM-100M | transformer | 768 | 9 | ~100M | planned |
| SGJM-1B | transformer | - | - | ~1B | planned |

All variants share the same four-component architecture, byte-level vocabulary, and four-term training objective (token + drafter + JEPA + verifier). The hybrid variants differ only in the backbone (Mamba-2 SSD blocks with periodic attention).
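One plausible way to write the four-term objective is as a weighted sum. The weights \lambda_d, \lambda_j, \lambda_v and the exact form of each term are assumptions for illustration; this card states only that the four terms exist and that the JEPA term is an MSE against a stop-gradient target.

```latex
\mathcal{L}
  = \mathcal{L}_{\text{token}}
  + \lambda_d \, \mathcal{L}_{\text{drafter}}
  + \lambda_j \, \mathcal{L}_{\text{JEPA}}
  + \lambda_v \, \mathcal{L}_{\text{verifier}},
\qquad
\mathcal{L}_{\text{JEPA}}
  = \frac{1}{d_{\text{model}}}
    \bigl\lVert \hat{z}_{t+b} - \operatorname{sg}\!\left(z_{t+b}\right) \bigr\rVert_2^2
```

Here \hat{z}_{t+b} is the judge's prediction of the backbone latent at the end of a draft block of length b, z_{t+b} is the real future latent, and sg(·) denotes stop-gradient.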

Intended use

Research into speculative decoding, JEPA-style latent prediction, and hybrid SSM/attention backbones. These are small, byte-level, experimental models. They are not instruction-tuned assistants and are not intended for production text generation.

Source & reproduction

The training/eval/research code is not distributed through this model repository. It is published as source on GitHub: https://github.com/AdamPippert/SGJM. Custom architecture code needed for inference (configuration_sgjm.py / modeling_sgjm.py) ships inside each variant repository for trust_remote_code=True.
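Once a variant repository contains weights, loading might look like the sketch below. Several details are assumptions rather than documented behavior: the repo ids are formed as organization/variant from the names on this card, the custom code is assumed to register with AutoModel, and the helper names are hypothetical. Since the repositories are gated, an access token for an account that has accepted the conditions is likely required.

```python
ORG = "CoastalDigitalResearch"
# Variant names marked "reserved (gated)" on this card.
VARIANTS = ("SGJM-25M", "SGJM-250M", "SGJM-25M-hybrid", "SGJM-250M-hybrid")


def repo_id(variant: str) -> str:
    """Build a Hub repo id; the 'org/variant' layout is an assumption."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{ORG}/{variant}"


def load_variant(variant: str, token=None):
    """Load one variant with its bundled architecture code.

    trust_remote_code=True executes configuration_sgjm.py / modeling_sgjm.py
    from the variant repository, so review that code before enabling it.
    Whether AutoModel is the right entry point is an assumption.
    """
    from transformers import AutoModel  # imported lazily; needs `transformers`

    return AutoModel.from_pretrained(
        repo_id(variant),
        trust_remote_code=True,
        token=token,  # gated repos need an authorized access token
    )


print(repo_id("SGJM-25M"))  # CoastalDigitalResearch/SGJM-25M
```

While this hub remains in architecture preview with no published weights, load_variant is expected to fail; the sketch only shows the intended one-model-per-repo call shape.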

License

Released under the MIT License. See LICENSE and NOTICE.
