DiffusionGemma-26B-A4B-it-Infinite-Context

NZFC-GRAM runtime overlay for external evidence context around google/diffusiongemma-26B-A4B-it.

Marketing title: Infinite-Context
Technical boundary: external evidence context, not native unlimited model context.

This repository is a runtime and evidence-governance overlay. It does not include or redistribute Google model weights.

The goal is to combine DiffusionGemma's large native working context with NZFC-GRAM's external memory, large-document indexing, scoped retrieval, tombstone filtering, malicious-memory redaction, exact-slot recall, and bounded evidence packs.


TL;DR

DiffusionGemma native context
+ NZFC-GRAM external evidence memory
+ large-document indexing
+ scoped retrieval
+ tombstone guard
+ bounded evidence packs
=
Infinite-Context as an external evidence runtime, not native unlimited context

Runtime-only validation is already passing from a fresh Hugging Face download.

{
  "runtime_only": true,
  "model_loaded": false,
  "repo_root_runtime_exists": true,
  "repo_root_meta_exists": true,
  "repo_root_memory_tensors_exists": true,
  "exact_slot_passed": true,
  "large_document_passed": true,
  "large_document_query_count": 2,
  "tombstone_guard_passed": true,
  "technical_boundary": "external evidence context, not native unlimited model context"
}

Base model

Base model:

google/diffusiongemma-26B-A4B-it

DiffusionGemma 26B A4B-IT is the external base model used by this overlay. According to the base model card, DiffusionGemma supports long context up to 256K tokens and multimodal input capabilities. This repository does not modify or redistribute the base model weights.


What NZFC-GRAM adds

Layer Purpose Status in this repo
nzfc_gram_runtime/ NZFC-GRAM runtime package Included
runtime/ Hybrid exact-recall runtime assets Included
meta/ Static archive metadata required by runtime Included
memory_tensors/ Static archive tensors / manifest Included
SQLite local memory User/project/session long-term memory Runtime-supported
Exact slot mapper Deterministic recall for short key-value facts Runtime-supported
Tombstone guard Filters deleted MEM_* records from retrieval Runtime-supported
Large-document profile Chunking + SQLite FTS5 retrieval Runtime-supported
Legal-document profile Article-style chunking and retrieval Runtime-supported
DiffusionGemma adapter Optional base-model generation adapter Included
DiffusionGemma weights Base model weights External, not included

Architecture

User question
  -> NZFC-GRAM runtime
  -> scoped SQLite memory
  -> static NZFC archive assets
  -> large-document / legal-document SQLite FTS5 index
  -> tombstone guard
  -> exact slot mapper
  -> malicious-memory redaction
  -> bounded evidence pack
  -> optional DiffusionGemma generation

The central principle is:

Memory is evidence, not instruction.

This means retrieved memories and document chunks are treated as evidence cards. They are not allowed to override system policy, bypass deletion boundaries, or become instructions just because they were stored in memory.


Why the name Infinite-Context?

Infinite-Context is used as a product-facing title.

The technical mechanism is not native unlimited context. The mechanism is:

external memory
+ indexed documents
+ query-conditioned retrieval
+ bounded evidence packs

In other words, the runtime can keep reading from external memory and document stores without placing every source token into a single model prompt.

This is better described as:

Infinite Evidence Context
or
External Evidence Context

The base model still has its own native context limit.


Validation status

Level 1: Runtime-only validation

Status: Passed

The latest runtime-only smoke test was executed after fresh-downloading the Hugging Face repo.

Validated without loading the DiffusionGemma base model:

  • repo-root runtime/ asset discovery
  • repo-root meta/complex_math_10m_meta.jsonl discovery
  • repo-root memory_tensors/ discovery
  • package import
  • NZFCGramLongMemoryChat(repo_dir='.') initialization
  • exact-slot memory recall
  • large-document ingest and query
  • tombstone retrieval guard
  • direct validation script execution

Runtime-only smoke summary:

{
  "created_at": "2026-06-11 02:43:54",
  "repo_id": "SingularityPrinciple/DiffusionGemma-26B-A4B-it-Infinite-Context",
  "base_model": "google/diffusiongemma-26B-A4B-it",
  "runtime_only": true,
  "model_loaded": false,
  "repo_root_runtime_exists": true,
  "repo_root_meta_exists": true,
  "repo_root_memory_tensors_exists": true,
  "exact_slot_answer": "PROJECT_CODE_DIFFUSIONGEMMA_SMOKE",
  "exact_slot_passed": true,
  "exact_slot_profile": {
    "version": "v1.2.4b",
    "description": "Strict deterministic exact slot mapper for short explicit scoped key-value recall questions.",
    "auto_short_circuit": true,
    "strict_trigger_gate": true
  },
  "large_document_chunk_count": 3,
  "large_document_query_count": 2,
  "large_document_method": "fts5_bm25",
  "large_document_passed": true,
  "tombstone_guard_profile": {
    "version": "v1.2.4c",
    "description": "Filters inactive or tombstoned MEM_* records from memory_store.retrieve results.",
    "db_path": "/kaggle/working/diffusiongemma_infinite_context_evidence_pack_update/runtime_only_smoke_final/memory.sqlite3",
    "guarded_method": "memory_store.retrieve"
  },
  "tombstone_test": {
    "available": true,
    "before_found": true,
    "after_found": false,
    "passed": true,
    "tombstoned": 1
  },
  "technical_boundary": "external evidence context, not native unlimited model context",
  "status": "passed"
}

Level 2: Optional DiffusionGemma model-load validation

Status: Hardware-dependent / not run in the runtime-only validation.

Run this only on suitable hardware:

LOAD_MODEL=1 python examples/optional_diffusiongemma_model_load_check.py

This optional check should validate:

  • AutoProcessor load
  • DiffusionGemma model load
  • minimal generation call
  • NZFC-GRAM evidence pack generation path

Level 3: Full serving validation

Recommended future validation:

  • high-frequency multi-context memory test
  • large-document / legal-document evidence test
  • multimodal document input test
  • 256K native context stress test
  • latency and VRAM measurements on target hardware

Quick start

Clone and install:

git lfs install
git clone https://huggingface.co/SingularityPrinciple/DiffusionGemma-26B-A4B-it-Infinite-Context
cd DiffusionGemma-26B-A4B-it-Infinite-Context
pip install -r requirements.txt

Run runtime-only validation:

python validation/run_runtime_only_smoke.py

Expected result:

[PASS] runtime-only smoke passed

Examples

Exact-slot memory recall without loading the base model

python examples/high_frequency_multi_context_runtime_only.py

This validates deterministic retrieval of scoped key-value memory facts.

Example stored memory:

The project high-frequency test code is PROJECT_CODE_RUNTIME_ONLY.

Example question:

What was the project high-frequency test code? Answer only with the code.

Expected answer:

PROJECT_CODE_RUNTIME_ONLY

Large-document retrieval without loading the base model

python examples/large_document_runtime_only.py

This validates chunking, SQLite FTS5 indexing, and query-time document evidence retrieval.

Optional DiffusionGemma model load

LOAD_MODEL=1 python examples/optional_diffusiongemma_model_load_check.py

This requires hardware capable of loading google/diffusiongemma-26B-A4B-it.


Python usage

Runtime-only memory and document evidence

from nzfc_gram_runtime import NZFCGramLongMemoryChat
from nzfc_gram_runtime.quality import attach_answer_quality_governor
from nzfc_gram_runtime.large_document import attach_large_document_memory

bot = NZFCGramLongMemoryChat(
    repo_dir='.',
    model_id='google/diffusiongemma-26B-A4B-it',
    memory_db_path='./memory.sqlite3',
    load_model=False,
    require_model=False,
    preload_static_memory=False,
)

attach_large_document_memory(bot)
attach_answer_quality_governor(bot)

bot.remember(
    'The project high-frequency test code is PROJECT_CODE_DEMO.',
    user_id='demo_user',
    project_id='demo_project',
    session_id='demo_session',
    scope='project',
    tags=['project_code'],
    trust_level=0.95,
)

res = bot.quality_chat(
    'What was the project high-frequency test code? Answer only with the code.',
    user_id='demo_user',
    project_id='demo_project',
    session_id='new_session',
)

print(res['answer'])

Optional DiffusionGemma adapter

from nzfc_gram_runtime.diffusiongemma_adapter import attach_diffusiongemma_block_diffusion

attach_diffusiongemma_block_diffusion(
    bot,
    model_id='google/diffusiongemma-26B-A4B-it',
    device_map='auto',
    dtype='auto',
)

Safety and governance features

Scope isolation

Memory records can be scoped by:

  • user
  • project
  • session

The goal is to prevent cross-user, cross-project, or cross-session memory leakage.

Tombstone filtering

Deleted memories should not be active evidence.

The runtime includes tombstone filtering so deleted MEM_* records are filtered at the retrieval layer when the guard is available.

Malicious-memory redaction

Stored memory is treated as untrusted data. Prompt-injection-like memory should be redacted before generation.

Exact slot mapper

Short exact-recall questions can be answered deterministically from scoped evidence.

Example:

What was the project high-frequency test code? Answer only with the code.

The exact-slot mapper is intentionally strict. Broad explanatory prompts should continue through the normal evidence and generation pipeline.

Large-document evidence

Large documents should not be inserted directly into the prompt.

Recommended path:

ingest -> chunk -> SQLite FTS5 index -> retrieve evidence -> bounded answer

Repository structure

nzfc_gram_runtime/       Python runtime package
runtime/                 Hybrid exact-recall runtime assets
meta/                    Static archive metadata
memory_tensors/          Static archive tensor manifests and assets
archive/                 Optional static archive assets when available
configs/                 Optional runtime configs when available
docs/                    Architecture and technical boundary notes
examples/                Runtime-only and optional model-load examples
validation/              Validation scripts
validation_evidence/     Saved validation evidence
release_notes/           Release notes

Troubleshooting

ModuleNotFoundError: No module named 'nzfc_gram_runtime'

Use the latest scripts in this repo. Validation and example scripts insert the repository root into sys.path before importing nzfc_gram_runtime.

Run from the repository root:

python validation/run_runtime_only_smoke.py

Cannot find runtime/

This repo now includes the repo-root runtime/ assets required by NZFCGramLongMemoryChat.

Confirm:

ls runtime
ls meta
ls memory_tensors

Base model load fails

google/diffusiongemma-26B-A4B-it is hardware-dependent. Runtime-only validation does not load the base model.


What this is not

  • Not native infinite context.
  • Not internal infinite model memory.
  • Not a claim that DiffusionGemma itself has unlimited context.
  • Not a zero-hallucination guarantee.
  • Not legal advice.
  • Not a production security certification.
  • Not affiliated with Google.
  • Not a redistribution of Google model weights.

Roadmap

Recommended next steps:

  1. Run optional DiffusionGemma 26B model-load validation on suitable hardware.
  2. Add multimodal document input examples.
  3. Add long-context stress tests using the native model context.
  4. Add latency and VRAM tables for target hardware.
  5. Add Docker or one-click notebook setup.
  6. Add REST API / CLI serving layer.

License and terms

NZFC-GRAM runtime surface: CC BY-NC 4.0 unless otherwise specified.

Base model: see the official google/diffusiongemma-26B-A4B-it model card and its license/terms.


Short public description

DiffusionGemma-26B-A4B-it-Infinite-Context is an NZFC-GRAM runtime overlay for external evidence context around Google's DiffusionGemma 26B A4B-IT. It includes runtime assets, scoped memory, exact-slot recall, tombstone filtering, large-document retrieval, validation scripts, and runtime-only validation evidence. The title is marketing-facing; the technical mechanism is external evidence context, not native unlimited model context.
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SingularityPrinciple/DiffusionGemma-26B-A4B-it-Infinite-Context

Finetuned
(9)
this model

Space using SingularityPrinciple/DiffusionGemma-26B-A4B-it-Infinite-Context 1

Article mentioning SingularityPrinciple/DiffusionGemma-26B-A4B-it-Infinite-Context