GDN1 32K Anchor 1B Full Fine-Tune

This is a research checkpoint from the Long-GDN workspace.

Base Model

  • Base checkpoint: linear-moe-hub/Gated-Deltanet-1.3B
  • Architecture: Gated DeltaNet / linear recurrent attention
  • Base training data reported by the upstream model card: SlimPajama 100B-token sample
  • License inherited from upstream model card: Apache-2.0

Training Run

  • Local source path: runs/gdn1_32k_anchor_from_balanced200_1b_bs10_ft/final
  • Tokenizer source: runs/gdn1_32k_anchor_from_balanced200_1b_bs10_ft/final
  • Training mode: full fine-tuning, no LoRA/adapter
  • Hardware target: 8x NVIDIA H200
  • Sequence length: 32768
  • Additional token budget: ~1B tokens
  • Manifest/config: configs/gdn1_memory_mix_32k_anchor_recovery.json
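
As a rough sanity check on the budget above, the number of 32K sequences and optimizer steps can be estimated with simple arithmetic. This is a minimal sketch: it treats "~1B" as exactly 1,000,000,000 tokens, and the global batch of 80 sequences per step is a hypothetical value chosen for illustration, since the card does not state the batch size.

```python
import math


def sequences_for_budget(token_budget: int, seq_len: int) -> int:
    """Number of full sequences of length seq_len that fit in token_budget."""
    return token_budget // seq_len


def optimizer_steps(token_budget: int, seq_len: int, sequences_per_step: int) -> int:
    """Steps needed to consume token_budget, rounding a partial last step up."""
    tokens_per_step = seq_len * sequences_per_step
    return math.ceil(token_budget / tokens_per_step)


TOKEN_BUDGET = 1_000_000_000  # "~1B" from the card, treated as exact here
SEQ_LEN = 32_768              # sequence length from the card
SEQS_PER_STEP = 80            # HYPOTHETICAL global batch; not stated in the card

print(sequences_for_budget(TOKEN_BUDGET, SEQ_LEN))            # 30517
print(optimizer_steps(TOKEN_BUDGET, SEQ_LEN, SEQS_PER_STEP))  # 382
```

The actual step count depends on the real batch size and any gradient accumulation recorded in the manifest/config file.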

Intended Research Use

This checkpoint is intended for research on:

  • long-context associative recall
  • RULER/MQAR-style state tracking
  • recurrent-state contamination during long generation
  • Reference-State Reset with Rolling Replay, a GDN/RNN adaptation of the R-SWA idea

Usage

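This checkpoint uses the FLA (flash-linear-attention) Gated DeltaNet implementation. In the current Long-GDN environment, plain GatedDeltaNetForCausalLM.from_pretrained() can hit a Transformers 5.x tied-weight metadata issue when the installed FLA class declares _tied_weights_keys as a list. The robust path is to patch that attribute before loading, replacing the list with a dict that maps lm_head.weight to model.embeddings.weight, as the examples below do.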
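
The patch can also be wrapped in a small reusable helper. The sketch below is illustrative only: patch_tied_weights_keys is a hypothetical name, not part of FLA or Transformers, and the dummy class stands in for GatedDeltaNetForCausalLM so the snippet runs without FLA installed. The default mapping is the one used in the examples on this card.

```python
def patch_tied_weights_keys(model_cls, mapping=None):
    """Replace a list-style `_tied_weights_keys` with a dict mapping.

    Returns True if the class was patched, False if no patch was needed.
    Hypothetical helper; the default mapping is copied from this card.
    """
    if mapping is None:
        mapping = {"lm_head.weight": "model.embeddings.weight"}
    current = getattr(model_cls, "_tied_weights_keys", None)
    if isinstance(current, list):
        model_cls._tied_weights_keys = dict(mapping)
        return True
    return False


class DummyModel:
    """Stand-in for the FLA class, with list-style tied-weight metadata."""
    _tied_weights_keys = ["lm_head.weight"]


print(patch_tied_weights_keys(DummyModel))  # True: list replaced by dict
print(patch_tied_weights_keys(DummyModel))  # False: already a dict
```

With FLA installed, the same call would be made on GatedDeltaNetForCausalLM before from_pretrained(). The helper is idempotent, so calling it more than once is harmless.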

Install/runtime requirements:

pip install torch transformers safetensors huggingface_hub
# plus an FLA package/source tree that provides:
#   fla.models.gated_deltanet.GatedDeltaNetForCausalLM

CPU Example

import torch
from transformers import AutoTokenizer
from fla.models.gated_deltanet import GatedDeltaNetForCausalLM

repo_id = "LLM-OS-Models/gdn1-32k-anchor-1b"

# Transformers 5.x compatibility patch for the installed FLA class.
if isinstance(getattr(GatedDeltaNetForCausalLM, "_tied_weights_keys", None), list):
    GatedDeltaNetForCausalLM._tied_weights_keys = {
        "lm_head.weight": "model.embeddings.weight"
    }

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = GatedDeltaNetForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float32,
)
model.eval()

prompt = "A special magic number is 12345. What is the special magic number?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=False,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))

Single-GPU bf16 Example

import torch
from transformers import AutoTokenizer
from fla.models.gated_deltanet import GatedDeltaNetForCausalLM

repo_id = "LLM-OS-Models/gdn1-32k-anchor-1b"

if isinstance(getattr(GatedDeltaNetForCausalLM, "_tied_weights_keys", None), list):
    GatedDeltaNetForCausalLM._tied_weights_keys = {
        "lm_head.weight": "model.embeddings.weight"
    }

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = GatedDeltaNetForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
model.eval()

prompt = "Reference facts:\n- key_alpha: value_123\n\nQuestion: key_alpha?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=False,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
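
The prompt in the example above follows a simple key-value recall pattern. For probing associative recall with more facts, a prompt builder along the following lines may be useful. This is a sketch, not the project's evaluation code: build_recall_prompt and is_correct are hypothetical helpers, and the key/value naming scheme is an assumption modeled on the single-fact prompt above.

```python
import random


def build_recall_prompt(num_facts: int, seed: int = 0):
    """Build a key-value recall prompt and return (prompt, expected_value).

    Values are distinct three-digit numbers, so num_facts must be <= 900.
    """
    rng = random.Random(seed)
    values = rng.sample(range(100, 1000), num_facts)
    facts = {f"key_{i:04d}": f"value_{v}" for i, v in enumerate(values)}
    query_key = rng.choice(sorted(facts))
    lines = ["Reference facts:"]
    lines += [f"- {key}: {value}" for key, value in facts.items()]
    prompt = "\n".join(lines) + f"\n\nQuestion: {query_key}?\nAnswer:"
    return prompt, facts[query_key]


def is_correct(completion: str, expected: str) -> bool:
    """True if the expected value appears on the first line of the completion."""
    stripped = completion.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return expected in first_line


prompt, expected = build_recall_prompt(num_facts=5, seed=1)
print(prompt)
print("expected:", expected)
```

To score a model, pass the prompt through tokenizer and model.generate() as in the examples above, decode only the newly generated tokens (for instance output[0][inputs["input_ids"].shape[1]:]), and hand that text to is_correct.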

Long-GDN Local Loader

The project repository includes a more defensive loader at scripts/gdn1_common.py::load_gdn1_causal_lm. It handles the compatibility patch and older public-checkpoint key conversion used in local experiments.

from pathlib import Path
import torch
from transformers import AutoTokenizer
from scripts.gdn1_common import load_gdn1_causal_lm

repo_or_local_path = Path("path/to/downloaded/checkpoint")
tokenizer = AutoTokenizer.from_pretrained(repo_or_local_path, use_fast=True)
model = load_gdn1_causal_lm(repo_or_local_path, torch_dtype=torch.bfloat16).to("cuda")
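
When a script has to run both inside and outside the project repository, one option is to prefer the project loader and fall back to the patched FLA class. The sketch below assumes load_gdn1_causal_lm accepts the arguments shown above; load_checkpoint is a hypothetical wrapper name. The fallback path applies only the tied-weight patch and does not perform the older public-checkpoint key conversion.

```python
def load_checkpoint(path, torch_dtype=None):
    """Load via the project loader if importable, else via patched FLA.

    Hypothetical wrapper; imports are deferred so that defining it does not
    require either the project repository or FLA to be installed.
    """
    try:
        from scripts.gdn1_common import load_gdn1_causal_lm
    except ImportError:
        load_gdn1_causal_lm = None

    if load_gdn1_causal_lm is not None:
        return load_gdn1_causal_lm(path, torch_dtype=torch_dtype)

    # Fallback: plain FLA class with the compatibility patch from this card.
    from fla.models.gated_deltanet import GatedDeltaNetForCausalLM

    if isinstance(getattr(GatedDeltaNetForCausalLM, "_tied_weights_keys", None), list):
        GatedDeltaNetForCausalLM._tied_weights_keys = {
            "lm_head.weight": "model.embeddings.weight"
        }
    return GatedDeltaNetForCausalLM.from_pretrained(path, torch_dtype=torch_dtype)
```

Call it as load_checkpoint(repo_or_local_path, torch_dtype=torch.bfloat16), then move the returned model to the target device as in the examples above.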

Known Results

This run is an anchor-heavy continuation from the balanced checkpoint-200. The checkpoint sweep did not repair results at 32K and degraded results at 16K, so this checkpoint was not selected as the current best.

Caveats

This is not the current best checkpoint. It is uploaded for ablation and audit purposes only.

Citation Context

Relevant background papers include Gated Delta Networks, Gated DeltaNet-2, Log-Linear Attention, and Unlimited OCR / R-SWA. This checkpoint does not implement a new architecture by itself; it is part of a checkpoint-preserving full fine-tuning and inference-control study.
