rnj-1.5-instruct (sliding-window patched)

Drop-in transformers repack of EssentialAI/rnj-1.5-instruct, intended for quick inference tests and compatibility with downstream software (transformers/torch itself, llama.cpp, etc.):

  • Upstream declares layer_types: chunked_attention, which modeling_gemma3.py doesn't implement, so inference crashes with KeyError: 'chunked_attention'. This repo swaps those entries to sliding_attention so the model loads under stock transformers. Weights are otherwise unchanged, only resaved in bf16.

Sliding-window attention (window 8192) is not identical to the original block-local (chunked) attention: the two are equivalent for prompts of up to about 8192 tokens and diverge beyond that. For faithful long-context inference, use vLLM 0.20.0 against the upstream repo.
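To illustrate why the two schemes agree on short prompts and diverge on long ones, here is a minimal sketch that builds both causal masks for a toy window of 4 instead of 8192. The helper names are hypothetical, and the boundary convention (a window that includes the current token) is an assumption for illustration rather than a statement about either implementation.

```python
def sliding_mask(n, window):
    """mask[q][k] is True when query q may attend to key k:
    causal, limited to the most recent `window` positions (current token included)."""
    return [[k <= q and q - k < window for k in range(n)] for q in range(n)]


def chunked_mask(n, chunk):
    """Causal attention restricted to the query's own fixed-size chunk."""
    return [[k <= q and k // chunk == q // chunk for k in range(n)] for q in range(n)]


WINDOW = 4  # toy stand-in for 8192

# Sequence fits in one window/chunk: both masks are plain causal attention.
short_equal = sliding_mask(4, WINDOW) == chunked_mask(4, WINDOW)

# One token past the boundary: the last query sees 4 keys under sliding
# attention but only itself under chunked attention.
long_equal = sliding_mask(5, WINDOW) == chunked_mask(5, WINDOW)

print(short_equal, long_equal)  # True False
```

In the chunked scheme the first token of a new chunk starts with no local context, while the sliding window always carries the preceding tokens along, which is where outputs can start to differ on long prompts.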

Changes vs upstream

Field                      | Upstream                    | Here
---------------------------|-----------------------------|--------------------------------
Local layer type           | chunked_attention           | sliding_attention
RoPE params for locals     | under chunked_attention key | moved to sliding_attention key
Dtype                      | float32                     | bfloat16
Architecture string        | Rnj1ForCausalLM             | Gemma3ForCausalLM

Local/global layer pattern (LLLGLLLGLLLGLGGGGGLGLLLGLLLGLLLL) preserved.
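The config-level changes listed above can be sketched roughly as follows. This is an illustration on a toy dictionary, not the script used to build this repo: patch_config is a hypothetical helper, the rope_parameters field name and the full_attention label for global layers are assumptions, and the RoPE entries are placeholders rather than real values. Casting the weights to bf16 is not shown.

```python
import copy


def patch_config(cfg):
    """Return a copy of cfg with chunked_attention renamed to sliding_attention."""
    out = copy.deepcopy(cfg)
    # Swap the local layer type; global layers are left untouched.
    out["layer_types"] = [
        "sliding_attention" if t == "chunked_attention" else t
        for t in out["layer_types"]
    ]
    # Move the per-layer-type RoPE settings to the new key (field name assumed).
    rope = out.get("rope_parameters", {})
    if "chunked_attention" in rope:
        rope["sliding_attention"] = rope.pop("chunked_attention")
    out["architectures"] = ["Gemma3ForCausalLM"]
    return out


PATTERN = "LLLGLLLGLLLGLGGGGGLGLLLGLLLGLLLL"  # L = local, G = global

toy = {
    "architectures": ["Rnj1ForCausalLM"],
    "layer_types": [
        "chunked_attention" if c == "L" else "full_attention" for c in PATTERN
    ],
    # Placeholder entries, not the model's actual RoPE settings.
    "rope_parameters": {
        "chunked_attention": {"note": "local-layer settings"},
        "full_attention": {"note": "global-layer settings"},
    },
}

patched = patch_config(toy)
recovered = "".join(
    "L" if t == "sliding_attention" else "G" for t in patched["layer_types"]
)
print(recovered == PATTERN)  # True
```

Because only the label of each local layer changes, the position of every local and global layer stays the same.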

Usage

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="pszemraj/rnj-1.5-instruct",
    dtype=torch.bfloat16,
    device_map="auto",
)
# Chat-style input: a list of messages with "role" and "content" keys.
res = pipe([{"role": "user", "content": "Who are you?"}])
print(res)

License

Apache 2.0, inherited from upstream. See the original model card for architecture, benchmarks, and citation.
