⚠️ WORK IN PROGRESS — ongoing research. These checkpoints are part of an active, unfinished research program. Results, methods, and conclusions are preliminary and subject to change. This repository is being made public early for transparency and reproducibility, and is a prelude to a likely future publication; please treat everything here as in-progress research artifacts, not final results.

NLA / Activation-Verbalizer research checkpoints — Gemma-4-E2B

Research-stage LoRA adapters from an experimental program training a Natural Language Autoencoder (NLA), also called an Activation Verbalizer (AV): a model that takes a residual-stream activation vector from a base LLM and produces a natural-language description of its semantic content.

These are research checkpoints, not a polished released model. They are published for reproducibility of the accompanying experiments and write-up.

What's here

- Rank-8 (and one rank-80) LoRA adapters for google/gemma-4-E2B, with the base model loaded in 4-bit NF4.
- Multiple training runs exploring the activation-injection mechanism and the training objective:
  - injection at the input-embedding layer vs. at a deep residual layer (native scale);
  - a domain-aware contrastive (InfoNCE) objective with same-domain and cross-domain hard negatives;
  - prior-deviation token weighting;
  - capacity sweeps over the number of negatives.
- Per-run trajectory checkpoints (saved every 100 steps), plus the injection configs (inject_config.json, inject_mean.npy) needed to reproduce the exact injection used at train and eval time.
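The contrastive objective listed above can be illustrated with a minimal, framework-free sketch of InfoNCE for a single example. This is the textbook loss, not the code used in these runs: the scoring function, the temperature, and the choice to pool same-domain and cross-domain negatives into one candidate set are all assumptions. Each score stands for some compatibility measure between the target description and a candidate activation (hypothetically, the log-likelihood of the description conditioned on that activation).

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce(
    pos_score: float,
    same_domain_neg_scores: Sequence[float],
    cross_domain_neg_scores: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """InfoNCE loss for one example: -log softmax of the positive over all candidates.

    Assumption: both kinds of hard negatives are pooled with equal weight.
    Real runs may weight or sample them differently.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # The positive is always at index 0 of the candidate list.
    logits = [s / temperature for s in
              [pos_score, *same_domain_neg_scores, *cross_domain_neg_scores]]
    return logsumexp(logits) - logits[0]


# Example: one same-domain and one cross-domain negative.
loss = info_nce(2.0, same_domain_neg_scores=[1.0], cross_domain_neg_scores=[-1.0])
```

Same-domain negatives are what make this loss sensitive to within-domain content: a model that only recognizes the domain scores them about as high as the positive, so the loss stays close to log(1 + number of same-domain negatives) even when cross-domain negatives are easily rejected.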

How to load

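```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute for the frozen base model
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-E2B", quantization_config=bnb,
                                            device_map={"": 0})

# Checkpoints live in <run>/<step> subdirectories of one Hub repo, so pass
# that path via `subfolder` instead of appending it to the repo id.
av = PeftModel.from_pretrained(base, "Solshine/nla-gemma4e2b-research-checkpoints",
                               subfolder="<run>/<step>")
av.eval()
```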

Each run directory carries its own inject_config.json describing where (embedding vs. residual layer) and how (raw vs. mean-centered) the activation vector is injected. Use the same settings when evaluating that run.
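The "how" part of the injection can be sketched as a small preprocessing step: either pass the activation through unchanged (raw) or subtract the stored mean vector (mean-centered) before injecting it. The helper names and the mode strings "raw" and "mean" are hypothetical; read the actual field names and values from each run's inject_config.json. In practice the mean vector would come from `numpy.load("inject_mean.npy")`. Where the vector is injected (embedding vs. residual layer) is a separate choice not shown here.

```python
import json
from typing import List, Optional, Sequence


def load_inject_config(path: str) -> dict:
    """Read a run's inject_config.json as a plain dict (field names not assumed here)."""
    with open(path) as f:
        return json.load(f)


def prepare_activation(
    act: Sequence[float],
    mean: Optional[Sequence[float]],
    centering: str,
) -> List[float]:
    """Apply the (hypothetically named) centering mode to one activation vector."""
    if centering == "raw":
        # Raw injection: use the activation exactly as captured.
        return list(act)
    if centering == "mean":
        # Mean-centered injection: subtract the per-dimension mean stored for the run.
        if mean is None:
            raise ValueError("mean-centered injection needs the stored mean vector")
        if len(mean) != len(act):
            raise ValueError("activation and mean must have the same dimension")
        return [a - m for a, m in zip(act, mean)]
    raise ValueError(f"unknown centering mode: {centering!r}")
```

Keeping this step as an explicit function makes it easy to confirm that evaluation uses the same transformation as training, which is the point of shipping the config alongside each run.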

Status & caveats

Honest summary: these adapters reliably learn domain-level conditioning (which domain an activation came from) but do not yet achieve fine-grained, within-domain content conditioning in generation; that remains an open research problem. Treat outputs as experimental. Metrics, methods, and per-run details are in the accompanying project write-up.

The adapters were trained on a single 4 GB consumer GPU. License: Apache-2.0.
