MemoryDecoder
Memory Decoder is a pretrained, plug-and-play memory component designed for efficient domain adaptation of large language models. This checkpoint contains the GPT2-small Memory Decoder trained on WikiText-103, as described in our NeurIPS 2025 paper.
Memory Decoder bridges the gap between non-parametric retrieval methods and parametric fine-tuning approaches. By pre-training a compact transformer decoder to internalize retrieval patterns, it offers the domain-adaptation benefits of kNN retrieval without an external datastore or nearest-neighbor search at inference time, and without modifying the base model's parameters.
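At inference time, the two models' next-token distributions are combined by interpolation, in the style of kNN-LM; the `lmbda` and `knn_temp` arguments in the quickstart below suggest the following form. This is a minimal sketch of that combination, not the library's actual implementation, and `interpolate_next_token_probs` is a hypothetical helper:

```python
import torch
import torch.nn.functional as F

def interpolate_next_token_probs(base_logits, memory_logits, lmbda=0.55, knn_temp=1.0):
    """Hypothetical sketch of kNN-LM-style interpolation.

    base_logits / memory_logits: [batch, vocab] next-token logits from the
    base LM and the Memory Decoder, respectively.
    """
    p_base = F.softmax(base_logits, dim=-1)
    # knn_temp flattens or sharpens the memory distribution before mixing.
    p_memory = F.softmax(memory_logits / knn_temp, dim=-1)
    # Linear mix: lmbda weights the memory component, (1 - lmbda) the base LM.
    return lmbda * p_memory + (1.0 - lmbda) * p_base
```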
```python
from memDec import MemoryDecoder
import transformers
from transformers import AutoModelForCausalLM
from loguru import logger

# Define paths to your models
base_lm_path = "gpt2-xl"  # or any GPT2 variant
knn_generator_path = "Clover-Hill/MemoryDecoder-gpt2-small"

# Load tokenizer and models
tokenizer = transformers.AutoTokenizer.from_pretrained(base_lm_path)
base_lm = AutoModelForCausalLM.from_pretrained(base_lm_path)
knn_generator = AutoModelForCausalLM.from_pretrained(knn_generator_path)

# Set both models to evaluation mode
base_lm.eval()
knn_generator.eval()

# Create the joint Memory Decoder model
joint = MemoryDecoder(base_lm, knn_generator, lmbda=0.55, knn_temp=1.0).to("cuda")

# Prepare input prompt
prompt = "As with previous Valkyira Chronicles games , Valkyria Chronicles III is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate with Memory Decoder
out_ids = joint.generate(**inputs, max_new_tokens=20, do_sample=False)
logger.info(f"Memory Decoder output: {tokenizer.decode(out_ids[0], skip_special_tokens=True)}")

# Generate with base model for comparison
out_ids = base_lm.generate(**inputs, max_new_tokens=20, do_sample=False)
logger.info(f"Base Model output: {tokenizer.decode(out_ids[0], skip_special_tokens=True)}")
```
Generation Results Comparison:
| Model | Generated Continuation |
|---|---|
| Base Model | "...is a turn-based strategy game. The player takes control of a squad of Valkyria soldiers..." |
| +Memory Decoder | "...is a role-playing video game developed by Sega and published by Sega for the PlayStation 2." |
Memory Decoder correctly identifies Valkyria Chronicles III as a role-playing game (factually accurate), while the base model incorrectly predicts it as a strategy game.
Perplexity on WikiText-103 (lower is better):

| Model Configuration | Perplexity | Δ Perplexity |
|---|---|---|
| GPT2-small (baseline) | 24.89 | - |
| GPT2-small + MemoryDecoder | 13.36 | -11.53 |
| GPT2-medium (baseline) | 18.29 | - |
| GPT2-medium + MemoryDecoder | 12.25 | -6.04 |
| GPT2-large (baseline) | 15.80 | - |
| GPT2-large + MemoryDecoder | 11.53 | -4.27 |
| GPT2-xl (baseline) | 14.39 | - |
| GPT2-xl + MemoryDecoder | 10.93 | -3.46 |
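For reference, here is a sketch of how such perplexity numbers are typically computed with a sliding window over the WikiText-103 test set. The windowing protocol is an assumption rather than the repository's actual evaluation script, and it treats `joint` as a standard Hugging Face causal LM that accepts `labels`:

```python
# Hypothetical perplexity evaluation over WikiText-103 (the sliding-window
# protocol here is an assumption; the paper's exact script may differ).
import math
import torch
from datasets import load_dataset

test = load_dataset("wikitext", "wikitext-103-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_len, stride = 1024, 512  # GPT-2 context window, 50% overlap
nlls, n_tokens = [], 0
for begin in range(0, enc.input_ids.size(1), stride):
    end = min(begin + max_len, enc.input_ids.size(1))
    trg_len = end - begin if begin == 0 else stride  # only score the new tokens
    input_ids = enc.input_ids[:, begin:end].to("cuda")
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # mask tokens already scored in a previous window
    with torch.no_grad():
        out = joint(input_ids, labels=target_ids)
    nlls.append(out.loss * trg_len)
    n_tokens += trg_len
    if end == enc.input_ids.size(1):
        break

ppl = math.exp(torch.stack(nlls).sum().item() / n_tokens)
print(f"Perplexity: {ppl:.2f}")
```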
```bibtex
@article{cao2025memory,
  title={Memory decoder: A pretrained, plug-and-play memory for large language models},
  author={Cao, Jiaqi and Wang, Jiarui and Wei, Rubin and Guo, Qipeng and Chen, Kai and Zhou, Bowen and Lin, Zhouhan},
  journal={arXiv preprint arXiv:2508.09874},
  year={2025}
}
```
For questions and support: maximus.cao@outlook.com
Base model: openai-community/gpt2