NaviGen GRPO Adapter - step600

This repository contains the GRPO-trained LoRA adapter used by NaviGen, a personalized generative recommendation model for producing user-aware image and video generation instructions.

NaviGen represents each item with a dual identifier that couples a collaborative code and a textual code in one token stream. This adapter is the reinforcement learning stage of the NaviGen pipeline: it further aligns the stage-2 supervised model with user intent through reward-guided optimization.

Model Details

  • Model name: NaviGen GRPO Adapter, step600
  • Model type: PEFT LoRA adapter for causal language modeling
  • Base model: NaviGen-stage2-base
  • Backbone family: Qwen3-style causal LM
  • Training stage: GRPO reinforcement learning after two-stage SFT
  • Adapter format: adapter_model.safetensors
  • PEFT version: 0.19.1

The adapter targets the main attention and MLP projection layers:

q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Intended Use

This adapter is intended for research on personalized generative recommendation, especially settings where a model should infer user preference from historical item identifiers and produce more specific, relevant, and visually generatable generation instructions.

Typical uses include:

  • Personalized prompt or instruction generation for image/video models
  • Next-item or identifier prediction under the NaviGen token format
  • Reproduction and analysis of the NaviGen RL stage
  • Ablation studies comparing SFT and GRPO-aligned checkpoints

This adapter is not a standalone model. It must be loaded on top of the corresponding NaviGen stage-2 base model.

Quick Start

Install the main dependencies:

pip install torch transformers peft safetensors

Load the adapter with PEFT:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "NaviGen-stage2-base"
adapter_id = "NaviGen-grpo-step600"

tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

Replace base_model_id and adapter_id with the final repository names used in your release.

Input Format

The adapter follows the NaviGen training format. Inputs should use the same tokenizer and special tokens released with this checkpoint. In general, prompts contain user history, item identifiers, and task instructions serialized in the NaviGen token stream.

For reproducibility, use the tokenizer files included in this repository:

  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • added_tokens.json
  • chat_template.jinja
  • vocab.json
  • merges.txt

Training Summary

NaviGen uses a two-stage SFT + RL pipeline:

  1. Stage-1 SFT: learns item identifier and preference-aware representations.
  2. Stage-2 SFT: distills preference reasoning and instruction writing from searched supervision.
  3. GRPO alignment: optimizes the model with hierarchical and self-consistent rewards to better match user intent and generation quality.

This checkpoint corresponds to the GRPO adapter saved at training step 600.

Limitations

  • The adapter depends on the matching NaviGen base model and tokenizer.
  • Outputs are sensitive to the exact prompt format and identifier vocabulary.
  • The model is designed for research use and has not been audited for all production safety requirements.
  • Generated instructions may still contain irrelevant, underspecified, or visually difficult content.

Files

Core files for inference:

  • adapter_config.json
  • adapter_model.safetensors
  • tokenizer and chat template files

Training-resume states such as optimizer or scheduler checkpoints are not required for normal inference.

Citation

If you use this model, please cite the NaviGen paper once the citation is released.

@article{navigen,
  title   = {NaviGen: Personalized Generative Recommendation with Dual Identifiers},
  author  = {NaviGen Authors},
  journal = {TBA},
  year    = {2026}
}
Downloads last month
21
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support