GLYPH-SFT-V2

Fully fine-tuned GLYPH trace model based on Qwen/Qwen3-4B-Base.

Summary

This model was trained to produce rigid GLYPH-style traces with:

  • plan
  • act
  • optional tool turns
  • final response
  • explicit refs and todo satisfaction

It is the supervised fine-tuning (SFT) checkpoint intended to serve as the starting point for reinforcement learning with verifiable rewards (RLVR).

Base Model

  • Qwen/Qwen3-4B-Base

Dataset

  • JayZenith/GLYPH_SFT_DATASET

Training

Key settings:

  • full fine-tune
  • 1 epoch
  • lr=1e-5
  • lm_head_lr=1.5e-5
  • assistant_only masking
  • max_seq_length=1024
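Two of the settings above, the separate `lm_head_lr` and `assistant_only` masking, can be illustrated without a training framework. The following is a minimal sketch, not the actual training script (that lives in JayZenith/glyph and may differ). The function names `build_param_groups` and `mask_labels` are hypothetical. It assumes the output head's parameters are named with an `lm_head` prefix and that masked positions use the label value -100, which PyTorch's cross-entropy loss ignores by default.

```python
from typing import Dict, List, Sequence

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy by default

# Values copied from the settings list above.
CONFIG = {
    "epochs": 1,
    "lr": 1e-5,
    "lm_head_lr": 1.5e-5,
    "max_seq_length": 1024,
}


def build_param_groups(param_names: Sequence[str], config: Dict[str, float]) -> List[dict]:
    """Split parameter names into a base group and an lm_head group.

    Assumption: the head's parameters are named with an "lm_head" prefix.
    """
    head = [n for n in param_names if n.startswith("lm_head")]
    body = [n for n in param_names if not n.startswith("lm_head")]
    return [
        {"params": body, "lr": config["lr"]},
        {"params": head, "lr": config["lm_head_lr"]},
    ]


def mask_labels(token_ids: Sequence[int], is_assistant: Sequence[bool], max_len: int) -> List[int]:
    """Keep labels only on assistant tokens, then truncate to max_len."""
    if len(token_ids) != len(is_assistant):
        raise ValueError("token_ids and is_assistant must have the same length")
    labels = [t if a else IGNORE_INDEX for t, a in zip(token_ids, is_assistant)]
    return labels[:max_len]
```

With real tensors, the same group structure could be passed to an optimizer, and the masked labels to the loss, so that system and user tokens contribute no gradient.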

For reproduction details, eval setup, and artifacts, see:

  • JayZenith/glyph

Results

Held-out results from the reproduced run:

  • weighted loss: 2.2446 -> 0.3300
  • perplexity: 9.44 -> 1.39
  • held-out formal eval: 97/100
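The loss and perplexity figures above are consistent with each other if the weighted loss is a mean per-token negative log-likelihood in nats, in which case perplexity is simply its exponential. That interpretation is an assumption, not something the card states. A quick check:

```python
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity for a mean per-token negative log-likelihood in nats."""
    return math.exp(mean_nll)


# Loss values taken from the results list above.
before = perplexity(2.2446)
after = perplexity(0.3300)
print(f"{before:.2f} -> {after:.2f}")  # prints "9.44 -> 1.39"
```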

Example Prompt Prefix

<|im_start|>system
system「You are a Rust language assistant who gives compact conceptual explanations.」
<|im_end|>

<|im_start|>user
user「In Rust, what does a `'static` lifetime usually mean in practice? Keep it concise.」🏷 usr1
<|im_end|>

<|im_start|>assistant
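
The prefix above can be assembled programmatically. The sketch below is illustrative only: `build_prompt_prefix` is a hypothetical helper, the default label `usr1` is copied from the example, and the blank line between turns mirrors how the example is displayed here, which may not match the exact whitespace used in training. Check the JayZenith/glyph repository for the authoritative template.

```python
def build_prompt_prefix(
    system_text: str,
    user_text: str,
    user_label: str = "usr1",
    turn_separator: str = "\n\n",
) -> str:
    """Assemble a prompt prefix that ends at the open assistant turn.

    Assumption: turn_separator defaults to a blank line, as displayed in
    the example; the real template may use a single newline instead.
    """
    turns = [
        f"<|im_start|>system\nsystem「{system_text}」\n<|im_end|>",
        f"<|im_start|>user\nuser「{user_text}」🏷 {user_label}\n<|im_end|>",
        "<|im_start|>assistant\n",
    ]
    return turn_separator.join(turns)


prefix = build_prompt_prefix(
    "You are a Rust language assistant who gives compact conceptual explanations.",
    "In Rust, what does a `'static` lifetime usually mean in practice? Keep it concise.",
)
```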

Example Output

plan {
    todo ↦ {
        1 ↦ "Explain the static lifetime concept briefly." ※ usr1
    } •
    rationale ↦ "Tie it to data that lives for the entire program."
}

act {
    think ↦ [
        「Keep the answer focused on global or program-wide data.」 𝑝 0.9 🏷 note_static ※ [ usr1 ]
    ]
}

response「A `'static` lifetime usually means the data is valid for the entire program, like static variables, string literals, or data shared across threads.」
※ [ note_static ]
⊨ 1
<|im_end|>
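
Because the trace format is rigid, basic consistency checks on refs and todo satisfaction can be automated. The sketch below is inferred from the single example above and is not the project's formal evaluator: it assumes `🏷` introduces a label, `※` cites one or more labels, `⊨` marks a satisfied todo id, and todo ids appear as `<n> ↦ "..."`. The function `check_trace` and its return keys are hypothetical.

```python
import re
from typing import Dict, List


def check_trace(trace: str, prompt_labels: List[str]) -> Dict[str, List[str]]:
    """Report unresolved refs and todo mismatches for one trace."""
    # Labels come from the prompt (e.g. usr1) or from 🏷 markers in the trace.
    # An optional variation selector (U+FE0F) after the emoji is tolerated.
    labels = set(prompt_labels) | set(re.findall(r"🏷\uFE0F?\s*(\w+)", trace))
    # Todo ids appear as `<n> ↦ "..."` (assumed from the example).
    todo_ids = set(re.findall(r'(\d+)\s*↦\s*"', trace))
    # Refs follow ※, either bare (`※ usr1`) or bracketed (`※ [ a, b ]`).
    refs: List[str] = []
    for bare, bracketed in re.findall(r"※\s*(?:(\w+)|\[([^\]]*)\])", trace):
        refs.extend([bare] if bare else re.findall(r"\w+", bracketed))
    # Satisfied todo ids follow ⊨.
    satisfied = set(re.findall(r"⊨\s*(\d+)", trace))
    return {
        "unresolved_refs": sorted(r for r in set(refs) if r not in labels),
        "unknown_todos": sorted(satisfied - todo_ids),
        "unsatisfied_todos": sorted(todo_ids - satisfied),
    }
```

For the example output, calling `check_trace(trace, ["usr1"])` should return three empty lists: every ref resolves to a known label and the only todo is marked satisfied.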