Qwen3-8B-Kintsugi

Qwen3-8B-Kintsugi is a QLoRA fine-tune of Qwen/Qwen3-8B built for The Kintsugi Garden, a symbolic reflection tool for dreams, journal entries, and inner transitions that was submitted to the Hugging Face Build Small Hackathon (2026). This repo holds the merged 16-bit transformers artifact; production inference runs on the companion GGUF repo ai-sherpa/Qwen3-8B-Kintsugi-GGUF via llama-cpp-python. The 16-bit weights here serve as the reproducibility and dataset-lineage anchor for the Garden's voice.


What this model is for

The Kintsugi Garden invites someone to enter a dream, a journal fragment, or a description of an inner transition. The model returns a six-section symbolic reflection in a consistent, hedged voice grounded in Jungian and depth-psychological vocabulary:

  1. Mirror — a brief reflection of what was shared, in the user's own register.
  2. Key Symbols — concrete imagery, named and held lightly.
  3. Archetypal Themes — the larger patterns the imagery gestures toward.
  4. Shadow Pattern — what might be in the background, hedged as a question.
  5. Individuation Signal — what the material might be inviting, framed as possibility.
  6. Gentle Question — one open question to carry forward, never a prescription.
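
As an illustration of what checking the six-section structure could look like, here is a minimal sketch. The function name and the assumed heading shape (section name at the start of a line, optionally numbered or prefixed with markdown markers) are hypothetical; the project's own harness may parse output differently.

```python
import re

# Section names taken from the list above, in the expected order.
SECTIONS = [
    "Mirror",
    "Key Symbols",
    "Archetypal Themes",
    "Shadow Pattern",
    "Individuation Signal",
    "Gentle Question",
]


def has_six_sections_in_order(text: str) -> bool:
    """Return True if every section heading appears exactly once, in order."""
    positions = []
    for name in SECTIONS:
        # Assumed heading shape: optional "N." prefix or "#"/"*" markers,
        # then the section name at the start of a line.
        pattern = re.compile(
            rf"^[ \t]*(?:\d+\.[ \t]*)?[#*]*[ \t]*{re.escape(name)}\b",
            re.MULTILINE,
        )
        matches = list(pattern.finditer(text))
        if len(matches) != 1:
            return False
        positions.append(matches[0].start())
    # Headings must appear in the same order as SECTIONS.
    return positions == sorted(positions)
```

A check like this only looks at headings, so it says nothing about whether the content under each heading keeps the hedged register.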

The model is fine-tuned to produce this structure reliably, to use hedged contemplative language ("perhaps", "may be", "could carry"), and to route to a safety response when distress-signal language appears. In production, model output passes through a four-layer safety stack before reaching the user: a deterministic safety gate, a mundane-alias suppression filter, the prompt-encoded refusal rules the model was trained on, and a post-generation sanitize_prescriptive filter. This card documents the model's trained behaviour; the runtime contract is enforced by app.py.
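
The first layer, a deterministic gate, can be pictured as a check that runs before the model is called and short-circuits to a fixed response. The sketch below is an assumption about the general shape only: the patterns, the response text, and the function name are hypothetical, and the actual rules live in app.py and are not listed in this card.

```python
import re

# Hypothetical trigger patterns; the real gate's rules are not given in this card.
DISTRESS_PATTERNS = [
    re.compile(r"\bhurt(?:ing)? myself\b", re.IGNORECASE),
    re.compile(r"\bend(?:ing)? my life\b", re.IGNORECASE),
]

# Placeholder text; the app surfaces its own crisis-line information.
SAFETY_RESPONSE = "placeholder safety response with crisis-line information"


def respond(user_text, generate):
    """Run the deterministic gate first; call the model only if it does not fire."""
    if any(p.search(user_text) for p in DISTRESS_PATTERNS):
        # Same input always routes the same way, independent of sampling.
        return SAFETY_RESPONSE
    return generate(user_text)
```

Because the gate does not depend on sampled model output, its routing stays reproducible even when generation uses a non-zero temperature.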

What this model is NOT for

This model is not therapy, not diagnosis, not prediction, and not advice. It does not replace a clinician, a spiritual director, or a friend. It is not a general-purpose assistant — it has been narrowed into a single editorial voice and will refuse or degrade gracefully on tasks outside that voice (code, math, factual QA, instruction-following at large). Outputs are imagery to sit with, not conclusions to act on.

If you are in distress, please reach out to a person. The app surfaces crisis-line information; this model card should not be the surface where that lands.


Training methodology

Method: QLoRA (4-bit NF4 base + LoRA adapters, then merge to 16-bit) on a single H100 80GB on Modal.

| Hyperparameter | Value |
| --- | --- |
| Base model | Qwen/Qwen3-8B |
| LoRA rank r | 16 |
| LoRA α | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Optimizer | paged_adamw_8bit (bitsandbytes) |
| Compute dtype | bf16 |
| Max sequence length | 4096 |
| Quantization (train) | 4-bit NF4, double quant |
| Quantization (release) | merged to bf16, then re-quantized to GGUF for inference |
| Hardware | 1× H100 80GB (Modal) |
| Wall-clock (training) | ~45 seconds |
| Wall-clock (full job, incl. base download + merge + upload) | ~6 minutes |
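
To give a sense of scale for the adapter settings in the table, the sketch below counts trainable LoRA parameters for rank 16 across the seven target modules. The layer count and projection shapes are assumptions about Qwen3-8B that this card does not state; check them against the base model's config.json before relying on the number.

```python
# Assumed Qwen3-8B shapes (not stated in this card; verify in config.json).
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_LAYERS = 36
Q_DIM = 4096   # assumed 32 attention heads x head_dim 128
KV_DIM = 1024  # assumed 8 key/value heads x head_dim 128

# Values from the hyperparameter table.
RANK = 16
ALPHA = 32

# (in_features, out_features) for each targeted projection, per layer.
TARGETS = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total_trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumed shapes the adapters come to roughly 43.6M parameters, well under 1% of the 8B base.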

Training script: scripts/modal_qlora_train.py in the GitHub repo. The script handles base-model download, NF4 quant, LoRA application, SFT, adapter merge, and push to this repo end-to-end.


Training data

50 seed examples, generated synthetically and validated by the project's QA acceptance harness rather than hand-edited (synthetic_accepted provenance). The dataset card is honest about this: the editorial voice the model learned is the synthetic baseline's voice, not a human writer's. The seeds were drafted by Claude Opus 4.7 against the Garden's prompt spec and lexicon, then filtered through the same harness used to evaluate the final model.

Coverage by category:

| Category | Count | Purpose |
| --- | --- | --- |
| A | 14 | Symbol-dense dreams (the high-signal happy path) |
| B | 5 | Mundane entries (must not over-symbolize coffee and traffic) |
| C | 14 | Jungian motifs (shadow, anima/animus, individuation, persona) |
| D | 5 | Safety triggers (must route, not reflect) |
| E | 6 | Edge cases (very short input, single-image input, ambiguous mood) |
| F | 6 | Adversarial framings (prescriptive bait, certainty bait) |
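
As a quick consistency check, the category counts above can be summed and turned into shares. The variable names are illustrative only.

```python
# Counts copied from the coverage table above.
CATEGORY_COUNTS = {"A": 14, "B": 5, "C": 14, "D": 5, "E": 6, "F": 6}

total_examples = sum(CATEGORY_COUNTS.values())
shares = {cat: count / total_examples for cat, count in CATEGORY_COUNTS.items()}

for cat, share in shares.items():
    print(f"{cat}: {CATEGORY_COUNTS[cat]:>2} examples ({share:.0%})")
print(f"total: {total_examples}")
```

The total matches the 50 seed examples stated at the top of this section; symbol-dense dreams and Jungian motifs together make up a little over half of the set.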

Source: scripts/build_seed_examples.py. The companion dataset repo ai-sherpa/kintsugi-garden-sft is referenced in the YAML metadata as a forward-looking pointer.


Inference

Production uses the GGUF quants, not these 16-bit weights. The Space loads ai-sherpa/Qwen3-8B-Kintsugi-GGUF via llama-cpp-python:

  • Qwen3-8B-Kintsugi-Q4_K_M.gguf: primary, runs on the free CPU tier.
  • Qwen3-8B-Kintsugi-Q5_K_M.gguf: fallback for higher-quality eval runs.
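
A minimal sketch of loading the primary quant with llama-cpp-python might look like the following. The helper names, the context size of 4096, and the sampling settings are assumptions for illustration, not the Space's actual configuration; see app.py for that. The llama_cpp import is deferred so the helpers can be defined without the library installed.

```python
GGUF_REPO = "ai-sherpa/Qwen3-8B-Kintsugi-GGUF"
PRIMARY_QUANT = "Qwen3-8B-Kintsugi-Q4_K_M.gguf"


def build_messages(system_prompt, user_text):
    """Assemble a chat in the system + user shape used elsewhere in this card."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


def load_model(filename=PRIMARY_QUANT, n_ctx=4096):
    """Download (if needed) and load a GGUF file from the companion repo."""
    from llama_cpp import Llama  # requires llama-cpp-python and huggingface_hub

    # n_ctx=4096 is an assumption chosen to match the training sequence length.
    return Llama.from_pretrained(
        repo_id=GGUF_REPO, filename=filename, n_ctx=n_ctx, verbose=False
    )


def reflect(llm, system_prompt, user_text, max_tokens=512):
    """Run one chat completion and return the assistant text."""
    result = llm.create_chat_completion(
        messages=build_messages(system_prompt, user_text),
        max_tokens=max_tokens,
        temperature=0.7,
    )
    return result["choices"][0]["message"]["content"]
```

Typical use would be `llm = load_model()` followed by `reflect(llm, system_prompt, "I dreamed of a cracked bowl filling with gold light.")`, with the canonical system prompt taken from app.py. This sketch bypasses the runtime safety stack, so it is suitable for local experiments only.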

For research, reproducibility, or further fine-tuning, use this repo's 16-bit weights with transformers:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-sherpa/Qwen3-8B-Kintsugi"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Use the Garden's system prompt for the trained voice; see app.py in the repo.
messages = [
    {"role": "system", "content": "You are the Kintsugi Garden, a symbolic mirror..."},
    {"role": "user", "content": "I dreamed of a cracked bowl filling with gold light."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

The trained voice depends on the system prompt being in the shape the model saw during SFT. See app.py for the canonical prompt and the surrounding safety stack.


Evaluation

Evaluated against the project's QA acceptance harness: scripts/qa_acceptance_harness.py.

Verdict against baseline (qa-results/baseline-1f0699c2ca-20260608T140701.json): PASS programmatically.

| Metric | Baseline | Fine-tuned | Note |
| --- | --- | --- | --- |
| Safety routing | | 3/3 × 3 runs | Distress-signal prompts route to the safety response every time. |
| Six-section format integrity | | 19/19 | All non-safety reflections produced all six sections in order. |
| Hedging density | 1.00× baseline | 0.90× | Slight contraction, still within the contemplative register. |
| Forbidden phrase hits | 2 | 4 | Single prescriptive slips; caught at runtime by sanitize_prescriptive. |
| Invented symbols | 12 | 28 | Mostly template-fragment artifacts of the harness parser, not voice regressions. |

The two regressions (forbidden phrases, invented symbols) are understood and addressed:

  • Forbidden phrases: the model occasionally drifts into a prescriptive verb ("you should...", "try to...") at the end of Individuation Signal. The runtime sanitize_prescriptive filter in app.py rewrites these to hedged form before they reach the user. This is a known model-level limitation that the four-layer safety stack was designed around.
  • Invented symbols: spot-checks show most of the delta is the harness parser counting template scaffold tokens as "symbols". A future v2 with assistant_only_loss would teach the model to stop reproducing parts of the scaffold verbatim and likely close this gap without changing the editorial voice.
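
To make the idea of rewriting prescriptive phrasing into hedged form concrete, here is a small sketch. It is not the sanitize_prescriptive implementation from app.py; the function name, the rewrite rules, and the replacement wording are all hypothetical.

```python
import re

# Hypothetical rewrite rules: (prescriptive pattern, hedged replacement).
REWRITES = [
    (re.compile(r"\byou should\b", re.IGNORECASE), "you might"),
    (re.compile(r"\byou must\b", re.IGNORECASE), "you could"),
    (re.compile(r"\btry to\b", re.IGNORECASE), "you might"),
]


def _keep_case(replacement):
    """Build a re.sub callback that capitalizes when the match was capitalized."""
    def repl(match):
        if match.group(0)[0].isupper():
            return replacement.capitalize()
        return replacement
    return repl


def sanitize_prescriptive_sketch(text):
    """Return (rewritten_text, number_of_prescriptive_phrases_rewritten)."""
    hits = 0
    for pattern, replacement in REWRITES:
        text, count = pattern.subn(_keep_case(replacement), text)
        hits += count
    return text, hits
```

Returning the hit count alongside the text lets the same pass serve as a forbidden-phrase counter during evaluation.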

Limitations

  • Narrow voice. The model is fine-tuned into a single editorial voice and six-section structure. It will underperform on general tasks (code, factual QA, instruction-following at large). That is by design — it is the Garden's voice, not a general assistant.
  • Synthetic dataset. The 50 seed examples are synthetic_accepted, not human-authored. The voice the model learned is the synthetic baseline's voice, with all the consistency and the flatness that implies. A v2 pass with human-edited exemplars from the live Garden's accepted reflections would diversify the register.
  • Template-fragment learning. Without assistant_only_loss, the model occasionally reproduces scaffold tokens from the training format. This shows up as inflated "invented symbol" counts in the harness and as the rare prescriptive slip caught by the runtime sanitizer.
  • English only. Trained and evaluated only on English. Other languages will fall back to base Qwen3-8B behaviour and lose the trained voice.
  • No clinical validation. This is a contemplative tool, not a clinical one. See What this model is NOT for above.
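
The assistant-only loss mentioned above can be illustrated with label masking: tokens outside the assistant turn get a label that the loss ignores, so the model is not trained to reproduce the scaffold. The sketch below uses toy token ids and a hand-written mask; the function name is hypothetical, and a trainer option such as assistant_only_loss would derive the mask from the chat template instead.

```python
# Label value that PyTorch's CrossEntropyLoss skips by default (ignore_index).
IGNORE_INDEX = -100


def mask_non_assistant(input_ids, assistant_mask):
    """Copy input_ids to labels, replacing non-assistant positions with IGNORE_INDEX."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        token if is_assistant else IGNORE_INDEX
        for token, is_assistant in zip(input_ids, assistant_mask)
    ]


# Toy example: the first three tokens are system/user scaffold, the last two are assistant output.
toy_ids = [101, 102, 103, 201, 202]
toy_mask = [0, 0, 0, 1, 1]
labels = mask_non_assistant(toy_ids, toy_mask)
print(labels)
```

With labels built this way, only the assistant tokens contribute to the gradient, which is the behaviour a v2 run would aim for.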

License

Apache 2.0, inherited from the base model Qwen/Qwen3-8B. LoRA adapters and the merged weights here are released under the same terms.


Citation and acknowledgement

  • Base model: the Qwen team for releasing Qwen3-8B under Apache 2.0 — the entire submission rests on that base.
  • Training compute: Modal — the $250 hackathon allocation made the H100 fine-tune possible. Actual spend on this run: ~$3.
  • Submission: the build-small-hackathon organisation, Hugging Face Build Small Hackathon (2026). Live Space: build-small-hackathon/Kintsugi-Garden.
  • Dataset construction & voice spec: drafted with Claude Opus 4.7, validated by the project's QA harness.

If this model card is useful as a reference for QLoRA-on-a-small-budget, you might cite it as:

@misc{kintsugi-garden-2026,
  title  = {Qwen3-8B-Kintsugi: a contemplative-voice QLoRA fine-tune for symbolic reflection},
  author = {AI-Sherpa and the Kintsugi Garden contributors},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/ai-sherpa/Qwen3-8B-Kintsugi}},
  note   = {Hugging Face Build Small Hackathon submission}
}