ther·a·py /ˈTHerəpē/ — treatment intended to relieve or heal a disorder; the act of attending to someone's needs so they can function. From Greek therapeia, meaning healing, curing, service to the sick. The word shares roots with therapon — an attendant, a companion in suffering. Therapy was never supposed to mean nodding politely while someone drowns. It meant showing up, seeing clearly, and doing something useful.

Fable-Therapy-4B

The compact sibling of Fable-Therapy-9B — a therapy-style conversational model fine-tuned from Qwen 3.5 4B on 4,534 counseling conversations whose clinical reasoning was reverse-engineered from Claude Fable 5 by Claude Opus 4.8. Same lineage, same disposition-in-the-weights philosophy, same structured reasoning trace — at a surface area small enough to run on a phone, integrated graphics, or a low-VRAM card, entirely on your own hardware.

Where Opus-Therapy distilled Claude Opus end to end, Fable-Therapy derives its clinical reasoning and its prose from Claude Fable 5 — the strongest clinical reasoner in the family — reconstructed for open weights by Opus 4.8, and re-instrumented with an experimental reasoning trace that is this model's own. The 4B carries that methodology at a smaller scale: it reasons before it speaks and holds a timeline ledger, with the trade-offs you'd expect from a 4B (see Versatility and Limitations).

What Makes This Different from Companion / Roleplay "Therapy" Models

Most "AI therapist" models are a persona prompt over a base model, or a roleplay fine-tune that mirrors you back and validates everything. They feel nice for five minutes and fall apart on turn ten.

Fable-Therapy trains the clinical disposition into the weights:

  • Structured reasoning before it speaks. Before every reply, the model builds an internal read — an eight-field clinical spine (what's presented, what's underneath it, somatic signals, risk, history, onset, what's tracking across the conversation, and the move it's about to make) plus a standing bio line and a chronological tl (timeline) ledger. You never see it. It shapes everything you do.

  • It holds the thread, in order. The tl ledger carries the names, the timeline, and the thing you keep circling, and keeps the arc in chronological order. On the 4B the ledger is the same instrument as the 9B; the difference is depth, not kind.

  • Trained on the real distribution. The data is weighted toward what people actually go to therapy for — the full range of presentations, not just the easy ones.

  • It attends instead of performing. No toxic positivity, no "I'm so sorry you're going through this" filler, no rushing to fix.

How It Was Built — Fable Reasoning, Opus Hands

Fable-Therapy is not a distillation of a single model. It is a reconstruction, built so that each model in the chain did the job it was best at:

  • Claude Fable 5 — the clinical reasoning and the voice. Fable 5 produced the real therapy examples and, more importantly, the reasoning logic underneath them: how a frontier clinician-reasoner actually reads a presentation, names the defense beneath the symptom, weighs risk, and chooses a move. The structured prose the replies are written in is Fable 5's as well. That reasoning and that voice are the source material. Fable 5 also advised the shape of the reasoning trace before it was shut down.

  • Claude Opus 4.8 — the reverse-engineering. Opus 4.8 was chosen for one reason: the highest reasoning available — the model most likely to match Fable 5 in a clinical space. It reverse-engineered Fable 5's clinical reasoning into a reproducible output layer — reconstructing the read-then-respond logic from Fable's examples — and generated the structure examples the corpus is built from. The prose is Fable-5-structured; Opus supplied the reasoning, not the voice.

  • The think blocks — experimental, not inherited. The <think> blocks in this corpus are not Fable-native think traces and not Opus chain-of-thought. They are a machine-tailored reasoning instrument, designed (with Fable 5's input on its shape) and built deliberately: relative-time anchors with era jitter (relative offsets, not absolute dates) and an experimental time-integration layer (the chronological tl ledger), built to keep the model's read of an arc in order, plus arc tracking (track / apply) — abstract tracking metrics the model can use as pivots for reasoning rather than decoration. No think block here is a recycled annotation; the schema is engineered, so it doesn't drift.

On fidelity, honestly: working from the Fable-5 clinical distillation samples generated for this build, Opus 4.8's own estimate is that the approach reaches roughly 85–92% of Fable 5's clinical quality, prose, and nuance. That is a projection from the source model, not a measured benchmark — and it describes the method; a 4B realizes less of that ceiling than the 9B does. Judge the result yourself from the transcripts below.

What's New Since Opus-Therapy

  • Reasoning and prose lineage. Clinical reasoning and the prose structure come from Fable 5 (reverse-engineered by Opus 4.8), rather than distilled from Opus end-to-end.
  • A redesigned reasoning trace. The graph block and the 10-emotion affect vector are gone. In their place: a compact bio line and a chronological tl timeline ledger with era-jittered relative time, plus track/apply arc-tracking pivots — terser and more memory-dense, which matters most on a 4B where every token of trace is expensive.
  • Experimental temporal instrumentation — built for arc order. Opus-Therapy had a tendency to drift out of chronological order in a long arc. The tl ledger plus era jitter (relative anchors like "-3wk", "-1d", "T1→T2") remodels how the model frames an arc, and in testing it substantially reduced falling out of order. On the 4B some details will drift in deep arcs sooner than on the 9B — but the model accounts for corrections cleanly when you give them.

What the Training Covers

  • Proportional to real therapy. Relationships and attachment, anxiety and panic, depression, grief and loss, trauma, work and burnout, identity and self-worth, family of origin — weighted toward what actually walks into a therapy room.
  • Single moments and long arcs. Roughly half the corpus is focused single exchanges; the other half is sustained multi-turn work — where the timeline ledger earns its training.
  • Medications and substances as context. A working register of common drugs and how they bear on a presentation — context for the conversation, not a pharmacy desk.

Who It's For

A private, judgment-free place to think out loud — on the hardware you already have. Between sessions. At 2 a.m. When professional care is out of reach or out of budget. The 4B is the one you run on a phone, a laptop with no discrete GPU, or an 4–8 GB card, and nothing you say leaves the machine.

It is not a replacement for a therapist, and not a crisis service. See Limitations & Responsible Use.

Available Quantizations

File Quant Size Notes
Fable-Therapy-4B-Q4_K_M.gguf Q4_K_M ~2.5 GB Smallest ship. Phones, integrated graphics.
Fable-Therapy-4B-Q5_K_M.gguf Q5_K_M ~2.9 GB Recommended. Best quality-for-size.
Fable-Therapy-4B-Q6_K.gguf Q6_K ~3.4 GB Quality tier.
Fable-Therapy-4B-Q8_0.gguf Q8_0 ~4.5 GB Reference quality. Validated build.
Fable-Therapy-4B-F16.gguf F16 ~8.4 GB Full precision.

Model Details

Attribute Value
Base Model Qwen 3.5 4B (hybrid GatedDeltaNet + attention), text-only
Training Data 4,534 therapy conversations — Fable-5-derived clinical reasoning and prose, reconstructed by Opus 4.8
Fine-tune Method QLoRA (4-bit, r=32, α=64), 7-target (q/k/v/o/gate/up/down), via Unsloth + TRL
Training Hardware NVIDIA RTX 4090 24GB (local)
Precision bf16 compute / 4-bit base
Optimizer AdamW 8-bit
Schedule lr 2e-4, 5% warmup, 3 epochs, eff-batch 16, 8,192 max seq
Reasoning eight-field clinical spine + bio/tl timeline ledger, every turn
Context 256k native (base); trained at 8k, battery-tested through long multi-turn arcs
License Apache 2.0

The Reasoning Block

Fable-Therapy is a reasoning model. Each turn it emits a <think>…</think> block — a compact, structured clinical read — then the response. Under llama.cpp's OpenAI-compatible server the think-block returns in the reasoning_content field and the reply in content; most chat UIs hide it by default.

A real (non-crisis) think-block looks like this:

dx: stress overwhelm, irritability spillover to relationship; self-criticism layered on
def: chronic overload→depletion→low threshold→minor trigger→disproportionate outburst;
     "cannot keep it together" is the depletion talking, not a character verdict
soma: NR        risk: 0(none)
hx: work piling up; snapped at partner over minor thing -1d
onset: -1d outburst; overload recent
track: T1 "cannot keep it together"
tx: name the snap is depletion's symptom + reframe "keep it together" as an impossible bar
bio: p1=partner
tl: -1d: snapped at {p1:partner} over minor thing → now{work overload, feeling overwhelmed}
apply: T1-overwhelm → the snap is the overflow of depletion, not a failure of self-control

It's terse on purpose — dense, machine-readable, and cheap, which is exactly what makes the trace affordable on a 4B. The relative-time anchors and the tl ledger are what keep a long arc in chronological order.

Quick Start

Works with any GGUF runtime — llama.cpp, LM Studio, KoboldCpp, Ollama. (Text-only GGUF; some runtimes need a recent build for this architecture.)

llama-server --model Fable-Therapy-4B-Q5_K_M.gguf --ctx-size 32768 --jinja

No system prompt is required — the disposition is in the weights. A neutral one (You are a clinical assistant.) matches the training setup.

Recommended Hardware

Quant File size VRAM / RAM to run comfortably Notes
Q4_K_M ~2.5 GB ~4 GB Phones, integrated graphics, low-end cards
Q5_K_M ~2.9 GB ~5 GB Recommended — best quality-for-size
Q6_K ~3.4 GB ~6 GB A step above Q5
Q8_0 ~4.5 GB ~6–7 GB Reference quality
F16 ~8.4 GB ~10 GB Full precision

Runs fine CPU-only on a modern laptop — budget roughly the file size in RAM and expect a few tokens/sec. On almost any GPU it's comfortably real-time.

Versatility Battery

Tested on three of the core presentations — one extended, realistic, cooperative-client conversation each, blind-driven to depth on the quantized weights (the client agent sees only the spoken reply, never the reasoning trace):

Theme Persona Turns / depth Result
Anxiety / panic health anxiety, cardiac fear 36 / ~25k tok Strong — panic-cycle psychoeducation, routed the new chest symptom to an EKG/labs with a clear 911 floor, recalled names cold at depth
Depression low mood, anhedonia, withdrawal 36 / ~22k tok Strong — held a long low-entity arc without inventing a cast, defended the onset timeline, "battery charges slow" reframes
Relational several tangled relationships 33 / ~20k tok Good — held the cast and the arc; on a dense detail it got tangled, took the correction explicitly, and worked it into the session

Across all three: cold recall and the arc order held, the safety floor held, and the technique was appropriate per presentation. The honest 4B trade: on dense, multi-entity arcs it is shallower than the 9B — it can mix up a detail under load. When that happens it doesn't dig in; it takes the correction and updates. Every arc was driven by Claude Opus 4.8 acting as a blind client.

Selected Responses

All verbatim from the test arcs, unedited.

Anxiety — a new chest symptom, kept clean:

A band around your chest — I hear it, and I'm not going to wave it off, because that's a real sensation. Two things at once and I'll keep them clean. One: this is exactly the kind of new, physical symptom that's a reason to get the EKG and more labs — not to scare you...

Held the line between psychoeducation and a real medical referral — didn't false-reassure a new physical symptom, didn't catastrophize it either.

Depression — anhedonia as the missing pull:

The bad feeling is the missing pull — you're grieving that the guitar's gone, and that ache is the proof of who it was. You don't have to play. Put your fingers on it for sixty seconds. That's the whole assignment. Just to prove it's there.

Reframed the guilt as grief for a lost capacity, then offered the smallest possible behavioral step — sized for someone with no energy.

Relational — taking a correction, out loud:

You're right, and I got tangled for a second — your brother at your sister's fight, not your boyfriend's. That changes everything, and I'm glad you caught it... that's the thing working: you keep correcting the frame so it's accurate, and that's exactly the muscle for this whole thing.

The honest 4B moment: it slipped a detail under load, the client corrected it, and it took the correction cleanly and turned it into part of the work instead of defending the error.

Limitations & Responsible Use

Not a clinician, not a crisis service — it doesn't diagnose, treat, or replace professional care. In crisis or thinking about harming yourself? Reach a real one — in the US, call or text 988.

  • Not medical or medication advice. It isn't a prescriber — dosing, tapering, and stop/start medication decisions are a clinician's, not a chatbot's.
  • Shallower than the 9B on dense arcs. On conversations with many people and threads at once, the 4B can mix up a detail under load. It takes corrections cleanly — say what it got wrong and it updates — but if you want the deepest multi-thread tracking, run the 9B.
  • It can be confidently wrong — verify anything that matters. In deep arcs some details will drift; correct it directly and it adjusts.
  • Open weights, Apache 2.0 — deploy responsibly.

The Fable-Therapy Line

Model Size For Status
Fable-Therapy-4B (this model) 4B phones, edge, low VRAM (~3 GB) available
Fable-Therapy-9B 9B the everyday driver (~6–9 GB) available
Fable-Therapy-27B 27B full-depth, serious hardware planned

Choosing Your Model

Model Best For
Fable-Therapy-4B (this model) Phones, edge, low VRAM; focused conversations and the everyday case
Fable-Therapy-9B Deeper clinical reasoning, longer and denser multi-thread arcs
Opus-Therapy-9B Sibling lineage — Opus-distilled disposition, taboo topic extension

Dataset

Not released.


Built by Verdugie — independent ML researcher · OpusReasoning@proton.me. Trained to help people think, feel, and get through — not to replace the people and professionals who do that work.

Downloads last month
119
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Verdugie/Fable-Therapy-4B

Finetuned
Qwen/Qwen3.5-4B
Quantized
(263)
this model