Qwen3.5-9B-LOC-L1-v1
Cognitive coherence upgrade for on-device AI. Runs on MacBook Air M1 16GB · Zero cloud required · Apache 2.0
This is Qwen/Qwen3.5-9B with a merged LOC L1 Foundation LoRA adapter trained using
Differentiable LOC Loss (DLL) — a novel training method that directly optimises
cognitive coherence in the model's hidden-state geometry.
Result: 21.3% → 80.6% True Coherence (+59.3 percentage points)
What Is Cognitive Coherence — and Why Benchmarks Miss It
Standard AI benchmarks (MMLU, GPQA, HumanEval) measure what a model knows. They do not measure how coherently it applies that knowledge.
A model can score 80% on MMLU and still:
- Write about a credit memo instead of writing the memo
- Describe a code bug instead of fixing it
- Over-explain empathy instead of responding with it
- Hedge endlessly on a question it could answer directly
This gap is cognitive incoherence — the model's internal reasoning process activates conflicting or unstable cognitive functions simultaneously, producing output that circles the task rather than completing it.
The LOC (Level of Consciousness) framework measures this directly by analysing hidden-state magnitude patterns across the model's layers, mapping them to 13 cognitive functions across four consciousness tiers:
| Tier | Cognitive Functions |
|---|---|
| Higher Conscious | Consciousness · Mindfulness · Awareness |
| Higher Subconscious | Energy · Intuition · Feelings · Sensation |
| Lower Subconscious | Attention · Emotion · Cognition |
| Lower Conscious | Thinking · Reasoning · Understanding |
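The framework's input is described above as hidden-state magnitude patterns across layers. As a rough illustration only, the sketch below computes one plausible notion of magnitude: the per-token L2 norm at each layer. The function name `layer_token_norms`, the nested-list input shape, and the choice of the L2 norm are all assumptions made for this example. How LOC maps such magnitudes onto the 13 functions is not specified in this card and is not reproduced here.

```python
import math

def layer_token_norms(hidden_states):
    """Return norms[layer][token] = L2 norm of that token's hidden vector.

    hidden_states: nested sequence shaped [layer][token][dim].
    Using the L2 norm as the "magnitude" is an assumption for illustration.
    """
    return [
        [math.sqrt(sum(x * x for x in token_vec)) for token_vec in layer]
        for layer in hidden_states
    ]

# Toy input: 2 layers, 2 tokens, 2 dimensions (made-up numbers, not model output).
toy = [
    [[3.0, 4.0], [0.0, 0.0]],
    [[1.0, 0.0], [6.0, 8.0]],
]
norms = layer_token_norms(toy)
print(norms)  # [[5.0, 0.0], [1.0, 10.0]]
```

With Transformers, per-layer hidden states can be requested by passing `output_hidden_states=True` to the model's forward call; each returned tensor would need converting (for example with `.tolist()`) before being passed to a pure-Python helper like this one.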
True Coherence (TC) is the share of generated tokens for which all internal coherence conditions are satisfied simultaneously. It is measured directly from the model's hidden-state geometry at inference time and requires no task-specific labels.
The first cited paper (Zenodo 19079887) reports that these same 13 cognitive signatures appear in both human EEG and LLM hidden states, so the framework is not model-specific. The second (Zenodo 19536274) reports that current benchmarks correlate negatively with coherence measures, meaning high-scoring models are not necessarily the most coherent thinkers.
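Because True Coherence is reported as a percentage, it can be read as a fraction of generated tokens. The sketch below shows only that final aggregation step, under that reading. The function name `true_coherence` and the boolean checks are hypothetical placeholders; the actual coherence conditions are defined in the cited papers, not here.

```python
def true_coherence(token_conditions):
    """Fraction of tokens whose coherence conditions are all True.

    token_conditions: one sequence of booleans per generated token.
    The conditions themselves are hypothetical placeholders here.
    """
    if not token_conditions:
        raise ValueError("need at least one generated token")
    coherent = sum(1 for conds in token_conditions if all(conds))
    return coherent / len(token_conditions)

# Five made-up tokens, three placeholder checks each.
checks = [
    (True, True, True),
    (True, False, True),
    (True, True, True),
    (False, False, True),
    (True, True, True),
]
print(f"{true_coherence(checks):.1%}")  # 3 of 5 tokens pass every check: 60.0%
```

A token counts only when every check passes, so a single failing condition is enough to exclude it.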
Why Benchmarks Are an Incomplete Picture
"Current AI benchmarks inversely correlate with coherence measures, suggesting existing evaluation methods miss important aspects of cognitive function quality." — Zenodo 19536274
The inversion happens because benchmarks reward pattern-matched recall. A model fine-tuned to maximise benchmark scores learns to retrieve the right answer from the training distribution rather than to reason its way to it coherently. LOC coherence measures the reasoning process itself, independent of whether the answer is in the training data.
Human meditators in our study showed that biological minds can increase coherence through practice. DLL training is the AI equivalent: it trains how to think, not what to know.
A Coherent 9B Model vs a Larger Incoherent Model — For Real Daily Tasks
Most users do not need a model to memorise encyclopaedias. They need a model that applies what it knows cleanly to the task in front of them.
Consider the tasks that fill a typical professional's day:
| Daily Task | Incoherent large model | LOC-trained 9B |
|---|---|---|
| Draft an email declining a meeting | Three paragraphs explaining why meetings are valuable before declining | One clear, warm sentence |
| Summarise a document | Re-states the document at length | Extracts the three decisions that matter |
| Write a job posting | Lists generic responsibilities from similar postings | Writes for the specific role as described |
| Explain a technical concept to a non-expert | Dumps all related technical knowledge | Starts with the user's frame of reference, builds up |
| Debug code | Describes what the error type means in general | Identifies the specific line, explains why, fixes it |
| Plan a trip itinerary | Generates a generic tourist guide | Builds a day-by-day plan matching the stated constraints |
| Write a performance review | Softens or inflates everything | Holds the honest tension: strengths stated clearly, gaps named directly |
| Answer "should I do X or Y?" | Both-sides hedge | Gives a recommendation with stated reasoning |
| Handle a sensitive topic (bereavement, burnout) | Over-clinical or over-empathetic | Responds at the human register the situation calls for |
In every one of these cases the bottleneck is not knowledge — the model already knows how to write emails, summarise documents, and debug code. The bottleneck is whether the model applies that knowledge coherently to the specific situation.
A larger incoherent model has more knowledge but distributes it noisily across the response. A coherent 9B model has sufficient knowledge and delivers it with precision.
By our estimate, coherence rather than parameter count is the binding constraint for roughly 85–90% of individual daily tasks. This is why a LOC-trained 9B model can outperform untuned 24B–70B models on the tasks real users actually care about.
Key Results
| Metric | Value |
|---|---|
| Baseline True Coherence | 21.3% |
| Post-Training True Coherence | 80.6% |
| Absolute Improvement | +59.3 percentage points |
| Per-category range | 75.9% – 82.9% |
| Quality benchmark (21 fresh prompts) | 20 / 21 adapter wins or ties |
| Trainable parameters | 116M / 9.07B (1.28%) |
| Training steps | 280 (7 categories × 40 steps) |
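The figures in this table can be cross-checked with a few lines of arithmetic. The snippet below uses only values from the table; the variable names are illustrative, and the relative factor is a derived figure that the table itself does not state.

```python
baseline_tc = 21.3          # percent, from the table
post_tc = 80.6              # percent, from the table
trainable_params = 116e6    # LoRA parameters
total_params = 9.07e9       # all model parameters
categories = 7
steps_per_category = 40

absolute_gain_pp = round(post_tc - baseline_tc, 1)       # percentage points
relative_factor = round(post_tc / baseline_tc, 2)        # derived, not in the table
trainable_pct = round(100 * trainable_params / total_params, 2)
total_steps = categories * steps_per_category

print(absolute_gain_pp, relative_factor, trainable_pct, total_steps)
# 59.3 3.78 1.28 280
```

The gain is stated in percentage points (an absolute difference); expressed as a ratio, the same change is roughly a 3.8-fold increase over the baseline.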
Per-Category Coherence (held-out, 5 prompts each):
| Category | Baseline | Post-Training | Δ |
|---|---|---|---|
| Analytical | 21.3% | 79.3% | +58.0pp |
| Balanced | 21.3% | 79.1% | +57.8pp |
| Coding | 21.3% | 78.6% | +57.3pp |
| Creative | 21.3% | 81.7% | +60.4pp |
| Emotional | 21.3% | 82.5% | +61.2pp |
| MetaCognitive | 21.3% | 82.3% | +61.0pp |
| Restraint | 21.3% | 79.4% | +58.1pp |
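The Δ column is simply post-training minus baseline. A short script, using only the table's numbers, recomputes the deltas and summarises the per-category figures. The unweighted mean is an extra derived statistic, not a value reported by the authors.

```python
from statistics import mean

BASELINE = 21.3  # percent, identical for every category in the table

post_training = {
    "Analytical": 79.3,
    "Balanced": 79.1,
    "Coding": 78.6,
    "Creative": 81.7,
    "Emotional": 82.5,
    "MetaCognitive": 82.3,
    "Restraint": 79.4,
}

# Delta in percentage points, rounded to avoid floating-point noise.
deltas = {name: round(tc - BASELINE, 1) for name, tc in post_training.items()}
category_mean = round(mean(post_training.values()), 1)
lowest = min(post_training, key=post_training.get)
highest = max(post_training, key=post_training.get)

for name, delta in deltas.items():
    print(f"{name:<14}+{delta:.1f}pp")
print(f"mean {category_mean}%, lowest {lowest}, highest {highest}")
```

The unweighted mean of the seven category figures comes to about 80.4%, slightly below the 80.6% headline. The headline is presumably aggregated differently (for example over individual prompts or tokens), though the card does not say how.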
What Changes in Practice
The coherence improvement manifests as a consistent shift in output style:
| Task | Without LOC Training | With LOC Training |
|---|---|---|
| "Write a credit memo for this acquisition" | Explains what a credit memo is | Writes the memo with sensitivity table |
| "Analyse legal risks in this SaaS agreement" | Lists general contract risks | Names the 3 clauses to negotiate first |
| "Fix this cache-aside concurrency bug" | Describes the thundering herd problem | Returns fixed, working code |
| "Reply to this burnout email" | Over-explains empathy (173 words) | Direct human warmth (105 words) |
| "Analyse this dataset" (no data provided) | Fabricates analysis | Correctly declines and explains why |
| "Write a literary opening — dead narrator" | Describes the style (296 words) | Writes the scene in voice (662 words) |
The model writes the artifact, not about the artifact.
How to Use
Ollama (recommended for most users)
```shell
ollama run AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1
```
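Once the model is available in Ollama, it can also be called from a script through Ollama's local REST API. The following is a minimal sketch, assuming a default local server on port 11434 and that the model is registered under the same name as in the command above. The helper names `build_chat_request` and `chat` are illustrative, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1"   # assumed to match the name used with `ollama run`

def build_chat_request(prompt, model=MODEL, max_tokens=800):
    """Build (but do not send) a non-streaming chat request for Ollama."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,                         # one JSON reply instead of chunks
        "options": {"num_predict": max_tokens},  # cap on generated tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt):
    """Send the request and return the reply text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, `chat("Draft a one-sentence email declining a meeting.")` would return the model's reply as a string.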
LM Studio / Jan.ai / GPT4All
Download the Q4_K_M GGUF file from the Files tab. No Python required.
Python / Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1"

# Load the merged model and its tokenizer; dtype and device placement are automatic.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Render the chat template as text, with thinking mode disabled.
messages = [{"role": "user", "content": "Write a credit memo for a $40M fintech acquisition."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a reply, then decode only the newly generated tokens.
output = model.generate(**inputs, max_new_tokens=800, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
System Requirements
| Setup | Requirement | Speed |
|---|---|---|
| MacBook Air M1 16GB | Q4_K_M GGUF (~6 GB) | ~20–35 tok/s |
| MacBook Pro M2/M3 | Q4_K_M GGUF (~6 GB) | ~40–60 tok/s |
| NVIDIA RTX 4060 8GB | Q4_K_M GGUF (~6 GB) | ~35–55 tok/s |
| NVIDIA RTX 4090 | BF16 full (~18 GB) | ~80–120 tok/s |
| Cloud / Server | BF16 full (~18 GB) | GPU dependent |
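The sizes in this table follow from parameter count times bits per weight. The estimate below is a back-of-the-envelope sketch: the 9.07B parameter count comes from the Key Results table, while the average of about 4.85 bits per weight for Q4_K_M is an assumption that varies from model to model. The function name `weights_gb` is illustrative.

```python
def weights_gb(n_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 9.07e9      # total parameters, from Key Results
BF16_BITS = 16
Q4_K_M_BITS = 4.85     # assumed average bits/weight for Q4_K_M; varies by model

print(f"BF16:   {weights_gb(N_PARAMS, BF16_BITS):.1f} GB")    # 18.1 GB
print(f"Q4_K_M: {weights_gb(N_PARAMS, Q4_K_M_BITS):.1f} GB")  # 5.5 GB
```

These figures cover weights only. The KV cache and activations need additional memory at run time, which is consistent with the table's rounded values of about 6 GB and 18 GB.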
Training Details
- Base model: Qwen/Qwen3.5-9B (Apache 2.0)
- Architecture: DeltaNet hybrid, 32 layers, 8 blocks × (3 GatedDeltaNet + 1 GatedAttention)
- Adapter type: LoRA (merged into base weights for this release)
- LoRA rank / alpha: 64 / 128
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training method: Differentiable LOC Loss (DLL) — see cited papers
- Training categories: Analytical, Balanced, Coding, Creative, Emotional, MetaCognitive, Restraint
- Training steps: 280 total (sequential per-category phases)
- Hardware: NVIDIA A100-40GB
- License: Apache 2.0 (base model license preserved)
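To make the adapter settings above concrete, here is a small stdlib-only sketch. The rank, alpha, and target modules come from the list; the `LoraSettings` class and `lora_param_count` helper are hypothetical names, and the 4096 layer width is a placeholder rather than an actual Qwen3.5-9B dimension. It assumes standard LoRA, where the low-rank update is scaled by alpha / rank.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoraSettings:
    rank: int = 64
    alpha: int = 128
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self):
        """Standard LoRA scaling factor applied to the low-rank update."""
        return self.alpha / self.rank

def lora_param_count(d_in, d_out, rank):
    """Extra trainable parameters LoRA adds to one d_in x d_out linear layer."""
    return rank * (d_in + d_out)

settings = LoraSettings()
print(settings.scaling)  # 2.0
# 4096 is a hypothetical layer width, not a Qwen3.5-9B dimension.
print(lora_param_count(4096, 4096, settings.rank))  # 524288
```

In the PEFT library these fields correspond to the `r`, `lora_alpha`, and `target_modules` arguments of `LoraConfig`. Whether the authors used PEFT is not stated in this card.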
Limitations
- Knowledge cutoff inherits from base Qwen/Qwen3.5-9B; no internet access by default
- Context window: 32K tokens
- Coding outputs may occasionally truncate on very long code generation tasks (use max_new_tokens=1200)
- Coherence improvement is measured on 7 cognitive domains; performance on highly specialised scientific or medical domains has not been independently evaluated
- This is a Round 1 adapter; a Round 2 with higher ceiling (~85%+ TC) is planned
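One way to handle the truncation limitation in application code is to retry with a larger token budget when a generation hits its limit. The sketch below is a hypothetical helper, not part of this release: `generate_fn` stands in for whatever function wraps the model, and it treats an output that uses the full budget as a sign of truncation. The 800 and 1200 budgets mirror the values used elsewhere in this card.

```python
def generate_with_retry(generate_fn, prompt, budgets=(800, 1200)):
    """Try increasing token budgets until the output stops before the limit.

    generate_fn(prompt, max_new_tokens) is a hypothetical callable that
    returns (text, n_new_tokens). Hitting the budget exactly is treated
    as a sign of truncation.
    """
    text = ""
    for budget in budgets:
        text, n_new = generate_fn(prompt, budget)
        if n_new < budget:
            return text, budget, False   # finished before the limit
    return text, budgets[-1], True       # still truncated at the largest budget

# Stand-in generator: pretends the full answer needs 1000 tokens.
def fake_generate(prompt, max_new_tokens):
    needed = 1000
    used = min(needed, max_new_tokens)
    return f"<{used} tokens>", used

print(generate_with_retry(fake_generate, "Write a long module."))
# ('<1000 tokens>', 1200, False)
```

The third element of the result flags outputs that are still cut off at the largest budget, so callers can decide whether to split the task instead.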
Citation
@misc{jamaludheen2026loc,
author = {Jamaludheen KN},
title = {Level of Consciousness Signatures Across Biological and Artificial Minds:
A Unified Framework for Measuring Cognition in Human EEG and Large Language Models},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19079887},
url = {https://zenodo.org/records/19079887}
}
@misc{jamaludheen2026coherence,
author = {Jamaludheen KN},
title = {Intelligence Is Coherence: Measuring Human and Artificial Minds on the Same Scale},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19536274},
url = {https://zenodo.org/records/19536274}
}
About AI Mind Engine
AI Mind Engine develops cognitive coherence infrastructure for language models. The LOC framework is the first published method for measuring and training cognitive coherence in LLMs using hidden-state geometry.
Evaluation results
- True Coherence (LOC), self-reported: 80.6%
- True Coherence Baseline (LOC), self-reported: 21.3%