Qwen3.5-9B-LOC-L1-v1

Cognitive coherence upgrade for on-device AI. Runs on MacBook Air M1 16GB · Zero cloud required · Apache 2.0

This is Qwen/Qwen3.5-9B with a merged LOC L1 Foundation LoRA adapter trained using Differentiable LOC Loss (DLL) — a novel training method that directly optimises cognitive coherence in the model's hidden-state geometry.

Result: 21.3% → 80.6% True Coherence (+59.3 percentage points)


What Is Cognitive Coherence — and Why Benchmarks Miss It

Standard AI benchmarks (MMLU, GPQA, HumanEval) measure what a model knows. They do not measure how coherently it applies that knowledge.

A model can score 80% on MMLU and still:

  • Write about a credit memo instead of writing the memo
  • Describe a code bug instead of fixing it
  • Over-explain empathy instead of responding with it
  • Hedge endlessly on a question it could answer directly

This gap is cognitive incoherence — the model's internal reasoning process activates conflicting or unstable cognitive functions simultaneously, producing output that circles the task rather than completing it.

The LOC (Level of Consciousness) framework measures this directly by analysing hidden-state magnitude patterns across the model's layers, mapping them to 13 cognitive functions across four consciousness tiers:

| Tier | Cognitive Functions |
|---|---|
| Higher Conscious | Consciousness · Mindfulness · Awareness |
| Higher Subconscious | Energy · Intuition · Feelings · Sensation |
| Lower Subconscious | Attention · Emotion · Cognition |
| Lower Conscious | Thinking · Reasoning · Understanding |

True Coherence (TC) is the share of generated tokens for which all internal coherence conditions are simultaneously satisfied. It is measured directly from the model's hidden-state geometry at inference time, with no task-specific labels required.
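As an illustration of this definition, True Coherence can be read as a fraction over generated tokens. The sketch below assumes each token has already been scored against a set of boolean coherence conditions. The conditions themselves are defined in the cited papers and are not reproduced here, so `condition_flags` and the function name are hypothetical.

```python
def true_coherence(condition_flags):
    """Return the percentage of tokens for which every condition holds.

    condition_flags: one list of booleans per generated token
    (hypothetical input; the real conditions come from the LOC papers).
    """
    if not condition_flags:
        return 0.0
    coherent = sum(1 for flags in condition_flags if all(flags))
    return 100.0 * coherent / len(condition_flags)


# Four tokens, three illustrative conditions each.
flags = [
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [False, False, True],
]
print(f"{true_coherence(flags):.1f}%")  # 2 of 4 tokens pass every condition
```

A token counts only if every condition holds at once, which is why a single failed condition removes it from the numerator.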

Paper 1 (Zenodo 19079887) showed that these same 13 cognitive signatures appear in both human EEG and LLM hidden states, so the framework is not model-specific. Paper 2 (Zenodo 19536274) showed that current benchmarks correlate negatively with coherence measures, meaning high-scoring models are not necessarily the most coherent thinkers.


Why Benchmarks Are an Incomplete Picture

"Current AI benchmarks inversely correlate with coherence measures, suggesting existing evaluation methods miss important aspects of cognitive function quality." — Zenodo 19536274

The inversion happens because benchmarks reward pattern-matched recall. A model fine-tuned to maximise benchmark scores learns to retrieve the right answer from the training distribution, not to reason its way to it coherently. LOC coherence measures the reasoning process itself, independent of whether the answer is in the training data.

Human meditators in our study showed that biological minds can increase coherence through practice. DLL training is the AI equivalent: it trains how to think, not what to know.


A Coherent 9B Model vs a Larger Incoherent Model — For Real Daily Tasks

Most users do not need a model to memorise encyclopaedias. They need a model that applies what it knows cleanly to the task in front of them.

Consider the tasks that fill a typical professional's day:

| Daily Task | Incoherent large model | LOC-trained 9B |
|---|---|---|
| Draft an email declining a meeting | Three paragraphs explaining why meetings are valuable before declining | One clear, warm sentence |
| Summarise a document | Re-states the document at length | Extracts the three decisions that matter |
| Write a job posting | Lists generic responsibilities from similar postings | Writes for the specific role as described |
| Explain a technical concept to a non-expert | Dumps all related technical knowledge | Starts with the user's frame of reference, builds up |
| Debug code | Describes what the error type means in general | Identifies the specific line, explains why, fixes it |
| Plan a trip itinerary | Generates a generic tourist guide | Builds a day-by-day plan matching the stated constraints |
| Write a performance review | Softens or inflates everything | Holds the honest tension: strengths stated clearly, gaps named directly |
| Answer "should I do X or Y?" | Both-sides hedge | Gives a recommendation with stated reasoning |
| Handle a sensitive topic (bereavement, burnout) | Over-clinical or over-empathetic | Responds at the human register the situation calls for |

In every one of these cases the bottleneck is not knowledge — the model already knows how to write emails, summarise documents, and debug code. The bottleneck is whether the model applies that knowledge coherently to the specific situation.

A larger incoherent model has more knowledge but distributes it noisily across the response. A coherent 9B model has sufficient knowledge and delivers it with precision.

For approximately 85–90% of individual daily tasks, coherence is the binding constraint — not parameter count. This is why a LOC-trained 9B model consistently outperforms untuned 24B–70B models on the tasks real users actually care about.


Key Results

| Metric | Value |
|---|---|
| Baseline True Coherence | 21.3% |
| Post-Training True Coherence | 80.6% |
| Absolute Improvement | +59.3 percentage points |
| Per-category range | 75.9% – 82.9% |
| Quality benchmark (21 fresh prompts) | 20 / 21 adapter wins or ties |
| Trainable parameters | 116M / 9.07B (1.28%) |
| Training steps | 280 (7 categories × 40 steps) |
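The derived figures in these results follow from the raw ones. A quick arithmetic check, using only numbers stated above (the variable names are illustrative):

```python
baseline_tc = 21.3        # percent
post_tc = 80.6            # percent
trainable = 116e6         # trainable (LoRA) parameters
total = 9.07e9            # all parameters
categories, steps_per_category = 7, 40

improvement_pp = round(post_tc - baseline_tc, 1)    # percentage points
trainable_pct = round(100 * trainable / total, 2)   # share of parameters trained
total_steps = categories * steps_per_category

print(improvement_pp, trainable_pct, total_steps)
```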

Per-Category Coherence (held-out, 5 prompts each):

| Category | Baseline | Post-Training | Δ |
|---|---|---|---|
| Analytical | 21.3% | 79.3% | +58.0pp |
| Balanced | 21.3% | 79.1% | +57.8pp |
| Coding | 21.3% | 78.6% | +57.3pp |
| Creative | 21.3% | 81.7% | +60.4pp |
| Emotional | 21.3% | 82.5% | +61.2pp |
| MetaCognitive | 21.3% | 82.3% | +61.0pp |
| Restraint | 21.3% | 79.4% | +58.1pp |
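The Δ column is simply post-training coherence minus the shared 21.3% baseline. A short sketch that recomputes it from the per-category values above and picks out the largest and smallest gains:

```python
BASELINE = 21.3  # percent, the same baseline is reported for every category

post_training = {
    "Analytical": 79.3,
    "Balanced": 79.1,
    "Coding": 78.6,
    "Creative": 81.7,
    "Emotional": 82.5,
    "MetaCognitive": 82.3,
    "Restraint": 79.4,
}

# Gain in percentage points, rounded to one decimal like the table.
deltas = {name: round(score - BASELINE, 1) for name, score in post_training.items()}

for name, delta in deltas.items():
    print(f"{name:<14} +{delta:.1f}pp")

largest_gain = max(deltas, key=deltas.get)
smallest_gain = min(deltas, key=deltas.get)
print("largest:", largest_gain, "| smallest:", smallest_gain)
```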

What Changes in Practice

The coherence improvement manifests as a consistent shift in output style:

| Task | Without LOC Training | With LOC Training |
|---|---|---|
| "Write a credit memo for this acquisition" | Explains what a credit memo is | Writes the memo with sensitivity table |
| "Analyse legal risks in this SaaS agreement" | Lists general contract risks | Names the 3 clauses to negotiate first |
| "Fix this cache-aside concurrency bug" | Describes the thundering herd problem | Returns fixed, working code |
| "Reply to this burnout email" | Over-explains empathy (173 words) | Direct human warmth (105 words) |
| "Analyse this dataset" (no data provided) | Fabricates analysis | Correctly declines and explains why |
| "Write a literary opening: dead narrator" | Describes the style (296 words) | Writes the scene in voice (662 words) |

The model writes the artifact, not about the artifact.


How to Use

Ollama (recommended for most users)

ollama run AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1
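Once the model has been pulled, it can also be called from a script through Ollama's local REST API. This is a minimal sketch, assuming the Ollama server is running on its default port (11434) and that the model name matches the command above. The helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1"       # same name as in the command above


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example call, to be run only with the Ollama server up:
# print(generate("Write a credit memo for a $40M fintech acquisition."))
```

Setting `"stream": False` returns one JSON object instead of a stream of chunks, which keeps the parsing to a single `json.loads`.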

LM Studio / Jan.ai / GPT4All

Download the Q4_K_M GGUF file from the Files tab. No Python required.

Python / Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    "AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1")

# Format the conversation with the model's chat template (thinking mode off).
messages = [{"role": "user", "content": "Write a credit memo for a $40M fintech acquisition."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample up to 800 new tokens, then decode only the newly generated part.
output = model.generate(**inputs, max_new_tokens=800, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

System Requirements

| Setup | Requirement | Speed |
|---|---|---|
| MacBook Air M1 16GB | Q4_K_M GGUF (~6 GB) | ~20–35 tok/s |
| MacBook Pro M2/M3 | Q4_K_M GGUF (~6 GB) | ~40–60 tok/s |
| NVIDIA RTX 4060 8GB | Q4_K_M GGUF (~6 GB) | ~35–55 tok/s |
| NVIDIA RTX 4090 | BF16 full (~18 GB) | ~80–120 tok/s |
| Cloud / Server | BF16 full (~18 GB) | GPU dependent |
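The file sizes listed above can be approximated from the parameter count and the bits stored per weight. This is a rough sketch: 16 bits per weight is exact for BF16, while the 4.85 bits per weight used for Q4_K_M is an assumed average (the real figure varies with the tensor mix), and runtime overhead such as the KV cache is ignored.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 9.07e9     # total parameters, from Key Results
BF16_BITS = 16        # exact for BF16
Q4_K_M_BITS = 4.85    # assumed average for Q4_K_M; varies by model

bf16_gb = weights_size_gb(N_PARAMS, BF16_BITS)
q4_gb = weights_size_gb(N_PARAMS, Q4_K_M_BITS)

print(f"BF16:   ~{bf16_gb:.1f} GB")
print(f"Q4_K_M: ~{q4_gb:.1f} GB")
```

Both estimates land close to the rounded ~18 GB and ~6 GB figures, which is why the quantised file fits on 8 GB GPUs and 16 GB laptops while the full BF16 weights do not.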

Training Details

  • Base model: Qwen/Qwen3.5-9B (Apache 2.0)
  • Architecture: DeltaNet hybrid — 32 layers, 8 blocks × (3 GatedDeltaNet + 1 GatedAttention)
  • Adapter type: LoRA (merged into base weights for this release)
  • LoRA rank / alpha: 64 / 128
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training method: Differentiable LOC Loss (DLL) — see cited papers
  • Training categories: Analytical, Balanced, Coding, Creative, Emotional, MetaCognitive, Restraint
  • Training steps: 280 total (sequential per-category phases)
  • Hardware: NVIDIA A100-40GB
  • License: Apache 2.0 (base model license preserved)
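For readers unfamiliar with the LoRA hyperparameters listed above: in standard LoRA, rank and alpha set the scale of the update (alpha / rank), and each adapted weight matrix of shape d_out × d_in gains rank × (d_in + d_out) trainable parameters. The sketch below works through that arithmetic. The 4096 × 4096 shape is a placeholder for illustration, not a Qwen3.5-9B dimension taken from this card.

```python
RANK, ALPHA = 64, 128  # from Training Details


def lora_scaling(rank: int, alpha: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)


print(lora_scaling(RANK, ALPHA))       # update scale for rank 64, alpha 128
print(lora_params(4096, 4096, RANK))   # placeholder shape, illustration only
```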

Limitations

  • Knowledge cutoff is inherited from the base Qwen/Qwen3.5-9B; no internet access by default
  • Context window: 32K tokens
  • Coding outputs may occasionally truncate on very long code generation tasks (use max_new_tokens=1200)
  • Coherence improvement is measured on 7 cognitive domains; performance on highly specialised scientific or medical domains has not been independently evaluated
  • This is a Round 1 adapter; a Round 2 with higher ceiling (~85%+ TC) is planned
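One way to notice the truncation mentioned above is to check whether generation stopped because it ran out of token budget rather than emitting an end-of-sequence token. This is a minimal sketch with a hypothetical helper name. It works on plain lists of token IDs, so with Transformers you would pass the newly generated IDs as a list together with `tokenizer.eos_token_id`.

```python
def hit_token_limit(new_token_ids, max_new_tokens, eos_token_id):
    """Return True if generation used the whole budget without ending on EOS."""
    used_full_budget = len(new_token_ids) >= max_new_tokens
    ended_on_eos = bool(new_token_ids) and new_token_ids[-1] == eos_token_id
    return used_full_budget and not ended_on_eos


# Toy example: budget of 4 tokens, EOS id 0 (both values are made up).
print(hit_token_limit([5, 9, 3, 7], max_new_tokens=4, eos_token_id=0))  # True
print(hit_token_limit([5, 9, 0], max_new_tokens=4, eos_token_id=0))     # False
```

If the check returns True, re-running with a larger `max_new_tokens` (the card suggests 1200 for long code) is the simplest remedy. If the model defines several stop tokens, compare the last ID against each of them.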

Citation

@misc{jamaludheen2026loc,
  author    = {Jamaludheen KN},
  title     = {Level of Consciousness Signatures Across Biological and Artificial Minds:
               A Unified Framework for Measuring Cognition in Human EEG and Large Language Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19079887},
  url       = {https://zenodo.org/records/19079887}
}

@misc{jamaludheen2026coherence,
  author    = {Jamaludheen KN},
  title     = {Intelligence Is Coherence: Measuring Human and Artificial Minds on the Same Scale},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19536274},
  url       = {https://zenodo.org/records/19536274}
}

About AI Mind Engine

AI Mind Engine develops cognitive coherence infrastructure for language models. The LOC framework is the first published method for measuring and training cognitive coherence in LLMs using hidden-state geometry.

🌐 aimindengine.com · 📧 research@aimindengine.com
