Qwen3.5-9B-LOC-L1-v1

Cognitive coherence upgrade for on-device AI. Runs on MacBook Air M1 16GB · Zero cloud required · Apache 2.0

This is Qwen/Qwen3.5-9B with a merged LOC L1 Foundation LoRA adapter trained using Differentiable LOC Loss (DLL) — a novel training method that directly optimises cognitive coherence in the model's hidden-state geometry.

Result: 21.3% → 80.6% True Coherence (+59.3 percentage points)

What Is Cognitive Coherence — and Why Benchmarks Miss It

Standard AI benchmarks (MMLU, GPQA, HumanEval) measure what a model knows. They do not measure how coherently it applies that knowledge.

A model can score 80% on MMLU and still:

Write about a credit memo instead of writing the memo
Describe a code bug instead of fixing it
Over-explain empathy instead of responding with it
Hedge endlessly on a question it could answer directly

This gap is cognitive incoherence — the model's internal reasoning process activates conflicting or unstable cognitive functions simultaneously, producing output that circles the task rather than completing it.

The LOC (Level of Consciousness) framework measures this directly by analysing hidden-state magnitude patterns across the model's layers, mapping them to 13 cognitive functions across four consciousness tiers:

Tier	Cognitive Functions
Higher Conscious	Consciousness · Mindfulness · Awareness
Higher Subconscious	Energy · Intuition · Feelings · Sensation
Lower Subconscious	Attention · Emotion · Cognition
Lower Conscious	Thinking · Reasoning · Understanding
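
To make the idea of a per-layer magnitude pattern concrete, the sketch below stores the four tiers from the table as a lookup and computes one mean L2 norm per layer from a toy stack of hidden states. The L2 norm, the function name `layer_magnitude_profile`, and the toy numbers are assumptions for illustration. The source does not spell out how LOC computes magnitudes or how it maps them onto the 13 functions.

```python
import math

# Tiers and cognitive functions exactly as listed in the table above.
LOC_TIERS = {
    "Higher Conscious": ["Consciousness", "Mindfulness", "Awareness"],
    "Higher Subconscious": ["Energy", "Intuition", "Feelings", "Sensation"],
    "Lower Subconscious": ["Attention", "Emotion", "Cognition"],
    "Lower Conscious": ["Thinking", "Reasoning", "Understanding"],
}


def layer_magnitude_profile(hidden_states):
    """Return the mean L2 norm of the token vectors in each layer.

    hidden_states: list of layers, each a list of token vectors (lists of floats).
    Using the L2 norm is an assumption; LOC's actual magnitude measure may differ.
    """
    profile = []
    for layer in hidden_states:
        norms = [math.sqrt(sum(x * x for x in token)) for token in layer]
        profile.append(sum(norms) / len(norms))
    return profile


# Toy input: 2 layers, 2 tokens per layer, 2 dimensions per token.
toy = [
    [[3.0, 4.0], [0.0, 5.0]],   # both tokens have norm 5.0
    [[6.0, 8.0], [0.0, 10.0]],  # both tokens have norm 10.0
]
print(layer_magnitude_profile(toy))             # [5.0, 10.0]
print(sum(len(f) for f in LOC_TIERS.values()))  # 13
```

With Transformers, real per-layer hidden states can be requested by passing `output_hidden_states=True` to the model's forward call. The toy lists here only stand in for those tensors.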

True Coherence (TC) is the share of generated tokens for which all internal coherence conditions are satisfied simultaneously. It is measured directly from the model's hidden-state geometry at inference time, with no task-specific labels required.
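
Read as a percentage, TC can be sketched as the fraction of generated tokens whose coherence checks all pass at once. The condition names below (`cond_a`, `cond_b`) and the function name `true_coherence` are hypothetical placeholders. The actual conditions are defined in the cited papers, not in this card.

```python
def true_coherence(token_checks):
    """Percentage of tokens for which every coherence condition holds.

    token_checks: one dict per generated token, mapping a condition name
    to True/False. The condition names are placeholders, not LOC's own.
    """
    if not token_checks:
        return 0.0
    coherent = sum(1 for checks in token_checks if all(checks.values()))
    return 100.0 * coherent / len(token_checks)


# Four generated tokens, two hypothetical conditions each.
checks = [
    {"cond_a": True, "cond_b": True},
    {"cond_a": True, "cond_b": False},
    {"cond_a": True, "cond_b": True},
    {"cond_a": False, "cond_b": False},
]
print(true_coherence(checks))  # 50.0
```

A token counts only when every condition is true for it, so a single failing condition removes that token from the numerator.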

Paper 1 (Zenodo 19079887) showed that these same 13 cognitive signatures appear in both human EEG and LLM hidden states, so the framework is not model-specific. Paper 2 (Zenodo 19536274) showed that current benchmarks correlate negatively with coherence measures, meaning high-scoring models are not necessarily the most coherent thinkers.

Why Benchmarks Are an Incomplete Picture

"Current AI benchmarks inversely correlate with coherence measures, suggesting existing evaluation methods miss important aspects of cognitive function quality." — Zenodo 19536274

The inversion happens because benchmarks reward pattern-matched recall. A model fine-tuned to maximise benchmark scores learns to retrieve the right answer from the training distribution, not to reason its way to it coherently. LOC coherence measures the reasoning process itself, independent of whether the answer is in the training data.

Human meditators in our study showed that biological minds can increase coherence through practice. DLL training is the AI equivalent: it trains how to think, not what to know.

A Coherent 9B Model vs a Larger Incoherent Model — For Real Daily Tasks

Most users do not need a model to memorise encyclopaedias. They need a model that applies what it knows cleanly to the task in front of them.

Consider the tasks that fill a typical professional's day:

Daily Task	Incoherent large model	LOC-trained 9B
Draft an email declining a meeting	Three paragraphs explaining why meetings are valuable before declining	One clear, warm sentence
Summarise a document	Re-states the document at length	Extracts the three decisions that matter
Write a job posting	Lists generic responsibilities from similar postings	Writes for the specific role as described
Explain a technical concept to a non-expert	Dumps all related technical knowledge	Starts with the user's frame of reference, builds up
Debug code	Describes what the error type means in general	Identifies the specific line, explains why, fixes it
Plan a trip itinerary	Generates a generic tourist guide	Builds a day-by-day plan matching the stated constraints
Write a performance review	Softens or inflates everything	Holds the honest tension: strengths stated clearly, gaps named directly
Answer "should I do X or Y?"	Both-sides hedge	Gives a recommendation with stated reasoning
Handle a sensitive topic (bereavement, burnout)	Over-clinical or over-empathetic	Responds at the human register the situation calls for

In every one of these cases the bottleneck is not knowledge — the model already knows how to write emails, summarise documents, and debug code. The bottleneck is whether the model applies that knowledge coherently to the specific situation.

A larger incoherent model has more knowledge but distributes it noisily across the response. A coherent 9B model has sufficient knowledge and delivers it with precision.

For approximately 85–90% of individual daily tasks, coherence is the binding constraint — not parameter count. This is why a LOC-trained 9B model consistently outperforms untuned 24B–70B models on the tasks real users actually care about.

Key Results

Metric	Value
Baseline True Coherence	21.3%
Post-Training True Coherence	80.6%
Absolute Improvement	+59.3 percentage points
Per-category range	75.9% – 82.9%
Quality benchmark (21 fresh prompts)	20 / 21 adapter wins or ties
Trainable parameters	116M / 9.07B (1.28%)
Training steps	280 (7 categories × 40 steps)
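
The derived figures in the table can be re-computed from the raw numbers it reports. The snippet below is only an arithmetic check; the variable names are illustrative and every input is copied from the table.

```python
# Raw values copied from the Key Results table.
baseline_tc = 21.3
post_tc = 80.6
trainable_params = 116e6
total_params = 9.07e9
categories = 7
steps_per_category = 40
wins_or_ties, prompts = 20, 21

improvement_pp = round(post_tc - baseline_tc, 1)                 # percentage points
trainable_pct = round(100 * trainable_params / total_params, 2)  # share of weights trained
total_steps = categories * steps_per_category
win_or_tie_pct = round(100 * wins_or_ties / prompts, 1)

print(improvement_pp, trainable_pct, total_steps, win_or_tie_pct)
# 59.3 1.28 280 95.2
```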

Per-Category Coherence (held-out, 5 prompts each):

Category	Baseline	Post-Training	Δ
Analytical	21.3%	79.3%	+58.0pp
Balanced	21.3%	79.1%	+57.8pp
Coding	21.3%	78.6%	+57.3pp
Creative	21.3%	81.7%	+60.4pp
Emotional	21.3%	82.5%	+61.2pp
MetaCognitive	21.3%	82.3%	+61.0pp
Restraint	21.3%	79.4%	+58.1pp
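
The Δ column follows from subtracting the shared baseline from each post-training score, which the short check below reproduces. It also computes a plain unweighted mean across the seven categories. That mean is an assumption about aggregation; the source does not state how the 80.6% headline figure is pooled (for example per token or per prompt), so a small difference is expected.

```python
BASELINE = 21.3  # shared baseline True Coherence, in percent

# Post-training scores copied from the per-category table.
post_training = {
    "Analytical": 79.3,
    "Balanced": 79.1,
    "Coding": 78.6,
    "Creative": 81.7,
    "Emotional": 82.5,
    "MetaCognitive": 82.3,
    "Restraint": 79.4,
}

# Improvement per category, in percentage points.
deltas = {name: round(score - BASELINE, 1) for name, score in post_training.items()}

# Simple unweighted mean over categories (not necessarily the headline metric).
unweighted_mean = round(sum(post_training.values()) / len(post_training), 1)

print(deltas["Analytical"], deltas["Emotional"])  # 58.0 61.2
print(unweighted_mean)                            # 80.4
```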

What Changes in Practice

The coherence improvement manifests as a consistent shift in output style:

Task	Without LOC Training	With LOC Training
"Write a credit memo for this acquisition"	Explains what a credit memo is	Writes the memo with sensitivity table
"Analyse legal risks in this SaaS agreement"	Lists general contract risks	Names the 3 clauses to negotiate first
"Fix this cache-aside concurrency bug"	Describes the thundering herd problem	Returns fixed, working code
"Reply to this burnout email"	Over-explains empathy (173 words)	Direct human warmth (105 words)
"Analyse this dataset" (no data provided)	Fabricates analysis	Correctly declines and explains why
"Write a literary opening — dead narrator"	Describes the style (296 words)	Writes the scene in voice (662 words)

The model writes the artifact, not about the artifact.

How to Use

Ollama (recommended for most users)

ollama run AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1
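
Once the model is available in Ollama, it can also be called programmatically through Ollama's local HTTP API. The sketch below assumes an Ollama server running on its default port (11434) and assumes the model name from the command above is accepted unchanged by the API. The helper names `build_chat_payload` and `chat` are illustrative.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1"        # name taken from the command above


def build_chat_payload(prompt, model=MODEL):
    """Build a non-streaming /api/chat request body for a single user message."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(prompt):
    """Send the prompt to a locally running Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Example (requires a running Ollama server with the model pulled):
# print(chat("Draft an email declining a meeting."))
```

Setting `"stream": False` returns one JSON object rather than a stream of chunks, which keeps the parsing to a single `json.loads` call.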

LM Studio / Jan.ai / GPT4All

Download the Q4_K_M GGUF file from the Files tab. No Python required.

Python / Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    "AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("AI-Mind-Engine/Qwen3.5-9B-LOC-L1-v1")

# Format a single-turn chat prompt with thinking mode disabled.
messages = [{"role": "user", "content": "Write a credit memo for a $40M fintech acquisition."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample up to 800 new tokens, then decode only the generated continuation.
output = model.generate(**inputs, max_new_tokens=800, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

System Requirements

Setup	Requirement	Speed
MacBook Air M1 16GB	Q4_K_M GGUF (~6 GB)	~20–35 tok/s
MacBook Pro M2/M3	Q4_K_M GGUF (~6 GB)	~40–60 tok/s
NVIDIA RTX 4060 8GB	Q4_K_M GGUF (~6 GB)	~35–55 tok/s
NVIDIA RTX 4090	BF16 full (~18 GB)	~80–120 tok/s
Cloud / Server	BF16 full (~18 GB)	GPU dependent
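
The file sizes in the table can be roughly sanity-checked from the parameter count. The sketch below multiplies parameters by bits per weight. The 4.85 bits-per-weight average for Q4_K_M is an approximation assumed here, not a figure from this card, and the estimate ignores the KV cache and runtime overhead.

```python
def weight_size_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 9.07e9  # total parameters, from the Key Results table

print(round(weight_size_gb(N_PARAMS, 16), 1))    # BF16: 18.1
print(round(weight_size_gb(N_PARAMS, 4.85), 1))  # Q4_K_M (assumed ~4.85 bits/weight): 5.5
```

Both estimates land close to the ~18 GB and ~6 GB figures in the table. Actual memory use at inference time will be higher once context and buffers are included.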

Training Details

Base model: Qwen/Qwen3.5-9B (Apache 2.0)
Architecture: DeltaNet hybrid — 32 layers, 8 blocks × (3 GatedDeltaNet + 1 GatedAttention)
Adapter type: LoRA (merged into base weights for this release)
LoRA rank / alpha: 64 / 128
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training method: Differentiable LOC Loss (DLL) — see cited papers
Training categories: Analytical, Balanced, Coding, Creative, Emotional, MetaCognitive, Restraint
Training steps: 280 total (sequential per-category phases)
Hardware: NVIDIA A100-40GB
License: Apache 2.0 (base model license preserved)
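
For readers less familiar with LoRA, the rank and alpha listed above determine two quantities in the standard LoRA formulation: the scaling applied to the low-rank update (alpha divided by rank) and the number of added parameters per adapted linear layer. The sketch below illustrates both. The 4096 x 4096 layer shape is a hypothetical example, not the actual dimensions of this model.

```python
def lora_scaling(alpha, rank):
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Added parameters for one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


RANK, ALPHA = 64, 128  # values from the training details above

print(lora_scaling(ALPHA, RANK))           # 2.0
print(lora_param_count(4096, 4096, RANK))  # 524288 (hypothetical 4096 x 4096 layer)
```

Summing `lora_param_count` over every targeted projection in every layer gives the adapter's trainable parameter total; the card reports that total as 116M.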

Limitations

Knowledge cutoff is inherited from the base Qwen/Qwen3.5-9B; the model has no internet access by default
Context window: 32K tokens
Coding outputs may occasionally truncate on very long code generation tasks (raise the token limit, e.g. max_new_tokens=1200)
Coherence improvement is measured on 7 cognitive domains; performance on highly specialised scientific or medical domains has not been independently evaluated
This is a Round 1 adapter; a Round 2 adapter with a higher ceiling (target of 85%+ TC) is planned

Citation

@misc{jamaludheen2026loc,
  author    = {Jamaludheen KN},
  title     = {Level of Consciousness Signatures Across Biological and Artificial Minds:
               A Unified Framework for Measuring Cognition in Human EEG and Large Language Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19079887},
  url       = {https://zenodo.org/records/19079887}
}

@misc{jamaludheen2026coherence,
  author    = {Jamaludheen KN},
  title     = {Intelligence Is Coherence: Measuring Human and Artificial Minds on the Same Scale},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19536274},
  url       = {https://zenodo.org/records/19536274}
}

About AI Mind Engine

AI Mind Engine develops cognitive coherence infrastructure for language models. The LOC framework is the first published method for measuring and training cognitive coherence in LLMs using hidden-state geometry.

🌐 aimindengine.com · 📧 research@aimindengine.com
