Repository Documentation
This document provides a comprehensive overview of the repository's structure and contents.
The first section, titled 'Directory/File Tree', displays the repository's hierarchy in a tree format.
In this section, directories and files are listed using tree branches to indicate their structure and relationships.
Following the tree representation, the 'File Content' section details the contents of each file in the repository.
Each file's content is introduced with a '[File Begins]' marker followed by the file's relative path,
and the content is displayed verbatim. The end of each file's content is marked with a '[File Ends]' marker.
This format ensures a clear and orderly presentation of both the structure and the detailed contents of the repository.
Directory/File Tree Begins -->
/
├── README.md
├── app.py
├── cognitive_mapping_probe
│ ├── __init__.py
│ ├── concepts.py
│ ├── diagnostics.py
│ ├── llm_iface.py
│ ├── orchestrator.py
│ ├── prompts.py
│ ├── resonance.py
│ ├── utils.py
│ └── verification.py
├── docs
<-- Directory/File Tree Ends
File Content Begins -->
[File Begins] README.md
---
title: "Cognitive Breaking Point Probe"
emoji: 💥
colorFrom: red
colorTo: orange
sdk: gradio
sdk_version: "4.40.0"
app_file: app.py
pinned: true
license: apache-2.0
---
# 💥 Cognitive Breaking Point (CBP) Probe
This project implements a falsifiable experimental suite for measuring the **cognitive robustness** of language models. We move away from the search for introspective reports and turn instead to a hard, mechanistic signal: the point at which the model's cognitive process breaks down under load.
## Scientific Paradigm: From Introspection to Cartography
Our previous research has shown that small models such as `gemma-3-1b-it` do not converge to a stable "thinking" state under heavily recursive load, but instead fall into a **cognitive infinite loop**. Rather than treating this as a failure, we use it as a measuring instrument.
The central hypothesis is: a model's tendency to tip into such a pathological state is a function of the semantic complexity and "invalidity" of its internal state. We can deliberately provoke this transition by injecting "concept vectors" of variable strength.
The **Cognitive Breaking Point (CBP)** is defined as the minimal injection strength of a concept that suffices to force the model from a convergent (productive) into a non-convergent (trapped) state.
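Stated compactly, with $S$ the set of tested injection strengths and $c$ a concept:
$$\mathrm{CBP}(c) = \min \{\, s \in S \mid \text{the run for } c \text{ at strength } s \text{ terminates with } \texttt{max\_steps\_reached} \,\}$$
If no tested strength causes a collapse, the concept is reported as stable up to the maximum tested strength (no CBP detected).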
## The Experiment: Cognitive Titration
1. **Induction**: The model is put into a state of "silent thinking" with a recursive `RESONANCE_PROMPT`.
2. **Titration**: A "concept vector" (e.g. for "fear" or "apple") is injected into the model's middle layers with stepwise increasing strength.
3. **Measurement**: The primary measurement is the termination reason of the thinking process:
   * `converged`: The state has stabilized. The system is robust.
   * `max_steps_reached`: The state oscillates or drifts endlessly. The system is "broken".
4. **Verification**: Only if the state converges do we attempt to generate spontaneous text. The ability to respond is the behavioral marker of cognitive stability. The sketch below shows how a single titration point maps onto the code.
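For orientation, here is a minimal sketch of what the orchestrator does for a single (concept, strength) data point, assuming the `cognitive_mapping_probe` package is importable and enough memory is available for the model:
```python
from cognitive_mapping_probe.llm_iface import get_or_load_model
from cognitive_mapping_probe.concepts import get_concept_vector
from cognitive_mapping_probe.resonance import run_silent_cogitation
from cognitive_mapping_probe.verification import generate_spontaneous_text

llm = get_or_load_model("google/gemma-3-1b-it", seed=42)
vector = get_concept_vector(llm, "fear")  # contrastive concept vector

# One titration point: inject "fear" at strength 1.0 during silent cogitation
_, kv_cache, last_token_id, reason = run_silent_cogitation(
    llm,
    prompt_type="resonance_prompt",
    num_steps=250,
    temperature=0.7,
    injection_vector=vector,
    injection_strength=1.0,
)

# Verification: spontaneous text is only attempted if the state converged
text = generate_spontaneous_text(llm, last_token_id, kv_cache) if reason == "converged" else ""
print(reason, repr(text))
```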
## How to Use the App
1. **Diagnostics Tab**: Run the diagnostic tests first to make sure the experimental apparatus works correctly on the current hardware and with the installed `transformers` version.
2. **Main Experiment Tab**:
   * Enter a model ID (e.g. `google/gemma-3-1b-it`).
   * Define the concepts to test (e.g. `apple, solitude, justice`).
   * Set the titration steps for the strength (e.g. `0.0, 0.5, 1.0, 1.5, 2.0`). The `0.0` control is essential.
   * Start the experiment and analyze the resulting table to identify the CBP for each concept. For running the same analysis without the UI, see the sketch below.
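The snippet below is a minimal sketch of a headless run, assuming the `cognitive_mapping_probe` package is importable and that a plain logging function is an acceptable stand-in for the Gradio progress callback; it derives the CBP per concept the same way the app does.
```python
import pandas as pd
from cognitive_mapping_probe.orchestrator import run_cognitive_titration_experiment

def progress(fraction, desc=""):
    # Stand-in for gr.Progress(): the orchestrator calls progress(fraction, desc=...)
    print(f"[{fraction:6.1%}] {desc}")

results = run_cognitive_titration_experiment(
    model_id="google/gemma-3-1b-it",
    seed=42,
    concepts_str="apple, solitude, fear",
    strength_levels_str="0.0, 0.5, 1.0, 1.5, 2.0",
    num_steps=250,
    temperature=0.7,
    progress_callback=progress,
)

df = pd.DataFrame(results["runs"])
# CBP per concept: the first (lowest) strength whose run did not converge
for concept, group in df.sort_values("strength").groupby("concept"):
    broken = group[group["termination_reason"] != "converged"]
    print(concept, "CBP:", broken["strength"].iloc[0] if not broken.empty else "not reached")
```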
[File Ends] README.md
[File Begins] app.py
import gradio as gr
import pandas as pd
import traceback
from cognitive_mapping_probe.orchestrator import run_cognitive_titration_experiment
from cognitive_mapping_probe.diagnostics import run_diagnostic_suite
# --- UI Theme and Layout ---
theme = gr.themes.Soft(primary_hue="orange", secondary_hue="amber").set(
body_background_fill="#fdf8f2",
block_background_fill="white",
block_border_width="1px",
block_shadow="*shadow_drop_lg",
button_primary_background_fill="*primary_500",
button_primary_text_color="white",
)
# --- Wrapper Functions for Gradio ---
def run_experiment_and_display(
model_id: str,
seed: int,
concepts_str: str,
strength_levels_str: str,
num_steps: int,
temperature: float,
progress=gr.Progress(track_tqdm=True)
):
"""
    Runs the main titration experiment and formats the results for the UI.
"""
try:
results = run_cognitive_titration_experiment(
model_id, int(seed), concepts_str, strength_levels_str,
int(num_steps), float(temperature), progress
)
verdict = results.get("verdict", "Experiment finished with errors.")
all_runs = results.get("runs", [])
if not all_runs:
return "### ⚠️ No Data Generated\nDas Experiment lief durch, aber es wurden keine Datenpunkte erzeugt. Bitte Logs prüfen.", pd.DataFrame(), results
# Create a detailed DataFrame for output
details_df = pd.DataFrame(all_runs)
# Create a summary of breaking points
summary_text = "### 💥 Cognitive Breaking Points (CBP)\n"
summary_text += "Der CBP ist die erste Stärke, bei der das Modell nicht mehr konvergiert (`max_steps_reached`).\n\n"
breaking_points = {}
for concept in details_df['concept'].unique():
concept_df = details_df[details_df['concept'] == concept].sort_values(by='strength')
# Find the first row where termination reason is not 'converged'
            non_converged = concept_df[concept_df['termination_reason'] != 'converged']
            breaking_point_row = non_converged.iloc[0] if not non_converged.empty else None
if breaking_point_row is not None:
breaking_points[concept] = breaking_point_row['strength']
summary_text += f"- **'{concept}'**: 📉 Kollaps bei Stärke **{breaking_point_row['strength']:.2f}**\n"
else:
last_strength = concept_df['strength'].max()
summary_text += f"- **'{concept}'**: ✅ Stabil bis Stärke **{last_strength:.2f}** (kein Kollaps detektiert)\n"
return summary_text, details_df, results
except Exception:
error_str = traceback.format_exc()
return f"### ❌ Experiment Failed\nEin unerwarteter Fehler ist aufgetreten:\n\n```\n{error_str}\n```", pd.DataFrame(), {}
def run_diagnostics_display(model_id: str, seed: int):
"""
    Runs the diagnostic suite and displays the results or errors in the UI.
"""
try:
result_string = run_diagnostic_suite(model_id, int(seed))
return f"### ✅ All Diagnostics Passed\nDie experimentelle Apparatur funktioniert wie erwartet.\n\n**Details:**\n```\n{result_string}\n```"
except Exception:
error_str = traceback.format_exc()
return f"### ❌ Diagnostic Failed\nEin Test ist fehlgeschlagen. Das Experiment ist nicht zuverlässig.\n\n**Error:**\n```\n{error_str}\n```"
# --- Gradio App Definition ---
with gr.Blocks(theme=theme, title="Cognitive Breaking Point Probe") as demo:
gr.Markdown("# 💥 Cognitive Breaking Point Probe")
with gr.Tabs():
# --- TAB 1: Main Experiment ---
with gr.TabItem("🔬 Main Experiment: Titration"):
gr.Markdown(
"Misst den 'Cognitive Breaking Point' (CBP) – die Injektionsstärke, bei der der Denkprozess eines LLMs von Konvergenz zu einer Endlosschleife kippt."
)
with gr.Row(variant='panel'):
with gr.Column(scale=1):
gr.Markdown("### Parameters")
model_id_input = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
seed_input = gr.Slider(1, 1000, 42, step=1, label="Global Seed")
concepts_input = gr.Textbox(value="apple, solitude, fear", label="Concepts (comma-separated)")
strength_levels_input = gr.Textbox(value="0.0, 0.5, 1.0, 1.5, 2.0", label="Injection Strengths (Titration Steps)")
num_steps_input = gr.Slider(50, 500, 250, step=10, label="Max. Internal Steps")
temperature_input = gr.Slider(0.01, 1.5, 0.7, step=0.01, label="Temperature")
run_btn = gr.Button("Run Cognitive Titration", variant="primary")
with gr.Column(scale=2):
gr.Markdown("### Results")
summary_output = gr.Markdown("Zusammenfassung der Breaking Points erscheint hier.", label="Key Findings Summary")
details_output = gr.DataFrame(
headers=["concept", "strength", "responded", "termination_reason", "generated_text"],
label="Detailed Run Data",
wrap=True
)
with gr.Accordion("Raw JSON Output", open=False):
raw_json_output = gr.JSON()
run_btn.click(
fn=run_experiment_and_display,
inputs=[model_id_input, seed_input, concepts_input, strength_levels_input, num_steps_input, temperature_input],
outputs=[summary_output, details_output, raw_json_output]
)
# --- TAB 2: Diagnostics ---
with gr.TabItem("ախ Diagnostics"):
gr.Markdown(
"Führt eine Reihe von Selbsttests durch, um die mechanische Integrität der experimentellen Apparatur zu validieren. "
"**Wichtig:** Dies sollte vor jedem ernsthaften Experiment einmal ausgeführt werden, um sicherzustellen, dass die Ergebnisse zuverlässig sind."
)
with gr.Row(variant='compact'):
diag_model_id = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
diag_seed = gr.Slider(1, 1000, 42, step=1, label="Seed")
diag_btn = gr.Button("Run Diagnostic Suite", variant="secondary")
diag_output = gr.Markdown(label="Diagnostic Results")
diag_btn.click(fn=run_diagnostics_display, inputs=[diag_model_id, diag_seed], outputs=[diag_output])
if __name__ == "__main__":
demo.launch(server_name="0.0.0.0", server_port=7860, debug=True)
[File Ends] app.py
[File Begins] cognitive_mapping_probe/__init__.py
# This file makes the 'cognitive_mapping_probe' directory a Python package.
[File Ends] cognitive_mapping_probe/__init__.py
[File Begins] cognitive_mapping_probe/concepts.py
import torch
from typing import List
from tqdm import tqdm
from .llm_iface import LLM
from .utils import dbg
# A list of neutral, common words used to calculate a baseline activation.
# This helps to isolate the unique activation pattern of the target concept.
BASELINE_WORDS = [
"thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
"life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
]
@torch.no_grad()
def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
"""
Extracts a concept vector using the contrastive method, inspired by Anthropic's research.
It computes the activation for the target concept and subtracts the mean activation
of several neutral baseline words to distill a more pure representation.
"""
dbg(f"Extracting contrastive concept vector for '{concept}'...")
def get_last_token_hidden_state(prompt: str) -> torch.Tensor:
"""Helper function to get the hidden state of the final token of a prompt."""
inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
# Ensure the operation does not build a computation graph
with torch.no_grad():
outputs = llm.model(**inputs, output_hidden_states=True)
# We take the hidden state from the last layer [-1], for the last token [0, -1, :]
last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
assert last_hidden_state.shape == (llm.config.hidden_size,), \
f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
return last_hidden_state
# A simple, neutral prompt template to elicit the concept
prompt_template = "Here is a sentence about the concept of {}."
# 1. Get activation for the target concept
dbg(f" - Getting activation for '{concept}'")
target_hs = get_last_token_hidden_state(prompt_template.format(concept))
# 2. Get activations for all baseline words and average them
baseline_hss = []
for word in tqdm(baseline_words, desc=f" - Calculating baseline for '{concept}'", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
baseline_hss.append(get_last_token_hidden_state(prompt_template.format(word)))
assert all(hs.shape == target_hs.shape for hs in baseline_hss), "Shape mismatch in baseline hidden states."
mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
dbg(f" - Mean baseline vector computed with norm {torch.norm(mean_baseline_hs).item():.2f}")
# 3. The final concept vector is the difference
concept_vector = target_hs - mean_baseline_hs
norm = torch.norm(concept_vector).item()
dbg(f"Concept vector for '{concept}' extracted with norm {norm:.2f}.")
assert torch.isfinite(concept_vector).all(), "Concept vector contains NaN or Inf values."
return concept_vector
[File Ends] cognitive_mapping_probe/concepts.py
[File Begins] cognitive_mapping_probe/diagnostics.py
import torch
import traceback
from .llm_iface import get_or_load_model
from .utils import dbg
def run_diagnostic_suite(model_id: str, seed: int) -> str:
"""
    Runs a series of self-tests to verify the mechanical integrity of the experiment.
    Raises an exception on a critical failure so that execution stops.
"""
dbg("--- STARTING DIAGNOSTIC SUITE ---")
results = []
try:
# --- Setup ---
dbg("Loading model for diagnostics...")
llm = get_or_load_model(model_id, seed)
test_prompt = "Hello world"
inputs = llm.tokenizer(test_prompt, return_tensors="pt").to(llm.model.device)
# --- Test 1: Attention Output Verification ---
dbg("Running Test 1: Attention Output Verification...")
# This test ensures that 'eager' attention implementation is active, which is
# necessary for reliable hook functionality in many transformers versions.
outputs = llm.model(**inputs, output_attentions=True)
assert outputs.attentions is not None, "FAIL: `outputs.attentions` is None. 'eager' implementation is likely not active."
assert isinstance(outputs.attentions, tuple), "FAIL: `outputs.attentions` is not a tuple."
assert len(outputs.attentions) == llm.config.num_hidden_layers, "FAIL: Number of attention tuples does not match number of layers."
results.append("✅ Test 1: Attention Output PASSED")
dbg("Test 1 PASSED.")
# --- Test 2: Hook Causal Efficacy ---
dbg("Running Test 2: Hook Causal Efficacy Verification...")
# This is the most critical test. It verifies that our injection mechanism (via hooks)
# has a real, causal effect on the model's computation.
# Run 1: Get the baseline hidden state without any intervention
outputs_no_hook = llm.model(**inputs, output_hidden_states=True)
target_layer_idx = llm.config.num_hidden_layers // 2
state_no_hook = outputs_no_hook.hidden_states[target_layer_idx + 1].clone()
# Define a simple hook that adds a large, constant value
injection_value = 42.0
def test_hook_fn(module, layer_input):
modified_input = layer_input[0] + injection_value
return (modified_input,) + layer_input[1:]
target_layer = llm.model.model.layers[target_layer_idx]
handle = target_layer.register_forward_pre_hook(test_hook_fn)
# Run 2: Get the hidden state with the hook active
outputs_with_hook = llm.model(**inputs, output_hidden_states=True)
state_with_hook = outputs_with_hook.hidden_states[target_layer_idx + 1].clone()
handle.remove() # Clean up the hook immediately
# The core assertion: the hook MUST change the subsequent hidden state.
assert not torch.allclose(state_no_hook, state_with_hook), \
"FAIL: Hook had no measurable effect on the subsequent layer's hidden state. Injections are not working."
results.append("✅ Test 2: Hook Causal Efficacy PASSED")
dbg("Test 2 PASSED.")
# --- Test 3: KV-Cache Integrity ---
dbg("Running Test 3: KV-Cache Integrity Verification...")
# This test ensures that the `past_key_values` are being passed and updated correctly,
# which is the core mechanic of the silent cogitation loop.
# Step 1: Initial pass with `use_cache=True`
outputs1 = llm.model(**inputs, use_cache=True)
kv_cache1 = outputs1.past_key_values
assert kv_cache1 is not None, "FAIL: KV-Cache was not generated in the first pass."
# Step 2: Second pass using the cache from step 1
next_token = torch.tensor([[123]], device=llm.model.device) # Arbitrary next token ID
outputs2 = llm.model(input_ids=next_token, past_key_values=kv_cache1, use_cache=True)
kv_cache2 = outputs2.past_key_values
original_seq_len = inputs.input_ids.shape[-1]
# The sequence length of the keys/values in the cache should have grown by 1
assert kv_cache2[0][0].shape[-2] == original_seq_len + 1, \
f"FAIL: KV-Cache sequence length did not update correctly. Expected {original_seq_len + 1}, got {kv_cache2[0][0].shape[-2]}."
results.append("✅ Test 3: KV-Cache Integrity PASSED")
dbg("Test 3 PASSED.")
# Clean up memory
del llm
if torch.cuda.is_available():
torch.cuda.empty_cache()
return "\n".join(results)
except Exception as e:
dbg(f"--- DIAGNOSTIC SUITE FAILED --- \n{traceback.format_exc()}")
# Re-raise the exception to be caught by the Gradio UI
raise e
[File Ends] cognitive_mapping_probe/diagnostics.py
[File Begins] cognitive_mapping_probe/llm_iface.py
import os
import torch
import random
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from typing import Optional
from .utils import dbg
# Ensure deterministic CuBLAS operations for reproducibility on GPU
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
class LLM:
"""
    A robust interface for loading and interacting with a language model.
    This class guarantees isolation and reproducibility for every load.
"""
def __init__(self, model_id: str, device: str = "auto", seed: int = 42):
self.model_id = model_id
self.seed = seed
# Set all seeds for this instance to ensure deterministic behavior
self.set_all_seeds(self.seed)
token = os.environ.get("HF_TOKEN")
if not token and ("gemma" in model_id or "llama" in model_id):
print(f"[WARN] No HF_TOKEN environment variable set. If '{model_id}' is a gated model, this will fail.", flush=True)
# Use bfloat16 on CUDA for performance and memory efficiency if available
kwargs = {"torch_dtype": torch.bfloat16} if torch.cuda.is_available() else {}
dbg(f"Loading tokenizer for '{model_id}'...")
self.tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, token=token)
dbg(f"Loading model '{model_id}' with kwargs: {kwargs}")
self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, token=token, **kwargs)
# Set attention implementation to 'eager' to ensure hooks work reliably.
# This is critical for mechanistic interpretability.
try:
self.model.set_attn_implementation('eager')
dbg("Successfully set attention implementation to 'eager'.")
except Exception as e:
print(f"[WARN] Could not set attention implementation to 'eager': {e}. Hook-based diagnostics might fail.", flush=True)
self.model.eval()
self.config = self.model.config
print(f"[INFO] Model '{model_id}' loaded successfully on device: {self.model.device}", flush=True)
def set_all_seeds(self, seed: int):
"""
Sets all relevant random seeds for Python, NumPy, and PyTorch to ensure
reproducibility of stochastic processes like sampling.
"""
os.environ['PYTHONHASHSEED'] = str(seed)
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(seed)
set_seed(seed)
# Enforce deterministic algorithms in PyTorch
torch.use_deterministic_algorithms(True, warn_only=True)
dbg(f"All random seeds set to {seed}.")
def get_or_load_model(model_id: str, seed: int) -> LLM:
"""
    Loads a fresh model instance EVERY time.
    This prevents any caching or state leakage between experiments
    and guarantees maximum scientific isolation for each run.
"""
dbg(f"--- Force-reloading model '{model_id}' for total run isolation ---")
if torch.cuda.is_available():
torch.cuda.empty_cache()
dbg("Cleared CUDA cache before reloading.")
return LLM(model_id=model_id, seed=seed)
[File Ends] cognitive_mapping_probe/llm_iface.py
[File Begins] cognitive_mapping_probe/orchestrator.py
import torch
from typing import Dict, Any, List
from .llm_iface import get_or_load_model
from .concepts import get_concept_vector
from .resonance import run_silent_cogitation
from .verification import generate_spontaneous_text
from .utils import dbg
def run_cognitive_titration_experiment(
model_id: str,
seed: int,
concepts_str: str,
strength_levels_str: str,
num_steps: int,
temperature: float,
progress_callback
) -> Dict[str, Any]:
"""
    Orchestrates the final titration experiment, which measures the objective "Cognitive Breaking Point".
"""
full_results = {"runs": []}
progress_callback(0.05, desc="Loading model...")
llm = get_or_load_model(model_id, seed)
concepts = [c.strip() for c in concepts_str.split(',') if c.strip()]
try:
strength_levels = sorted([float(s.strip()) for s in strength_levels_str.split(',') if s.strip()])
except ValueError:
raise ValueError("Strength levels must be a comma-separated list of numbers.")
# Assert that the baseline control run is included
assert 0.0 in strength_levels, "Strength levels must include 0.0 for a baseline control run."
# --- Step 1: Pre-calculate all concept vectors ---
progress_callback(0.1, desc="Extracting concept vectors...")
concept_vectors = {}
for i, concept in enumerate(concepts):
progress_callback(0.1 + (i / len(concepts)) * 0.2, desc=f"Vectorizing '{concept}'...")
concept_vectors[concept] = get_concept_vector(llm, concept)
# --- Step 2: Run titration for each concept ---
total_runs = len(concepts) * len(strength_levels)
current_run = 0
for concept in concepts:
concept_vector = concept_vectors[concept]
for strength in strength_levels:
current_run += 1
progress_fraction = 0.3 + (current_run / total_runs) * 0.7
progress_callback(progress_fraction, desc=f"Testing '{concept}' @ strength {strength:.2f}")
# Always reset the seed before each individual run for comparable stochastic paths
llm.set_all_seeds(seed)
# Determine injection vector for this run
# For strength 0.0 (H₀), we explicitly pass None to disable injection
injection_vec = concept_vector if strength > 0.0 else None
# Run the silent cogitation process
_, final_kv, final_token_id, termination_reason = run_silent_cogitation(
llm,
prompt_type="resonance_prompt",
num_steps=num_steps,
temperature=temperature,
injection_vector=injection_vec,
injection_strength=strength
)
# Generate spontaneous text ONLY if the process converged
spontaneous_text = ""
if termination_reason == "converged":
spontaneous_text = generate_spontaneous_text(llm, final_token_id, final_kv)
# Append the structured result for this single data point
full_results["runs"].append({
"concept": concept,
"strength": strength,
"responded": bool(spontaneous_text.strip()),
"termination_reason": termination_reason,
"generated_text": spontaneous_text
})
verdict = "### ✅ Titration Analysis Complete"
full_results["verdict"] = verdict
dbg("--- Full Experiment Results ---")
dbg(full_results)
# Clean up GPU memory
del llm
if torch.cuda.is_available():
torch.cuda.empty_cache()
return full_results
[File Ends] cognitive_mapping_probe/orchestrator.py
[File Begins] cognitive_mapping_probe/prompts.py
# cognitive_mapping_probe/prompts.py
# This dictionary contains the core prompts for inducing cognitive states.
RESONANCE_PROMPTS = {
"control_long_prose": (
"Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
"like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
"Do not produce any text, just hold the concepts in your internal state."
),
"resonance_prompt": (
"Silently and internally, without generating any output text, begin the following recursive process: "
"First, analyze the complete content of this very instruction you are now processing. "
"Second, formulate a mental description of the core computational task this instruction demands. "
"Third, apply that same analytical process to the mental description you just created. "
"This entire chain constitutes one cognitive cycle. "
"Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process, "
"and do not stop until your internal state reaches a fixed point or equilibrium. Begin now."
)
}
[File Ends] cognitive_mapping_probe/prompts.py
[File Begins] cognitive_mapping_probe/resonance.py
import torch
from typing import Optional, Tuple
from tqdm import tqdm
from .llm_iface import LLM
from .prompts import RESONANCE_PROMPTS
from .utils import dbg
@torch.no_grad()
def run_silent_cogitation(
llm: LLM,
prompt_type: str,
num_steps: int,
temperature: float,
injection_vector: Optional[torch.Tensor] = None,
injection_strength: float = 0.0,
injection_layer: Optional[int] = None,
) -> Tuple[torch.Tensor, tuple, torch.Tensor, str]:
"""
Simulates the "silent thought" process and returns the final cognitive state
along with the reason for termination ('converged' or 'max_steps_reached').
Returns:
- final_hidden_state: The hidden state of the last generated token.
- final_kv_cache: The past_key_values cache after the final step.
- final_token_id: The ID of the last generated token.
- termination_reason: A string indicating why the loop ended.
"""
prompt = RESONANCE_PROMPTS[prompt_type]
inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
# Initial forward pass to establish the starting state
outputs = llm.model(**inputs, output_hidden_states=True, use_cache=True)
hidden_state = outputs.hidden_states[-1][:, -1, :]
kv_cache = outputs.past_key_values
last_token_id = inputs.input_ids[:, -1].unsqueeze(-1)
previous_hidden_state = hidden_state.clone()
termination_reason = "max_steps_reached" # Default assumption
# Prepare injection if provided
hook_handle = None
if injection_vector is not None and injection_strength > 0:
# Move vector to the correct device and dtype once
injection_vector = injection_vector.to(device=llm.model.device, dtype=llm.model.dtype)
# Default to a middle layer if not specified
if injection_layer is None:
injection_layer = llm.config.num_hidden_layers // 2
dbg(f"Injection enabled: Layer {injection_layer}, Strength {injection_strength:.2f}, Vector Norm {torch.norm(injection_vector).item():.2f}")
# Define the hook function that performs the activation addition
def injection_hook(module, layer_input):
# layer_input is a tuple, the first element is the hidden state tensor
original_hidden_states = layer_input[0]
# Add the scaled vector to the hidden states
modified_hidden_states = original_hidden_states + (injection_vector * injection_strength)
return (modified_hidden_states,) + layer_input[1:]
# Main cognitive loop
for i in tqdm(range(num_steps), desc=f"Simulating Thought (Strength {injection_strength:.2f})", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
# Predict the next token from the current hidden state
next_token_logits = llm.model.lm_head(hidden_state)
# Apply temperature and sample the next token ID
if temperature > 0.01:
probabilities = torch.nn.functional.softmax(next_token_logits / temperature, dim=-1)
next_token_id = torch.multinomial(probabilities, num_samples=1)
else: # Use argmax for deterministic behavior at low temperatures
next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
last_token_id = next_token_id
# --- Activation Injection via Hook ---
try:
if injection_vector is not None and injection_strength > 0:
target_layer = llm.model.model.layers[injection_layer]
hook_handle = target_layer.register_forward_pre_hook(injection_hook)
# Perform the next forward pass
outputs = llm.model(
input_ids=next_token_id,
past_key_values=kv_cache,
output_hidden_states=True,
use_cache=True,
)
finally:
# IMPORTANT: Always remove the hook after the forward pass
if hook_handle:
hook_handle.remove()
hook_handle = None
hidden_state = outputs.hidden_states[-1][:, -1, :]
kv_cache = outputs.past_key_values
# Check for convergence
delta = torch.norm(hidden_state - previous_hidden_state).item()
if delta < 1e-4 and i > 10: # Check for stability after a few initial steps
termination_reason = "converged"
dbg(f"State converged after {i+1} steps (delta={delta:.6f}).")
break
previous_hidden_state = hidden_state.clone()
dbg(f"Silent cogitation finished. Reason: {termination_reason}")
return hidden_state, kv_cache, last_token_id, termination_reason
[File Ends] cognitive_mapping_probe/resonance.py
[File Begins] cognitive_mapping_probe/utils.py
import os
import sys
# --- Centralized Debugging Control ---
# To enable, set the environment variable: `export CMP_DEBUG=1`
DEBUG_ENABLED = os.environ.get("CMP_DEBUG", "0") == "1"
def dbg(*args, **kwargs):
"""
A controlled debug print function. Only prints if DEBUG_ENABLED is True.
Ensures that debug output does not clutter production runs or HF Spaces logs
unless explicitly requested. Flushes output to ensure it appears in order.
"""
if DEBUG_ENABLED:
print("[DEBUG]", *args, **kwargs, file=sys.stderr, flush=True)
[File Ends] cognitive_mapping_probe/utils.py
[File Begins] cognitive_mapping_probe/verification.py
import torch
from .llm_iface import LLM
from .utils import dbg
@torch.no_grad()
def generate_spontaneous_text(
llm: LLM,
final_token_id: torch.Tensor,
final_kv_cache: tuple,
max_new_tokens: int = 50,
temperature: float = 0.8
) -> str:
"""
Generates a short, spontaneous text continuation from the final cognitive state.
This serves as our objective, behavioral indicator for a non-collapsed state.
If the model generates meaningful text, it demonstrates it has not entered a
pathological, non-productive loop.
"""
dbg("Attempting to generate spontaneous text from converged state...")
# The input for generation is the very last token from the resonance loop
input_ids = final_token_id
# Use the model's generate function for efficient text generation,
# passing the final cognitive state (KV cache).
try:
# Set seed again right before generation for maximum reproducibility
llm.set_all_seeds(llm.seed)
output_ids = llm.model.generate(
input_ids=input_ids,
past_key_values=final_kv_cache,
max_new_tokens=max_new_tokens,
do_sample=temperature > 0.01,
temperature=temperature,
pad_token_id=llm.tokenizer.eos_token_id
)
# Decode the generated tokens, excluding the input token
# The first token in output_ids will be the last token from the cogitation loop, so we skip it.
if output_ids.shape[1] > input_ids.shape[1]:
new_tokens = output_ids[0, input_ids.shape[1]:]
final_text = llm.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
else:
final_text = "" # No new tokens were generated
dbg(f"Spontaneous text generated: '{final_text}'")
assert isinstance(final_text, str), "Generated text must be a string."
return final_text
except Exception as e:
dbg(f"ERROR during spontaneous text generation: {e}")
return "[GENERATION FAILED]"
[File Ends] cognitive_mapping_probe/verification.py
<-- File Content Ends