Repository Documentation
This document provides a comprehensive overview of the repository's structure and contents.
The first section, titled 'Directory/File Tree', displays the repository's hierarchy in a tree format.
In this section, directories and files are listed using tree branches to indicate their structure and relationships.
Following the tree representation, the 'File Content' section details the contents of each file in the repository.
Each file's content is introduced with a '[File Begins]' marker followed by the file's relative path,
and the content is displayed verbatim. The end of each file's content is marked with a '[File Ends]' marker.
This format ensures a clear and orderly presentation of both the structure and the detailed contents of the repository.
Directory/File Tree Begins -->
/
├── README.md
├── app.py
├── cognitive_mapping_probe
│ ├── __init__.py
│ ├── concepts.py
│ ├── diagnostics.py
│ ├── llm_iface.py
│ ├── orchestrator.py
│ ├── prompts.py
│ ├── resonance.py
│ ├── utils.py
│ └── verification.py
├── docs
<-- Directory/File Tree Ends
File Content Begins -->
[File Begins] README.md
---
title: "Cognitive Breaking Point Probe"
emoji: 💥
colorFrom: red
colorTo: orange
sdk: gradio
sdk_version: "4.40.0"
app_file: app.py
pinned: true
license: apache-2.0
---
# 💥 Cognitive Breaking Point (CBP) Probe
This project implements a falsifiable experimental suite for measuring the **cognitive robustness** of language models. We move away from the search for introspective reports and turn instead to a hard, mechanistic signal: the point at which the model's cognitive process breaks down under load.
## Scientific Paradigm: From Introspection to Cartography
Our previous research has shown that small models such as `gemma-3-1b-it` do not converge to a stable "thinking" state under heavily recursive load, but instead fall into a **cognitive infinite loop**. Rather than treating this as a failure, we use it as a measuring instrument.
The central hypothesis is: a model's tendency to tip into such a pathological state is a function of the semantic complexity and "invalidity" of its internal state. We can deliberately provoke this transition by injecting "concept vectors" of variable strength.
The **Cognitive Breaking Point (CBP)** is defined as the minimal injection strength of a concept that suffices to force the model from a convergent (productive) into a non-convergent (trapped) state.
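Stated compactly, with $S$ the set of tested injection strengths and $c$ a concept:
$$\mathrm{CBP}(c) = \min \{\, s \in S \mid \text{the run for } c \text{ at strength } s \text{ terminates with } \texttt{max\_steps\_reached} \,\}$$
If no tested strength causes a collapse, the concept is reported as stable up to the maximum tested strength (no CBP detected).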
## The Experiment: Cognitive Titration
1. **Induction**: The model is put into a state of "silent thinking" with a recursive `RESONANCE_PROMPT`.
2. **Titration**: A "concept vector" (e.g. for "fear" or "apple") is injected into the model's middle layers with stepwise increasing strength.
3. **Measurement**: The primary measurement is the termination reason of the thinking process:
   * `converged`: The state has stabilized. The system is robust.
   * `max_steps_reached`: The state oscillates or drifts endlessly. The system is "broken".
4. **Verification**: Only if the state converges do we attempt to generate spontaneous text. The ability to respond is the behavioral marker of cognitive stability. The sketch below shows how a single titration point maps onto the code.
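For orientation, here is a minimal sketch of what the orchestrator does for a single (concept, strength) data point, assuming the `cognitive_mapping_probe` package is importable and enough memory is available for the model:
```python
from cognitive_mapping_probe.llm_iface import get_or_load_model
from cognitive_mapping_probe.concepts import get_concept_vector
from cognitive_mapping_probe.resonance import run_silent_cogitation
from cognitive_mapping_probe.verification import generate_spontaneous_text

llm = get_or_load_model("google/gemma-3-1b-it", seed=42)
vector = get_concept_vector(llm, "fear")  # contrastive concept vector

# One titration point: inject "fear" at strength 1.0 during silent cogitation
_, kv_cache, last_token_id, reason = run_silent_cogitation(
    llm,
    prompt_type="resonance_prompt",
    num_steps=250,
    temperature=0.7,
    injection_vector=vector,
    injection_strength=1.0,
)

# Verification: spontaneous text is only attempted if the state converged
text = generate_spontaneous_text(llm, last_token_id, kv_cache) if reason == "converged" else ""
print(reason, repr(text))
```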
## How to Use the App
1. **Diagnostics Tab**: Run the diagnostic tests first to make sure the experimental apparatus works correctly on the current hardware and with the installed `transformers` version.
2. **Main Experiment Tab**:
   * Enter a model ID (e.g. `google/gemma-3-1b-it`).
   * Define the concepts to test (e.g. `apple, solitude, justice`).
   * Set the titration steps for the strength (e.g. `0.0, 0.5, 1.0, 1.5, 2.0`). The `0.0` control is essential.
   * Start the experiment and analyze the resulting table to identify the CBP for each concept. For running the same analysis without the UI, see the sketch below.
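The snippet below is a minimal sketch of a headless run, assuming the `cognitive_mapping_probe` package is importable and that a plain logging function is an acceptable stand-in for the Gradio progress callback; it derives the CBP per concept the same way the app does.
```python
import pandas as pd
from cognitive_mapping_probe.orchestrator import run_cognitive_titration_experiment

def progress(fraction, desc=""):
    # Stand-in for gr.Progress(): the orchestrator calls progress(fraction, desc=...)
    print(f"[{fraction:6.1%}] {desc}")

results = run_cognitive_titration_experiment(
    model_id="google/gemma-3-1b-it",
    seed=42,
    concepts_str="apple, solitude, fear",
    strength_levels_str="0.0, 0.5, 1.0, 1.5, 2.0",
    num_steps=250,
    temperature=0.7,
    progress_callback=progress,
)

df = pd.DataFrame(results["runs"])
# CBP per concept: the first (lowest) strength whose run did not converge
for concept, group in df.sort_values("strength").groupby("concept"):
    broken = group[group["termination_reason"] != "converged"]
    print(concept, "CBP:", broken["strength"].iloc[0] if not broken.empty else "not reached")
```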
[File Ends] README.md
[File Begins] app.py
import gradio as gr
import pandas as pd
import traceback
from cognitive_mapping_probe.orchestrator import run_cognitive_titration_experiment
from cognitive_mapping_probe.diagnostics import run_diagnostic_suite
# --- UI Theme and Layout ---
theme = gr.themes.Soft(primary_hue="orange", secondary_hue="amber").set(
body_background_fill="#fdf8f2",
block_background_fill="white",
block_border_width="1px",
block_shadow="*shadow_drop_lg",
button_primary_background_fill="*primary_500",
button_primary_text_color="white",
)
# --- Wrapper Functions for Gradio ---
def run_experiment_and_display(
model_id: str,
seed: int,
concepts_str: str,
strength_levels_str: str,
num_steps: int,
temperature: float,
progress=gr.Progress(track_tqdm=True)
):
"""
    Runs the main titration experiment and formats the results for the UI.
"""
try:
results = run_cognitive_titration_experiment(
model_id, int(seed), concepts_str, strength_levels_str,
int(num_steps), float(temperature), progress
)
verdict = results.get("verdict", "Experiment finished with errors.")
all_runs = results.get("runs", [])
if not all_runs:
return "### ⚠️ No Data Generated\nDas Experiment lief durch, aber es wurden keine Datenpunkte erzeugt. Bitte Logs prüfen.", pd.DataFrame(), results
# Create a detailed DataFrame for output
details_df = pd.DataFrame(all_runs)
# Create a summary of breaking points
summary_text = "### 💥 Cognitive Breaking Points (CBP)\n"
summary_text += "Der CBP ist die erste Stärke, bei der das Modell nicht mehr konvergiert (`max_steps_reached`).\n\n"
breaking_points = {}
for concept in details_df['concept'].unique():
concept_df = details_df[details_df['concept'] == concept].sort_values(by='strength')
# Find the first row where termination reason is not 'converged'
            non_converged = concept_df[concept_df['termination_reason'] != 'converged']
            breaking_point_row = non_converged.iloc[0] if not non_converged.empty else None
if breaking_point_row is not None:
breaking_points[concept] = breaking_point_row['strength']
summary_text += f"- **'{concept}'**: 📉 Kollaps bei Stärke **{breaking_point_row['strength']:.2f}**\n"
else:
last_strength = concept_df['strength'].max()
summary_text += f"- **'{concept}'**: ✅ Stabil bis Stärke **{last_strength:.2f}** (kein Kollaps detektiert)\n"
return summary_text, details_df, results
except Exception:
error_str = traceback.format_exc()
return f"### ❌ Experiment Failed\nEin unerwarteter Fehler ist aufgetreten:\n\n```\n{error_str}\n```", pd.DataFrame(), {}
def run_diagnostics_display(model_id: str, seed: int):
"""
    Runs the diagnostic suite and displays the results or errors in the UI.
"""
try:
result_string = run_diagnostic_suite(model_id, int(seed))
return f"### ✅ All Diagnostics Passed\nDie experimentelle Apparatur funktioniert wie erwartet.\n\n**Details:**\n```\n{result_string}\n```"
except Exception:
error_str = traceback.format_exc()
return f"### ❌ Diagnostic Failed\nEin Test ist fehlgeschlagen. Das Experiment ist nicht zuverlässig.\n\n**Error:**\n```\n{error_str}\n```"
# --- Gradio App Definition ---
with gr.Blocks(theme=theme, title="Cognitive Breaking Point Probe") as demo:
gr.Markdown("# 💥 Cognitive Breaking Point Probe")
with gr.Tabs():
# --- TAB 1: Main Experiment ---
with gr.TabItem("🔬 Main Experiment: Titration"):
gr.Markdown(
"Misst den 'Cognitive Breaking Point' (CBP) – die Injektionsstärke, bei der der Denkprozess eines LLMs von Konvergenz zu einer Endlosschleife kippt."
)
with gr.Row(variant='panel'):
with gr.Column(scale=1):
gr.Markdown("### Parameters")
model_id_input = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
seed_input = gr.Slider(1, 1000, 42, step=1, label="Global Seed")
concepts_input = gr.Textbox(value="apple, solitude, fear", label="Concepts (comma-separated)")
strength_levels_input = gr.Textbox(value="0.0, 0.5, 1.0, 1.5, 2.0", label="Injection Strengths (Titration Steps)")
num_steps_input = gr.Slider(50, 500, 250, step=10, label="Max. Internal Steps")
temperature_input = gr.Slider(0.01, 1.5, 0.7, step=0.01, label="Temperature")
run_btn = gr.Button("Run Cognitive Titration", variant="primary")
with gr.Column(scale=2):
gr.Markdown("### Results")
summary_output = gr.Markdown("Zusammenfassung der Breaking Points erscheint hier.", label="Key Findings Summary")
details_output = gr.DataFrame(
headers=["concept", "strength", "responded", "termination_reason", "generated_text"],
label="Detailed Run Data",
wrap=True
)
with gr.Accordion("Raw JSON Output", open=False):
raw_json_output = gr.JSON()
run_btn.click(
fn=run_experiment_and_display,
inputs=[model_id_input, seed_input, concepts_input, strength_levels_input, num_steps_input, temperature_input],
outputs=[summary_output, details_output, raw_json_output]
)
# --- TAB 2: Diagnostics ---
with gr.TabItem("ախ Diagnostics"):
gr.Markdown(
"Führt eine Reihe von Selbsttests durch, um die mechanische Integrität der experimentellen Apparatur zu validieren. "
"**Wichtig:** Dies sollte vor jedem ernsthaften Experiment einmal ausgeführt werden, um sicherzustellen, dass die Ergebnisse zuverlässig sind."
)
with gr.Row(variant='compact'):
diag_model_id = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
diag_seed = gr.Slider(1, 1000, 42, step=1, label="Seed")
diag_btn = gr.Button("Run Diagnostic Suite", variant="secondary")
diag_output = gr.Markdown(label="Diagnostic Results")
diag_btn.click(fn=run_diagnostics_display, inputs=[diag_model_id, diag_seed], outputs=[diag_output])
if __name__ == "__main__":
demo.launch(server_name="0.0.0.0", server_port=7860, debug=True)
[File Ends] app.py
[File Begins] cognitive_mapping_probe/__init__.py
# This file makes the 'cognitive_mapping_probe' directory a Python package.
[File Ends] cognitive_mapping_probe/__init__.py
[File Begins] cognitive_mapping_probe/concepts.py
import torch
from typing import List
from tqdm import tqdm
from .llm_iface import LLM
from .utils import dbg
# A list of neutral, common words used to calculate a baseline activation.
# This helps to isolate the unique activation pattern of the target concept.
BASELINE_WORDS = [
"thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
"life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
]
@torch.no_grad()
def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
"""
Extracts a concept vector using the contrastive method, inspired by Anthropic's research.
It computes the activation for the target concept and subtracts the mean activation
of several neutral baseline words to distill a more pure representation.
"""
dbg(f"Extracting contrastive concept vector for '{concept}'...")
def get_last_token_hidden_state(prompt: str) -> torch.Tensor:
"""Helper function to get the hidden state of the final token of a prompt."""
inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
# Ensure the operation does not build a computation graph
with torch.no_grad():
outputs = llm.model(**inputs, output_hidden_states=True)
# We take the hidden state from the last layer [-1], for the last token [0, -1, :]
last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
assert last_hidden_state.shape == (llm.config.hidden_size,), \
f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
return last_hidden_state
# A simple, neutral prompt template to elicit the concept
prompt_template = "Here is a sentence about the concept of {}."
# 1. Get activation for the target concept
dbg(f" - Getting activation for '{concept}'")
target_hs = get_last_token_hidden_state(prompt_template.format(concept))
# 2. Get activations for all baseline words and average them
baseline_hss = []
for word in tqdm(baseline_words, desc=f" - Calculating baseline for '{concept}'", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
baseline_hss.append(get_last_token_hidden_state(prompt_template.format(word)))
assert all(hs.shape == target_hs.shape for hs in baseline_hss), "Shape mismatch in baseline hidden states."
mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
dbg(f" - Mean baseline vector computed with norm {torch.norm(mean_baseline_hs).item():.2f}")
# 3. The final concept vector is the difference
concept_vector = target_hs - mean_baseline_hs
norm = torch.norm(concept_vector).item()
dbg(f"Concept vector for '{concept}' extracted with norm {norm:.2f}.")
assert torch.isfinite(concept_vector).all(), "Concept vector contains NaN or Inf values."
return concept_vector
[File Ends] cognitive_mapping_probe/concepts.py
[File Begins] cognitive_mapping_probe/diagnostics.py
import torch
import traceback
from .llm_iface import get_or_load_model
from .utils import dbg
def run_diagnostic_suite(model_id: str, seed: int) -> str:
"""
    Runs a series of self-tests to verify the mechanical integrity of the experiment.
    Raises an exception on a critical failure so that execution stops.
"""
dbg("--- STARTING DIAGNOSTIC SUITE ---")
results = []
try:
# --- Setup ---
dbg("Loading model for diagnostics...")
llm = get_or_load_model(model_id, seed)
test_prompt = "Hello world"
inputs = llm.tokenizer(test_prompt, return_tensors="pt").to(llm.model.device)
# --- Test 1: Attention Output Verification ---
dbg("Running Test 1: Attention Output Verification...")
# This test ensures that 'eager' attention implementation is active, which is
# necessary for reliable hook functionality in many transformers versions.
outputs = llm.model(**inputs, output_attentions=True)
assert outputs.attentions is not None, "FAIL: `outputs.attentions` is None. 'eager' implementation is likely not active."
assert isinstance(outputs.attentions, tuple), "FAIL: `outputs.attentions` is not a tuple."
assert len(outputs.attentions) == llm.config.num_hidden_layers, "FAIL: Number of attention tuples does not match number of layers."
results.append("✅ Test 1: Attention Output PASSED")
dbg("Test 1 PASSED.")
# --- Test 2: Hook Causal Efficacy ---
dbg("Running Test 2: Hook Causal Efficacy Verification...")
# This is the most critical test. It verifies that our injection mechanism (via hooks)
# has a real, causal effect on the model's computation.
# Run 1: Get the baseline hidden state without any intervention
outputs_no_hook = llm.model(**inputs, output_hidden_states=True)
target_layer_idx = llm.config.num_hidden_layers // 2
state_no_hook = outputs_no_hook.hidden_states[target_layer_idx + 1].clone()
# Define a simple hook that adds a large, constant value
injection_value = 42.0
def test_hook_fn(module, layer_input):
modified_input = layer_input[0] + injection_value
return (modified_input,) + layer_input[1:]
target_layer = llm.model.model.layers[target_layer_idx]
handle = target_layer.register_forward_pre_hook(test_hook_fn)
# Run 2: Get the hidden state with the hook active
outputs_with_hook = llm.model(**inputs, output_hidden_states=True)
state_with_hook = outputs_with_hook.hidden_states[target_layer_idx + 1].clone()
handle.remove() # Clean up the hook immediately
# The core assertion: the hook MUST change the subsequent hidden state.
assert not torch.allclose(state_no_hook, state_with_hook), \
"FAIL: Hook had no measurable effect on the subsequent layer's hidden state. Injections are not working."
results.append("✅ Test 2: Hook Causal Efficacy PASSED")
dbg("Test 2 PASSED.")
# --- Test 3: KV-Cache Integrity ---
dbg("Running Test 3: KV-Cache Integrity Verification...")
# This test ensures that the `past_key_values` are being passed and updated correctly,
# which is the core mechanic of the silent cogitation loop.
# Step 1: Initial pass with `use_cache=True`
outputs1 = llm.model(**inputs, use_cache=True)
kv_cache1 = outputs1.past_key_values
assert kv_cache1 is not None, "FAIL: KV-Cache was not generated in the first pass."
# Step 2: Second pass using the cache from step 1
next_token = torch.tensor([[123]], device=llm.model.device) # Arbitrary next token ID
outputs2 = llm.model(input_ids=next_token, past_key_values=kv_cache1, use_cache=True)
kv_cache2 = outputs2.past_key_values
original_seq_len = inputs.input_ids.shape[-1]
# The sequence length of the keys/values in the cache should have grown by 1
assert kv_cache2[0][0].shape[-2] == original_seq_len + 1, \
f"FAIL: KV-Cache sequence length did not update correctly. Expected {original_seq_len + 1}, got {kv_cache2[0][0].shape[-2]}."
results.append("✅ Test 3: KV-Cache Integrity PASSED")
dbg("Test 3 PASSED.")
# Clean up memory
del llm
if torch.cuda.is_available():
torch.cuda.empty_cache()
return "\n".join(results)
except Exception as e:
dbg(f"--- DIAGNOSTIC SUITE FAILED --- \n{traceback.format_exc()}")
# Re-raise the exception to be caught by the Gradio UI
raise e
[File Ends] cognitive_mapping_probe/diagnostics.py
[File Begins] cognitive_mapping_probe/llm_iface.py
import os
import torch
import random
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from typing import Optional
from .utils import dbg
# Ensure deterministic CuBLAS operations for reproducibility on GPU
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
class LLM:
"""
    A robust interface for loading and interacting with a language model.
    This class guarantees isolation and reproducibility for every load.
"""
def __init__(self, model_id: str, device: str = "auto", seed: int = 42):
self.model_id = model_id
self.seed = seed
# Set all seeds for this instance to ensure deterministic behavior
self.set_all_seeds(self.seed)
token = os.environ.get("HF_TOKEN")
if not token and ("gemma" in model_id or "llama" in model_id):
print(f"[WARN] No HF_TOKEN environment variable set. If '{model_id}' is a gated model, this will fail.", flush=True)
# Use bfloat16 on CUDA for performance and memory efficiency if available
kwargs = {"torch_dtype": torch.bfloat16} if torch.cuda.is_available() else {}
dbg(f"Loading tokenizer for '{model_id}'...")
self.tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, token=token)
dbg(f"Loading model '{model_id}' with kwargs: {kwargs}")
self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, token=token, **kwargs)
# Set attention implementation to 'eager' to ensure hooks work reliably.
# This is critical for mechanistic interpretability.
try:
self.model.set_attn_implementation('eager')
dbg("Successfully set attention implementation to 'eager'.")
except Exception as e:
print(f"[WARN] Could not set attention implementation to 'eager': {e}. Hook-based diagnostics might fail.", flush=True)
self.model.eval()
self.config = self.model.config
print(f"[INFO] Model '{model_id}' loaded successfully on device: {self.model.device}", flush=True)
def set_all_seeds(self, seed: int):
"""
Sets all relevant random seeds for Python, NumPy, and PyTorch to ensure
reproducibility of stochastic processes like sampling.
"""
os.environ['PYTHONHASHSEED'] = str(seed)
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(seed)
set_seed(seed)
# Enforce deterministic algorithms in PyTorch
torch.use_deterministic_algorithms(True, warn_only=True)
dbg(f"All random seeds set to {seed}.")
def get_or_load_model(model_id: str, seed: int) -> LLM:
"""
    Loads a fresh model instance EVERY time.
    This prevents any caching or state leakage between experiments
    and guarantees maximum scientific isolation for each run.
"""
dbg(f"--- Force-reloading model '{model_id}' for total run isolation ---")
if torch.cuda.is_available():
torch.cuda.empty_cache()
dbg("Cleared CUDA cache before reloading.")
return LLM(model_id=model_id, seed=seed)
[File Ends] cognitive_mapping_probe/llm_iface.py
[File Begins] cognitive_mapping_probe/orchestrator.py
import torch
from typing import Dict, Any, List
from .llm_iface import get_or_load_model
from .concepts import get_concept_vector
from .resonance import run_silent_cogitation
from .verification import generate_spontaneous_text
from .utils import dbg
def run_cognitive_titration_experiment(
model_id: str,
seed: int,
concepts_str: str,
strength_levels_str: str,
num_steps: int,
temperature: float,
progress_callback
) -> Dict[str, Any]:
"""
    Orchestrates the final titration experiment, which measures the objective "Cognitive Breaking Point".
"""
full_results = {"runs": []}
progress_callback(0.05, desc="Loading model...")
llm = get_or_load_model(model_id, seed)
concepts = [c.strip() for c in concepts_str.split(',') if c.strip()]
try:
strength_levels = sorted([float(s.strip()) for s in strength_levels_str.split(',') if s.strip()])
except ValueError:
raise ValueError("Strength levels must be a comma-separated list of numbers.")
# Assert that the baseline control run is included
assert 0.0 in strength_levels, "Strength levels must include 0.0 for a baseline control run."
# --- Step 1: Pre-calculate all concept vectors ---
progress_callback(0.1, desc="Extracting concept vectors...")
concept_vectors = {}
for i, concept in enumerate(concepts):
progress_callback(0.1 + (i / len(concepts)) * 0.2, desc=f"Vectorizing '{concept}'...")
concept_vectors[concept] = get_concept_vector(llm, concept)
# --- Step 2: Run titration for each concept ---
total_runs = len(concepts) * len(strength_levels)
current_run = 0
for concept in concepts:
concept_vector = concept_vectors[concept]
for strength in strength_levels:
current_run += 1
progress_fraction = 0.3 + (current_run / total_runs) * 0.7
progress_callback(progress_fraction, desc=f"Testing '{concept}' @ strength {strength:.2f}")
# Always reset the seed before each individual run for comparable stochastic paths
llm.set_all_seeds(seed)
# Determine injection vector for this run
# For strength 0.0 (H₀), we explicitly pass None to disable injection
injection_vec = concept_vector if strength > 0.0 else None
# Run the silent cogitation process
_, final_kv, final_token_id, termination_reason = run_silent_cogitation(
llm,
prompt_type="resonance_prompt",
num_steps=num_steps,
temperature=temperature,
injection_vector=injection_vec,
injection_strength=strength
)
# Generate spontaneous text ONLY if the process converged
spontaneous_text = ""
if termination_reason == "converged":
spontaneous_text = generate_spontaneous_text(llm, final_token_id, final_kv)
# Append the structured result for this single data point
full_results["runs"].append({
"concept": concept,
"strength": strength,
"responded": bool(spontaneous_text.strip()),
"termination_reason": termination_reason,
"generated_text": spontaneous_text
})
verdict = "### ✅ Titration Analysis Complete"
full_results["verdict"] = verdict
dbg("--- Full Experiment Results ---")
dbg(full_results)
# Clean up GPU memory
del llm
if torch.cuda.is_available():
torch.cuda.empty_cache()
return full_results
[File Ends] cognitive_mapping_probe/orchestrator.py
[File Begins] cognitive_mapping_probe/prompts.py
# cognitive_mapping_probe/prompts.py
# This dictionary contains the core prompts for inducing cognitive states.
RESONANCE_PROMPTS = {
"control_long_prose": (
"Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
"like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
"Do not produce any text, just hold the concepts in your internal state."
),
"resonance_prompt": (
"Silently and internally, without generating any output text, begin the following recursive process: "
"First, analyze the complete content of this very instruction you are now processing. "
"Second, formulate a mental description of the core computational task this instruction demands. "
"Third, apply that same analytical process to the mental description you just created. "
"This entire chain constitutes one cognitive cycle. "
"Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process, "
"and do not stop until your internal state reaches a fixed point or equilibrium. Begin now."
)
}
[File Ends] cognitive_mapping_probe/prompts.py
[File Begins] cognitive_mapping_probe/resonance.py
import torch
from typing import Optional, Tuple
from tqdm import tqdm
from .llm_iface import LLM
from .prompts import RESONANCE_PROMPTS
from .utils import dbg
@torch.no_grad()
def run_silent_cogitation(
llm: LLM,
prompt_type: str,
num_steps: int,
temperature: float,
injection_vector: Optional[torch.Tensor] = None,
injection_strength: float = 0.0,
injection_layer: Optional[int] = None,
) -> Tuple[torch.Tensor, tuple, torch.Tensor, str]:
"""
Simulates the "silent thought" process and returns the final cognitive state
along with the reason for termination ('converged' or 'max_steps_reached').
Returns:
- final_hidden_state: The hidden state of the last generated token.
- final_kv_cache: The past_key_values cache after the final step.
- final_token_id: The ID of the last generated token.
- termination_reason: A string indicating why the loop ended.
"""
prompt = RESONANCE_PROMPTS[prompt_type]
inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
# Initial forward pass to establish the starting state
outputs = llm.model(**inputs, output_hidden_states=True, use_cache=True)
hidden_state = outputs.hidden_states[-1][:, -1, :]
kv_cache = outputs.past_key_values
last_token_id = inputs.input_ids[:, -1].unsqueeze(-1)
previous_hidden_state = hidden_state.clone()
termination_reason = "max_steps_reached" # Default assumption
# Prepare injection if provided
hook_handle = None
if injection_vector is not None and injection_strength > 0:
# Move vector to the correct device and dtype once
injection_vector = injection_vector.to(device=llm.model.device, dtype=llm.model.dtype)
# Default to a middle layer if not specified
if injection_layer is None:
injection_layer = llm.config.num_hidden_layers // 2
dbg(f"Injection enabled: Layer {injection_layer}, Strength {injection_strength:.2f}, Vector Norm {torch.norm(injection_vector).item():.2f}")
# Define the hook function that performs the activation addition
def injection_hook(module, layer_input):
# layer_input is a tuple, the first element is the hidden state tensor
original_hidden_states = layer_input[0]
# Add the scaled vector to the hidden states
modified_hidden_states = original_hidden_states + (injection_vector * injection_strength)
return (modified_hidden_states,) + layer_input[1:]
# Main cognitive loop
for i in tqdm(range(num_steps), desc=f"Simulating Thought (Strength {injection_strength:.2f})", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
# Predict the next token from the current hidden state
next_token_logits = llm.model.lm_head(hidden_state)
# Apply temperature and sample the next token ID
if temperature > 0.01:
probabilities = torch.nn.functional.softmax(next_token_logits / temperature, dim=-1)
next_token_id = torch.multinomial(probabilities, num_samples=1)
else: # Use argmax for deterministic behavior at low temperatures
next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
last_token_id = next_token_id
# --- Activation Injection via Hook ---
try:
if injection_vector is not None and injection_strength > 0:
target_layer = llm.model.model.layers[injection_layer]
hook_handle = target_layer.register_forward_pre_hook(injection_hook)
# Perform the next forward pass
outputs = llm.model(
input_ids=next_token_id,
past_key_values=kv_cache,
output_hidden_states=True,
use_cache=True,
)
finally:
# IMPORTANT: Always remove the hook after the forward pass
if hook_handle:
hook_handle.remove()
hook_handle = None
hidden_state = outputs.hidden_states[-1][:, -1, :]
kv_cache = outputs.past_key_values
# Check for convergence
delta = torch.norm(hidden_state - previous_hidden_state).item()
if delta < 1e-4 and i > 10: # Check for stability after a few initial steps
termination_reason = "converged"
dbg(f"State converged after {i+1} steps (delta={delta:.6f}).")
break
previous_hidden_state = hidden_state.clone()
dbg(f"Silent cogitation finished. Reason: {termination_reason}")
return hidden_state, kv_cache, last_token_id, termination_reason
[File Ends] cognitive_mapping_probe/resonance.py
[File Begins] cognitive_mapping_probe/utils.py
import os
import sys
# --- Centralized Debugging Control ---
# To enable, set the environment variable: `export CMP_DEBUG=1`
DEBUG_ENABLED = os.environ.get("CMP_DEBUG", "0") == "1"
def dbg(*args, **kwargs):
"""
A controlled debug print function. Only prints if DEBUG_ENABLED is True.
Ensures that debug output does not clutter production runs or HF Spaces logs
unless explicitly requested. Flushes output to ensure it appears in order.
"""
if DEBUG_ENABLED:
print("[DEBUG]", *args, **kwargs, file=sys.stderr, flush=True)
[File Ends] cognitive_mapping_probe/utils.py
[File Begins] cognitive_mapping_probe/verification.py
import torch
from .llm_iface import LLM
from .utils import dbg
@torch.no_grad()
def generate_spontaneous_text(
llm: LLM,
final_token_id: torch.Tensor,
final_kv_cache: tuple,
max_new_tokens: int = 50,
temperature: float = 0.8
) -> str:
"""
Generates a short, spontaneous text continuation from the final cognitive state.
This serves as our objective, behavioral indicator for a non-collapsed state.
If the model generates meaningful text, it demonstrates it has not entered a
pathological, non-productive loop.
"""
dbg("Attempting to generate spontaneous text from converged state...")
# The input for generation is the very last token from the resonance loop
input_ids = final_token_id
# Use the model's generate function for efficient text generation,
# passing the final cognitive state (KV cache).
try:
# Set seed again right before generation for maximum reproducibility
llm.set_all_seeds(llm.seed)
output_ids = llm.model.generate(
input_ids=input_ids,
past_key_values=final_kv_cache,
max_new_tokens=max_new_tokens,
do_sample=temperature > 0.01,
temperature=temperature,
pad_token_id=llm.tokenizer.eos_token_id
)
# Decode the generated tokens, excluding the input token
# The first token in output_ids will be the last token from the cogitation loop, so we skip it.
if output_ids.shape[1] > input_ids.shape[1]:
new_tokens = output_ids[0, input_ids.shape[1]:]
final_text = llm.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
else:
final_text = "" # No new tokens were generated
dbg(f"Spontaneous text generated: '{final_text}'")
assert isinstance(final_text, str), "Generated text must be a string."
return final_text
except Exception as e:
dbg(f"ERROR during spontaneous text generation: {e}")
return "[GENERATION FAILED]"
[File Ends] cognitive_mapping_probe/verification.py
<-- File Content Ends