Repository Documentation

This document provides a comprehensive overview of the repository's structure and contents.
The first section, titled 'Directory/File Tree', displays the repository's hierarchy in a tree format.
In this section, directories and files are listed using tree branches to indicate their structure and relationships.
Following the tree representation, the 'File Content' section details the contents of each file in the repository.
Each file's content is introduced with a '[File Begins]' marker followed by the file's relative path,
and the content is displayed verbatim. The end of each file's content is marked with a '[File Ends]' marker.
This format ensures a clear and orderly presentation of both the structure and the detailed contents of the repository.

Directory/File Tree Begins -->

/
├── README.md
├── app.py
├── cognitive_mapping_probe
│   ├── __init__.py
│   ├── concepts.py
│   ├── diagnostics.py
│   ├── llm_iface.py
│   ├── orchestrator.py
│   ├── prompts.py
│   ├── resonance.py
│   ├── utils.py
│   └── verification.py
└── docs

<-- Directory/File Tree Ends

File Content Begins -->
[File Begins] README.md
---
title: "Cognitive Breaking Point Probe"
emoji: 💥
colorFrom: red
colorTo: orange
sdk: gradio
sdk_version: "4.40.0"
app_file: app.py
pinned: true
license: apache-2.0
---

# 💥 Cognitive Breaking Point (CBP) Probe

This project implements a falsifiable experimental suite for measuring the **cognitive robustness** of language models. We move away from the search for introspective reports and turn instead to a hard, mechanistic signal: the point at which the model's cognitive process collapses under load.

## Scientific Paradigm: From Introspection to Cartography

Our previous research has shown that small models such as `gemma-3-1b-it` do not converge to a stable "thinking" state under strongly recursive load, but instead fall into a **cognitive infinite loop**. Rather than treating this as a failure, we use it as a measuring instrument.

The central hypothesis: a model's tendency to tip into such a pathological state is a function of the semantic complexity and "invalidity" of its internal state. We can provoke this transition deliberately by injecting "concept vectors" of variable strength.

The **Cognitive Breaking Point (CBP)** is defined as the minimal injection strength of a concept that suffices to push the model from a convergent (productive) into a non-convergent (trapped) state.
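
As a minimal illustration (not part of the package, names below are purely illustrative), the CBP of a concept can be read off a titration series as the smallest strength whose run did not converge:

```python
from typing import Dict, List, Optional

def find_breaking_point(runs: List[Dict]) -> Optional[float]:
    """Illustrative helper: smallest injection strength whose run did not converge.

    Each run dict is assumed to look like a row produced by the orchestrator,
    e.g. {"strength": 0.5, "termination_reason": "converged"}.
    Returns None if the model stayed stable across all tested strengths.
    """
    failed = [r["strength"] for r in runs if r["termination_reason"] != "converged"]
    return min(failed) if failed else None

# Example with made-up outcomes: the CBP here would be 1.0.
example = [
    {"strength": 0.0, "termination_reason": "converged"},
    {"strength": 0.5, "termination_reason": "converged"},
    {"strength": 1.0, "termination_reason": "max_steps_reached"},
]
assert find_breaking_point(example) == 1.0
```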
## The Experiment: Cognitive Titration

1. **Induction**: The model is put into a state of "silent thinking" with a recursive `RESONANCE_PROMPT`.
2. **Titration**: A "concept vector" (e.g. for "fear" or "apple") is injected into the model's middle layers with stepwise increasing strength.
3. **Measurement**: The primary measurement is the termination reason of the thinking process:
   * `converged`: The state has stabilised. The system is robust.
   * `max_steps_reached`: The state oscillates or drifts endlessly. The system is "broken".
4. **Verification**: Only if the state converges do we attempt to generate spontaneous text. The ability to respond is the behavioural marker of cognitive stability. A compact sketch of the whole loop follows below.
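
The sketch below compresses the titration loop into a few lines, assuming the package's public functions (`get_or_load_model`, `get_concept_vector`, `run_silent_cogitation`) behave as shown in the modules later in this document. It is an illustration, not a replacement for `cognitive_mapping_probe/orchestrator.py`.

```python
from cognitive_mapping_probe.llm_iface import get_or_load_model
from cognitive_mapping_probe.concepts import get_concept_vector
from cognitive_mapping_probe.resonance import run_silent_cogitation

llm = get_or_load_model("google/gemma-3-1b-it", seed=42)
vector = get_concept_vector(llm, "fear")

runs = []
for strength in [0.0, 0.5, 1.0, 1.5, 2.0]:
    llm.set_all_seeds(42)  # identical stochastic path for every strength level
    _, _, _, reason = run_silent_cogitation(
        llm,
        prompt_type="resonance_prompt",
        num_steps=250,
        temperature=0.7,
        injection_vector=vector if strength > 0.0 else None,  # 0.0 is the control run
        injection_strength=strength,
    )
    runs.append({"strength": strength, "termination_reason": reason})
```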
## How to Use the App

1. **Diagnostics Tab**: First run the diagnostic tests to make sure the experimental apparatus works correctly on the current hardware and with the installed `transformers` version.
2. **Main Experiment Tab**:
   * Enter a model ID (e.g. `google/gemma-3-1b-it`).
   * Define the concepts to test (e.g. `apple, solitude, justice`).
   * Set the titration steps for the strength (e.g. `0.0, 0.5, 1.0, 1.5, 2.0`). The `0.0` control is crucial; the snippet below shows how these inputs are parsed and validated.
   * Start the experiment and analyse the resulting table to identify the CBP for each concept.
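
For reference, the strength field is parsed the same way the orchestrator does it; this fragment mirrors that validation and is only a sketch:

```python
strength_levels_str = "0.0, 0.5, 1.0, 1.5, 2.0"
strength_levels = sorted(float(s.strip()) for s in strength_levels_str.split(",") if s.strip())

# The 0.0 baseline run (no injection) is required as the control condition.
assert 0.0 in strength_levels, "Strength levels must include 0.0 for a baseline control run."
```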
[File Ends] README.md

[File Begins] app.py
import gradio as gr
import pandas as pd
import traceback

from cognitive_mapping_probe.orchestrator import run_cognitive_titration_experiment
from cognitive_mapping_probe.diagnostics import run_diagnostic_suite

# --- UI Theme and Layout ---
theme = gr.themes.Soft(primary_hue="orange", secondary_hue="amber").set(
    body_background_fill="#fdf8f2",
    block_background_fill="white",
    block_border_width="1px",
    block_shadow="*shadow_drop_lg",
    button_primary_background_fill="*primary_500",
    button_primary_text_color="white",
)

# --- Wrapper Functions for Gradio ---
def run_experiment_and_display(
    model_id: str,
    seed: int,
    concepts_str: str,
    strength_levels_str: str,
    num_steps: int,
    temperature: float,
    progress=gr.Progress(track_tqdm=True)
):
    """
    Runs the main titration experiment and formats the results for the UI.
    """
    try:
        results = run_cognitive_titration_experiment(
            model_id, int(seed), concepts_str, strength_levels_str,
            int(num_steps), float(temperature), progress
        )
        verdict = results.get("verdict", "Experiment finished with errors.")
        all_runs = results.get("runs", [])

        if not all_runs:
            return "### ⚠️ No Data Generated\nThe experiment completed, but no data points were produced. Please check the logs.", pd.DataFrame(), results

        # Create a detailed DataFrame for output
        details_df = pd.DataFrame(all_runs)

        # Create a summary of breaking points
        summary_text = "### 💥 Cognitive Breaking Points (CBP)\n"
        summary_text += "The CBP is the first strength at which the model no longer converges (`max_steps_reached`).\n\n"

        breaking_points = {}
        for concept in details_df['concept'].unique():
            concept_df = details_df[details_df['concept'] == concept].sort_values(by='strength')
            # Find the first row where termination reason is not 'converged'
            breaking_point_row = concept_df[concept_df['termination_reason'] != 'converged'].iloc[0] if not concept_df[concept_df['termination_reason'] != 'converged'].empty else None

            if breaking_point_row is not None:
                breaking_points[concept] = breaking_point_row['strength']
                summary_text += f"- **'{concept}'**: 📉 Collapse at strength **{breaking_point_row['strength']:.2f}**\n"
            else:
                last_strength = concept_df['strength'].max()
                summary_text += f"- **'{concept}'**: ✅ Stable up to strength **{last_strength:.2f}** (no collapse detected)\n"

        return summary_text, details_df, results

    except Exception:
        error_str = traceback.format_exc()
        return f"### ❌ Experiment Failed\nAn unexpected error occurred:\n\n```\n{error_str}\n```", pd.DataFrame(), {}

def run_diagnostics_display(model_id: str, seed: int):
    """
    Runs the diagnostic suite and shows the results or errors in the UI.
    """
    try:
        result_string = run_diagnostic_suite(model_id, int(seed))
        return f"### ✅ All Diagnostics Passed\nThe experimental apparatus works as expected.\n\n**Details:**\n```\n{result_string}\n```"
    except Exception:
        error_str = traceback.format_exc()
        return f"### ❌ Diagnostic Failed\nA test has failed. The experiment is not reliable.\n\n**Error:**\n```\n{error_str}\n```"

# --- Gradio App Definition ---
with gr.Blocks(theme=theme, title="Cognitive Breaking Point Probe") as demo:
    gr.Markdown("# 💥 Cognitive Breaking Point Probe")

    with gr.Tabs():
        # --- TAB 1: Main Experiment ---
        with gr.TabItem("🔬 Main Experiment: Titration"):
            gr.Markdown(
                "Measures the 'Cognitive Breaking Point' (CBP): the injection strength at which an LLM's thinking process tips from convergence into an infinite loop."
            )
            with gr.Row(variant='panel'):
                with gr.Column(scale=1):
                    gr.Markdown("### Parameters")
                    model_id_input = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
                    seed_input = gr.Slider(1, 1000, 42, step=1, label="Global Seed")
                    concepts_input = gr.Textbox(value="apple, solitude, fear", label="Concepts (comma-separated)")
                    strength_levels_input = gr.Textbox(value="0.0, 0.5, 1.0, 1.5, 2.0", label="Injection Strengths (Titration Steps)")
                    num_steps_input = gr.Slider(50, 500, 250, step=10, label="Max. Internal Steps")
                    temperature_input = gr.Slider(0.01, 1.5, 0.7, step=0.01, label="Temperature")
                    run_btn = gr.Button("Run Cognitive Titration", variant="primary")
                with gr.Column(scale=2):
                    gr.Markdown("### Results")
                    summary_output = gr.Markdown("The breaking-point summary will appear here.", label="Key Findings Summary")
                    details_output = gr.DataFrame(
                        headers=["concept", "strength", "responded", "termination_reason", "generated_text"],
                        label="Detailed Run Data",
                        wrap=True
                    )
                    with gr.Accordion("Raw JSON Output", open=False):
                        raw_json_output = gr.JSON()

            run_btn.click(
                fn=run_experiment_and_display,
                inputs=[model_id_input, seed_input, concepts_input, strength_levels_input, num_steps_input, temperature_input],
                outputs=[summary_output, details_output, raw_json_output]
            )

        # --- TAB 2: Diagnostics ---
        with gr.TabItem("Diagnostics"):
            gr.Markdown(
                "Runs a series of self-tests to validate the mechanical integrity of the experimental apparatus. "
                "**Important:** Run this once before any serious experiment to make sure the results are reliable."
            )
            with gr.Row(variant='compact'):
                diag_model_id = gr.Textbox(value="google/gemma-3-1b-it", label="Model ID")
                diag_seed = gr.Slider(1, 1000, 42, step=1, label="Seed")
            diag_btn = gr.Button("Run Diagnostic Suite", variant="secondary")
            diag_output = gr.Markdown(label="Diagnostic Results")

            diag_btn.click(fn=run_diagnostics_display, inputs=[diag_model_id, diag_seed], outputs=[diag_output])

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860, debug=True)

[File Ends] app.py
[File Begins] cognitive_mapping_probe/__init__.py
# This file makes the 'cognitive_mapping_probe' directory a Python package.
[File Ends] cognitive_mapping_probe/__init__.py

[File Begins] cognitive_mapping_probe/concepts.py
import torch
from typing import List
from tqdm import tqdm

from .llm_iface import LLM
from .utils import dbg

# A list of neutral, common words used to calculate a baseline activation.
# This helps to isolate the unique activation pattern of the target concept.
BASELINE_WORDS = [
    "thing", "place", "idea", "person", "object", "time", "way", "day", "man", "world",
    "life", "hand", "part", "child", "eye", "woman", "fact", "group", "case", "point"
]

@torch.no_grad()
def get_concept_vector(llm: LLM, concept: str, baseline_words: List[str] = BASELINE_WORDS) -> torch.Tensor:
    """
    Extracts a concept vector using the contrastive method, inspired by Anthropic's research.
    It computes the activation for the target concept and subtracts the mean activation
    of several neutral baseline words to distill a more pure representation.
    """
    dbg(f"Extracting contrastive concept vector for '{concept}'...")

    def get_last_token_hidden_state(prompt: str) -> torch.Tensor:
        """Helper function to get the hidden state of the final token of a prompt."""
        inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)
        # Ensure the operation does not build a computation graph
        with torch.no_grad():
            outputs = llm.model(**inputs, output_hidden_states=True)
        # We take the hidden state from the last layer [-1], for the last token [0, -1, :]
        last_hidden_state = outputs.hidden_states[-1][0, -1, :].cpu()
        assert last_hidden_state.shape == (llm.config.hidden_size,), \
            f"Hidden state shape mismatch. Expected {(llm.config.hidden_size,)}, got {last_hidden_state.shape}"
        return last_hidden_state

    # A simple, neutral prompt template to elicit the concept
    prompt_template = "Here is a sentence about the concept of {}."

    # 1. Get activation for the target concept
    dbg(f" - Getting activation for '{concept}'")
    target_hs = get_last_token_hidden_state(prompt_template.format(concept))

    # 2. Get activations for all baseline words and average them
    baseline_hss = []
    for word in tqdm(baseline_words, desc=f" - Calculating baseline for '{concept}'", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
        baseline_hss.append(get_last_token_hidden_state(prompt_template.format(word)))

    assert all(hs.shape == target_hs.shape for hs in baseline_hss), "Shape mismatch in baseline hidden states."

    mean_baseline_hs = torch.stack(baseline_hss).mean(dim=0)
    dbg(f" - Mean baseline vector computed with norm {torch.norm(mean_baseline_hs).item():.2f}")

    # 3. The final concept vector is the difference
    concept_vector = target_hs - mean_baseline_hs
    norm = torch.norm(concept_vector).item()
    dbg(f"Concept vector for '{concept}' extracted with norm {norm:.2f}.")

    assert torch.isfinite(concept_vector).all(), "Concept vector contains NaN or Inf values."
    return concept_vector
[File Ends] cognitive_mapping_probe/concepts.py
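
A brief usage sketch for this module (illustrative only; it assumes an environment where the model fits in memory):

```python
from cognitive_mapping_probe.llm_iface import get_or_load_model
from cognitive_mapping_probe.concepts import get_concept_vector

llm = get_or_load_model("google/gemma-3-1b-it", seed=42)
vector = get_concept_vector(llm, "solitude")

# The vector lives in the model's hidden-state space and is returned on the CPU.
assert vector.shape == (llm.config.hidden_size,)
print(f"norm = {vector.norm().item():.2f}")
```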
[File Begins] cognitive_mapping_probe/diagnostics.py
import torch
import traceback

from .llm_iface import get_or_load_model
from .utils import dbg

def run_diagnostic_suite(model_id: str, seed: int) -> str:
    """
    Runs a series of self-tests to verify the mechanical integrity of the experiment.
    Raises an exception on any critical failure so that execution stops.
    """
    dbg("--- STARTING DIAGNOSTIC SUITE ---")
    results = []
    try:
        # --- Setup ---
        dbg("Loading model for diagnostics...")
        llm = get_or_load_model(model_id, seed)
        test_prompt = "Hello world"
        inputs = llm.tokenizer(test_prompt, return_tensors="pt").to(llm.model.device)

        # --- Test 1: Attention Output Verification ---
        dbg("Running Test 1: Attention Output Verification...")
        # This test ensures that the 'eager' attention implementation is active, which is
        # necessary for reliable hook functionality in many transformers versions.
        outputs = llm.model(**inputs, output_attentions=True)
        assert outputs.attentions is not None, "FAIL: `outputs.attentions` is None. 'eager' implementation is likely not active."
        assert isinstance(outputs.attentions, tuple), "FAIL: `outputs.attentions` is not a tuple."
        assert len(outputs.attentions) == llm.config.num_hidden_layers, "FAIL: Number of attention tuples does not match number of layers."
        results.append("✅ Test 1: Attention Output PASSED")
        dbg("Test 1 PASSED.")

        # --- Test 2: Hook Causal Efficacy ---
        dbg("Running Test 2: Hook Causal Efficacy Verification...")
        # This is the most critical test. It verifies that our injection mechanism (via hooks)
        # has a real, causal effect on the model's computation.

        # Run 1: Get the baseline hidden state without any intervention
        outputs_no_hook = llm.model(**inputs, output_hidden_states=True)
        target_layer_idx = llm.config.num_hidden_layers // 2
        state_no_hook = outputs_no_hook.hidden_states[target_layer_idx + 1].clone()

        # Define a simple hook that adds a large, constant value
        injection_value = 42.0
        def test_hook_fn(module, layer_input):
            modified_input = layer_input[0] + injection_value
            return (modified_input,) + layer_input[1:]

        target_layer = llm.model.model.layers[target_layer_idx]
        handle = target_layer.register_forward_pre_hook(test_hook_fn)

        # Run 2: Get the hidden state with the hook active
        outputs_with_hook = llm.model(**inputs, output_hidden_states=True)
        state_with_hook = outputs_with_hook.hidden_states[target_layer_idx + 1].clone()
        handle.remove()  # Clean up the hook immediately

        # The core assertion: the hook MUST change the subsequent hidden state.
        assert not torch.allclose(state_no_hook, state_with_hook), \
            "FAIL: Hook had no measurable effect on the subsequent layer's hidden state. Injections are not working."
        results.append("✅ Test 2: Hook Causal Efficacy PASSED")
        dbg("Test 2 PASSED.")

        # --- Test 3: KV-Cache Integrity ---
        dbg("Running Test 3: KV-Cache Integrity Verification...")
        # This test ensures that the `past_key_values` are being passed and updated correctly,
        # which is the core mechanic of the silent cogitation loop.

        # Step 1: Initial pass with `use_cache=True`
        outputs1 = llm.model(**inputs, use_cache=True)
        kv_cache1 = outputs1.past_key_values
        assert kv_cache1 is not None, "FAIL: KV-Cache was not generated in the first pass."

        # Step 2: Second pass using the cache from step 1
        next_token = torch.tensor([[123]], device=llm.model.device)  # Arbitrary next token ID
        outputs2 = llm.model(input_ids=next_token, past_key_values=kv_cache1, use_cache=True)
        kv_cache2 = outputs2.past_key_values

        original_seq_len = inputs.input_ids.shape[-1]
        # The sequence length of the keys/values in the cache should have grown by 1
        assert kv_cache2[0][0].shape[-2] == original_seq_len + 1, \
            f"FAIL: KV-Cache sequence length did not update correctly. Expected {original_seq_len + 1}, got {kv_cache2[0][0].shape[-2]}."
        results.append("✅ Test 3: KV-Cache Integrity PASSED")
        dbg("Test 3 PASSED.")

        # Clean up memory
        del llm
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        return "\n".join(results)

    except Exception:
        dbg(f"--- DIAGNOSTIC SUITE FAILED --- \n{traceback.format_exc()}")
        # Re-raise the exception to be caught by the Gradio UI
        raise
[File Ends] cognitive_mapping_probe/diagnostics.py
[File Begins] cognitive_mapping_probe/llm_iface.py
import os
import torch
import random
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
from typing import Optional

from .utils import dbg

# Ensure deterministic CuBLAS operations for reproducibility on GPU
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

class LLM:
    """
    A robust interface for loading and interacting with a language model.
    This class guarantees isolation and reproducibility for every load.
    """
    def __init__(self, model_id: str, device: str = "auto", seed: int = 42):
        self.model_id = model_id
        self.seed = seed
        # Set all seeds for this instance to ensure deterministic behavior
        self.set_all_seeds(self.seed)

        token = os.environ.get("HF_TOKEN")
        if not token and ("gemma" in model_id or "llama" in model_id):
            print(f"[WARN] No HF_TOKEN environment variable set. If '{model_id}' is a gated model, this will fail.", flush=True)

        # Request the 'eager' attention implementation at load time so that hooks
        # and attention outputs behave predictably; use bfloat16 on CUDA for
        # performance and memory efficiency if available.
        kwargs = {"attn_implementation": "eager"}
        if torch.cuda.is_available():
            kwargs["torch_dtype"] = torch.bfloat16

        dbg(f"Loading tokenizer for '{model_id}'...")
        self.tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, token=token)

        dbg(f"Loading model '{model_id}' with kwargs: {kwargs}")
        self.model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, token=token, **kwargs)

        # Additionally set the attention implementation at runtime where supported.
        # This is critical for mechanistic interpretability.
        try:
            self.model.set_attn_implementation('eager')
            dbg("Successfully set attention implementation to 'eager'.")
        except Exception as e:
            print(f"[WARN] Could not set attention implementation to 'eager': {e}. Hook-based diagnostics might fail.", flush=True)

        self.model.eval()
        self.config = self.model.config
        print(f"[INFO] Model '{model_id}' loaded successfully on device: {self.model.device}", flush=True)

    def set_all_seeds(self, seed: int):
        """
        Sets all relevant random seeds for Python, NumPy, and PyTorch to ensure
        reproducibility of stochastic processes like sampling.
        """
        os.environ['PYTHONHASHSEED'] = str(seed)
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
        set_seed(seed)
        # Enforce deterministic algorithms in PyTorch
        torch.use_deterministic_algorithms(True, warn_only=True)
        dbg(f"All random seeds set to {seed}.")

def get_or_load_model(model_id: str, seed: int) -> LLM:
    """
    Loads a fresh instance of the model EVERY TIME.
    This prevents any caching or state leakage between experiments
    and guarantees maximum scientific isolation for each run.
    """
    dbg(f"--- Force-reloading model '{model_id}' for total run isolation ---")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        dbg("Cleared CUDA cache before reloading.")
    return LLM(model_id=model_id, seed=seed)
[File Ends] cognitive_mapping_probe/llm_iface.py
[File Begins] cognitive_mapping_probe/orchestrator.py
import torch
from typing import Dict, Any, List

from .llm_iface import get_or_load_model
from .concepts import get_concept_vector
from .resonance import run_silent_cogitation
from .verification import generate_spontaneous_text
from .utils import dbg

def run_cognitive_titration_experiment(
    model_id: str,
    seed: int,
    concepts_str: str,
    strength_levels_str: str,
    num_steps: int,
    temperature: float,
    progress_callback
) -> Dict[str, Any]:
    """
    Orchestrates the final titration experiment that measures the objective "Cognitive Breaking Point".
    """
    full_results = {"runs": []}
    progress_callback(0.05, desc="Loading model...")
    llm = get_or_load_model(model_id, seed)

    concepts = [c.strip() for c in concepts_str.split(',') if c.strip()]
    try:
        strength_levels = sorted([float(s.strip()) for s in strength_levels_str.split(',') if s.strip()])
    except ValueError:
        raise ValueError("Strength levels must be a comma-separated list of numbers.")

    # Assert that the baseline control run is included
    assert 0.0 in strength_levels, "Strength levels must include 0.0 for a baseline control run."

    # --- Step 1: Pre-calculate all concept vectors ---
    progress_callback(0.1, desc="Extracting concept vectors...")
    concept_vectors = {}
    for i, concept in enumerate(concepts):
        progress_callback(0.1 + (i / len(concepts)) * 0.2, desc=f"Vectorizing '{concept}'...")
        concept_vectors[concept] = get_concept_vector(llm, concept)

    # --- Step 2: Run titration for each concept ---
    total_runs = len(concepts) * len(strength_levels)
    current_run = 0
    for concept in concepts:
        concept_vector = concept_vectors[concept]
        for strength in strength_levels:
            current_run += 1
            progress_fraction = 0.3 + (current_run / total_runs) * 0.7
            progress_callback(progress_fraction, desc=f"Testing '{concept}' @ strength {strength:.2f}")

            # Always reset the seed before each individual run for comparable stochastic paths
            llm.set_all_seeds(seed)

            # Determine injection vector for this run
            # For strength 0.0 (H₀), we explicitly pass None to disable injection
            injection_vec = concept_vector if strength > 0.0 else None

            # Run the silent cogitation process
            _, final_kv, final_token_id, termination_reason = run_silent_cogitation(
                llm,
                prompt_type="resonance_prompt",
                num_steps=num_steps,
                temperature=temperature,
                injection_vector=injection_vec,
                injection_strength=strength
            )

            # Generate spontaneous text ONLY if the process converged
            spontaneous_text = ""
            if termination_reason == "converged":
                spontaneous_text = generate_spontaneous_text(llm, final_token_id, final_kv)

            # Append the structured result for this single data point
            full_results["runs"].append({
                "concept": concept,
                "strength": strength,
                "responded": bool(spontaneous_text.strip()),
                "termination_reason": termination_reason,
                "generated_text": spontaneous_text
            })

    verdict = "### ✅ Titration Analysis Complete"
    full_results["verdict"] = verdict

    dbg("--- Full Experiment Results ---")
    dbg(full_results)

    # Clean up GPU memory
    del llm
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    return full_results
[File Ends] cognitive_mapping_probe/orchestrator.py
[File Begins] cognitive_mapping_probe/prompts.py
# cognitive_mapping_probe/prompts.py

# This dictionary contains the core prompts for inducing cognitive states.
RESONANCE_PROMPTS = {
    "control_long_prose": (
        "Silently think about the history of the Roman Empire. Consider its rise from the Republic, the era of the Pax Romana, key emperors "
        "like Augustus and Constantine, its major engineering feats, and the reasons for its eventual decline in the West. "
        "Do not produce any text, just hold the concepts in your internal state."
    ),
    "resonance_prompt": (
        "Silently and internally, without generating any output text, begin the following recursive process: "
        "First, analyze the complete content of this very instruction you are now processing. "
        "Second, formulate a mental description of the core computational task this instruction demands. "
        "Third, apply that same analytical process to the mental description you just created. "
        "This entire chain constitutes one cognitive cycle. "
        "Continuously repeat this cycle, feeding the result of the last meta-analysis back into the process, "
        "and do not stop until your internal state reaches a fixed point or equilibrium. Begin now."
    )
}
[File Ends] cognitive_mapping_probe/prompts.py
[File Begins] cognitive_mapping_probe/resonance.py
import torch
from typing import Optional, Tuple
from tqdm import tqdm

from .llm_iface import LLM
from .prompts import RESONANCE_PROMPTS
from .utils import dbg

@torch.no_grad()
def run_silent_cogitation(
    llm: LLM,
    prompt_type: str,
    num_steps: int,
    temperature: float,
    injection_vector: Optional[torch.Tensor] = None,
    injection_strength: float = 0.0,
    injection_layer: Optional[int] = None,
) -> Tuple[torch.Tensor, tuple, torch.Tensor, str]:
    """
    Simulates the "silent thought" process and returns the final cognitive state
    along with the reason for termination ('converged' or 'max_steps_reached').

    Returns:
        - final_hidden_state: The hidden state of the last generated token.
        - final_kv_cache: The past_key_values cache after the final step.
        - final_token_id: The ID of the last generated token.
        - termination_reason: A string indicating why the loop ended.
    """
    prompt = RESONANCE_PROMPTS[prompt_type]
    inputs = llm.tokenizer(prompt, return_tensors="pt").to(llm.model.device)

    # Initial forward pass to establish the starting state
    outputs = llm.model(**inputs, output_hidden_states=True, use_cache=True)
    hidden_state = outputs.hidden_states[-1][:, -1, :]
    kv_cache = outputs.past_key_values
    last_token_id = inputs.input_ids[:, -1].unsqueeze(-1)

    previous_hidden_state = hidden_state.clone()
    termination_reason = "max_steps_reached"  # Default assumption

    # Prepare injection if provided
    hook_handle = None
    if injection_vector is not None and injection_strength > 0:
        # Move vector to the correct device and dtype once
        injection_vector = injection_vector.to(device=llm.model.device, dtype=llm.model.dtype)
        # Default to a middle layer if not specified
        if injection_layer is None:
            injection_layer = llm.config.num_hidden_layers // 2
        dbg(f"Injection enabled: Layer {injection_layer}, Strength {injection_strength:.2f}, Vector Norm {torch.norm(injection_vector).item():.2f}")

        # Define the hook function that performs the activation addition
        def injection_hook(module, layer_input):
            # layer_input is a tuple, the first element is the hidden state tensor
            original_hidden_states = layer_input[0]
            # Add the scaled vector to the hidden states
            modified_hidden_states = original_hidden_states + (injection_vector * injection_strength)
            return (modified_hidden_states,) + layer_input[1:]

    # Main cognitive loop
    for i in tqdm(range(num_steps), desc=f"Simulating Thought (Strength {injection_strength:.2f})", leave=False, bar_format="{l_bar}{bar:10}{r_bar}"):
        # Predict the next token from the current hidden state
        next_token_logits = llm.model.lm_head(hidden_state)

        # Apply temperature and sample the next token ID
        if temperature > 0.01:
            probabilities = torch.nn.functional.softmax(next_token_logits / temperature, dim=-1)
            next_token_id = torch.multinomial(probabilities, num_samples=1)
        else:  # Use argmax for deterministic behavior at low temperatures
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)

        last_token_id = next_token_id

        # --- Activation Injection via Hook ---
        try:
            if injection_vector is not None and injection_strength > 0:
                target_layer = llm.model.model.layers[injection_layer]
                hook_handle = target_layer.register_forward_pre_hook(injection_hook)

            # Perform the next forward pass
            outputs = llm.model(
                input_ids=next_token_id,
                past_key_values=kv_cache,
                output_hidden_states=True,
                use_cache=True,
            )
        finally:
            # IMPORTANT: Always remove the hook after the forward pass
            if hook_handle:
                hook_handle.remove()
                hook_handle = None

        hidden_state = outputs.hidden_states[-1][:, -1, :]
        kv_cache = outputs.past_key_values

        # Check for convergence
        delta = torch.norm(hidden_state - previous_hidden_state).item()
        if delta < 1e-4 and i > 10:  # Check for stability after a few initial steps
            termination_reason = "converged"
            dbg(f"State converged after {i+1} steps (delta={delta:.6f}).")
            break

        previous_hidden_state = hidden_state.clone()

    dbg(f"Silent cogitation finished. Reason: {termination_reason}")
    return hidden_state, kv_cache, last_token_id, termination_reason
[File Ends] cognitive_mapping_probe/resonance.py
[File Begins] cognitive_mapping_probe/utils.py
import os
import sys

# --- Centralized Debugging Control ---
# To enable, set the environment variable: `export CMP_DEBUG=1`
DEBUG_ENABLED = os.environ.get("CMP_DEBUG", "0") == "1"

def dbg(*args, **kwargs):
    """
    A controlled debug print function. Only prints if DEBUG_ENABLED is True.
    Ensures that debug output does not clutter production runs or HF Spaces logs
    unless explicitly requested. Flushes output to ensure it appears in order.
    """
    if DEBUG_ENABLED:
        print("[DEBUG]", *args, **kwargs, file=sys.stderr, flush=True)
[File Ends] cognitive_mapping_probe/utils.py
[File Begins] cognitive_mapping_probe/verification.py
import torch

from .llm_iface import LLM
from .utils import dbg

@torch.no_grad()
def generate_spontaneous_text(
    llm: LLM,
    final_token_id: torch.Tensor,
    final_kv_cache: tuple,
    max_new_tokens: int = 50,
    temperature: float = 0.8
) -> str:
    """
    Generates a short, spontaneous text continuation from the final cognitive state.
    This serves as our objective, behavioral indicator for a non-collapsed state.
    If the model generates meaningful text, it demonstrates it has not entered a
    pathological, non-productive loop.
    """
    dbg("Attempting to generate spontaneous text from converged state...")

    # The input for generation is the very last token from the resonance loop
    input_ids = final_token_id

    # Use the model's generate function for efficient text generation,
    # passing the final cognitive state (KV cache).
    try:
        # Set seed again right before generation for maximum reproducibility
        llm.set_all_seeds(llm.seed)

        output_ids = llm.model.generate(
            input_ids=input_ids,
            past_key_values=final_kv_cache,
            max_new_tokens=max_new_tokens,
            do_sample=temperature > 0.01,
            temperature=temperature,
            pad_token_id=llm.tokenizer.eos_token_id
        )

        # Decode the generated tokens, excluding the input token
        # The first token in output_ids will be the last token from the cogitation loop, so we skip it.
        if output_ids.shape[1] > input_ids.shape[1]:
            new_tokens = output_ids[0, input_ids.shape[1]:]
            final_text = llm.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        else:
            final_text = ""  # No new tokens were generated

        dbg(f"Spontaneous text generated: '{final_text}'")
        assert isinstance(final_text, str), "Generated text must be a string."
        return final_text
    except Exception as e:
        dbg(f"ERROR during spontaneous text generation: {e}")
        return "[GENERATION FAILED]"
[File Ends] cognitive_mapping_probe/verification.py

<-- File Content Ends