gary-boon Claude Opus 4.6 (1M context) committed on
Commit
10baadf
·
1 Parent(s): 82349c1

Refactor head classification from cascade to score-all-then-rank

Browse files

Replace priority-ordered first-match cascade with two-dimension system:

Behaviour type (attention geometry) — all scored simultaneously:
- attention_sink, previous_token, local, induction, focused, diffuse
- Primary = highest qualifying score (with min thresholds), secondary = runner-up
- "focused" and "diffuse" replace weak "positional"/"semantic" catch-alls

Code cue (separate dimension — code token relevance):
- delimiter_sensitive: attention to (){}[]:;,
- keyword_sensitive: attention to language keywords (def, return, if, etc.)
- pattern_reuse: derived from induction signal (repeated spans)
- Each scored by proportion of attention mass on target token class
- Evidence text included (e.g. "34% attention on delimiter sensitive tokens")

Per-head output now includes:
- pattern (primary behaviour type + confidence)
- secondary_behaviour (type + score)
- code_cue (type + score + evidence text)
- secondary_cue (type + score)

Token texts decoded and cached per generation step for code-aware detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (1) hide show
  1. backend/model_service.py +118 -39
backend/model_service.py CHANGED
@@ -2193,26 +2193,28 @@ async def analyze_research_attention(request: Dict[str, Any], authenticated: boo
2193
  entropy = 0.0 if math.isnan(entropy) or math.isinf(entropy) else entropy
2194
  avg_entropy = 0.0 if math.isnan(avg_entropy) or math.isinf(avg_entropy) else avg_entropy
2195
 
2196
- # Data-driven head pattern classification (priority order)
 
2197
  seq_len_hw = head_weights.shape[0]
2198
- pattern_type = None
2199
- confidence = 0.0
2200
 
2201
- # 1. Attention sink: >50% weight on positions 0-2
 
 
 
2202
  sink_w = head_weights[:min(3, seq_len_hw)].sum().item()
2203
- if sink_w > 0.5:
2204
- pattern_type = "attention_sink"
2205
- confidence = sink_w
2206
- # 2. Previous token: sharp focus on immediate predecessor
2207
- elif max_weight > 0.9 and head_weights[-2].item() > 0.85:
2208
- pattern_type = "previous_token"
2209
- confidence = head_weights[-2].item()
2210
- # 3. Local: >80% weight within 5 positions of query
2211
- elif seq_len_hw > 5 and head_weights[max(0, seq_len_hw - 5):].sum().item() > 0.8:
2212
- pattern_type = "local"
2213
- confidence = head_weights[max(0, seq_len_hw - 5):].sum().item()
2214
- # 4. Induction: attends to positions following previous occurrences of current token
2215
- elif step > 0:
2216
  current_tok = current_ids[0, -1]
2217
  prev_occ = (current_ids[0, :-1] == current_tok).nonzero(as_tuple=True)[0]
2218
  if len(prev_occ) > 0:
@@ -2220,24 +2222,81 @@ async def analyze_research_attention(request: Dict[str, Any], authenticated: boo
2220
  foll = foll[foll < seq_len_hw]
2221
  if len(foll) > 0:
2222
  ind_w = head_weights[foll].sum().item()
2223
- if ind_w > 0.3:
2224
- pattern_type = "induction"
2225
- confidence = min(1.0, ind_w)
2226
- if pattern_type is None:
2227
- if entropy < 1.0:
2228
- pattern_type = "positional"
2229
- confidence = 1.0 - entropy
2230
- elif entropy >= 1.0:
2231
- pattern_type = "semantic"
2232
- confidence = min(1.0, 0.5)
2233
- # 5. Positional: low entropy, focused attention
2234
- elif entropy < 1.0:
2235
- pattern_type = "positional"
2236
- confidence = 1.0 - entropy
2237
- # 6. Semantic: broad attention (fallback)
2238
- elif entropy >= 1.0:
2239
- pattern_type = "semantic"
2240
- confidence = min(1.0, 0.5)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2241
 
2242
  # Sanitize confidence
2243
  confidence = 0.0 if math.isnan(confidence) or math.isinf(confidence) else confidence
@@ -2267,7 +2326,7 @@ async def analyze_research_attention(request: Dict[str, Any], authenticated: boo
2267
  })
2268
 
2269
  # Return only metadata (matrices fetched on-demand via /matrix endpoint)
2270
- critical_heads.append({
2271
  "head_idx": head_idx,
2272
  "entropy": entropy,
2273
  "avg_entropy": avg_entropy, # Averaged over all query positions
@@ -2275,9 +2334,29 @@ async def analyze_research_attention(request: Dict[str, Any], authenticated: boo
2275
  "has_matrices": attention_matrix is not None, # Flag for frontend
2276
  "pattern": {
2277
  "type": pattern_type,
2278
- "confidence": confidence
2279
- } if pattern_type else None
2280
- })
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2281
 
2282
  # Sort by max_weight (return all heads, frontend will decide how many to display)
2283
  critical_heads.sort(key=lambda h: h["max_weight"], reverse=True)
 
2193
  entropy = 0.0 if math.isnan(entropy) or math.isinf(entropy) else entropy
2194
  avg_entropy = 0.0 if math.isnan(avg_entropy) or math.isinf(avg_entropy) else avg_entropy
2195
 
2196
+ # Score-all-then-rank head classification
2197
+ # Two dimensions: behaviour type (attention geometry) + code cue (token relevance)
2198
  seq_len_hw = head_weights.shape[0]
 
 
2199
 
2200
+ # --- Behaviour type scores (attention geometry) ---
2201
+ behaviour_scores = {}
2202
+
2203
+ # Attention sink: weight on positions 0-2
2204
  sink_w = head_weights[:min(3, seq_len_hw)].sum().item()
2205
+ behaviour_scores["attention_sink"] = sink_w
2206
+
2207
+ # Previous token: weight on immediate predecessor
2208
+ prev_tok_w = head_weights[-2].item() if seq_len_hw >= 2 else 0.0
2209
+ behaviour_scores["previous_token"] = prev_tok_w
2210
+
2211
+ # Local: weight within last 5 positions
2212
+ local_w = head_weights[max(0, seq_len_hw - 5):].sum().item() if seq_len_hw > 5 else 0.0
2213
+ behaviour_scores["local"] = local_w
2214
+
2215
+ # Induction: weight on positions following previous occurrences of current token
2216
+ ind_w = 0.0
2217
+ if step > 0 and seq_len_hw > 1:
2218
  current_tok = current_ids[0, -1]
2219
  prev_occ = (current_ids[0, :-1] == current_tok).nonzero(as_tuple=True)[0]
2220
  if len(prev_occ) > 0:
 
2222
  foll = foll[foll < seq_len_hw]
2223
  if len(foll) > 0:
2224
  ind_w = head_weights[foll].sum().item()
2225
+ behaviour_scores["induction"] = min(1.0, ind_w)
2226
+
2227
+ # Focused: low entropy, concentrated attention (not captured by above)
2228
+ focused_score = max(0.0, 1.0 - entropy) if entropy < 1.5 else 0.0
2229
+ behaviour_scores["focused"] = focused_score
2230
+
2231
+ # Diffuse: high entropy, broad attention
2232
+ diffuse_score = min(1.0, max(0.0, (entropy - 1.0) / 2.0))
2233
+ behaviour_scores["diffuse"] = diffuse_score
2234
+
2235
+ # Pick primary behaviour (highest score, with minimum thresholds)
2236
+ behaviour_thresholds = {
2237
+ "attention_sink": 0.4,
2238
+ "previous_token": 0.7,
2239
+ "local": 0.5,
2240
+ "induction": 0.2,
2241
+ "focused": 0.3,
2242
+ "diffuse": 0.3,
2243
+ }
2244
+ qualified_behaviours = {
2245
+ k: v for k, v in behaviour_scores.items()
2246
+ if v >= behaviour_thresholds.get(k, 0.3)
2247
+ }
2248
+ sorted_behaviours = sorted(qualified_behaviours.items(), key=lambda x: x[1], reverse=True)
2249
+ primary_behaviour = sorted_behaviours[0] if sorted_behaviours else ("diffuse", diffuse_score)
2250
+ secondary_behaviour = sorted_behaviours[1] if len(sorted_behaviours) > 1 else None
2251
+
2252
+ pattern_type = primary_behaviour[0]
2253
+ confidence = primary_behaviour[1]
2254
+
2255
+ # --- Code cue scores (what code tokens are attended to) ---
2256
+ # Decode token texts for code-aware detection (cached per step)
2257
+ if not hasattr(self, '_step_token_texts') or self._step_token_texts_step != step:
2258
+ try:
2259
+ self._step_token_texts = [
2260
+ manager.tokenizer.decode([tid]) for tid in current_ids[0, :seq_len_hw].tolist()
2261
+ ]
2262
+ except Exception:
2263
+ self._step_token_texts = []
2264
+ self._step_token_texts_step = step
2265
+ token_texts = self._step_token_texts
2266
+
2267
+ code_cues = {}
2268
+ if len(token_texts) == seq_len_hw:
2269
+ # Delimiter-sensitive: attention to brackets, braces, parens
2270
+ delimiters = {'(', ')', '{', '}', '[', ']', ':', ';', ','}
2271
+ delim_indices = [i for i, t in enumerate(token_texts) if t.strip() in delimiters]
2272
+ if delim_indices:
2273
+ delim_w = head_weights[delim_indices].sum().item()
2274
+ code_cues["delimiter_sensitive"] = delim_w
2275
+
2276
+ # Keyword-sensitive: attention to language keywords
2277
+ keywords = {'def', 'return', 'if', 'else', 'elif', 'for', 'while', 'class',
2278
+ 'import', 'from', 'try', 'except', 'with', 'as', 'in', 'not',
2279
+ 'and', 'or', 'True', 'False', 'None', 'self', 'yield', 'async',
2280
+ 'await', 'lambda', 'raise', 'pass', 'break', 'continue',
2281
+ 'function', 'const', 'let', 'var', 'new', 'this'}
2282
+ kw_indices = [i for i, t in enumerate(token_texts) if t.strip() in keywords]
2283
+ if kw_indices:
2284
+ kw_w = head_weights[kw_indices].sum().item()
2285
+ code_cues["keyword_sensitive"] = kw_w
2286
+
2287
+ # Pattern reuse: attention to a contiguous span that appeared earlier
2288
+ # (broader than induction — checks for repeated multi-token sequences)
2289
+ if ind_w > 0.15:
2290
+ code_cues["pattern_reuse"] = min(1.0, ind_w * 1.5)
2291
+
2292
+ # Filter code cues by minimum threshold
2293
+ code_cue_threshold = 0.15
2294
+ qualified_cues = {
2295
+ k: round(v, 4) for k, v in code_cues.items()
2296
+ if v >= code_cue_threshold
2297
+ }
2298
+ sorted_cues = sorted(qualified_cues.items(), key=lambda x: x[1], reverse=True)
2299
+ primary_cue = sorted_cues[0] if sorted_cues else None
2300
 
2301
  # Sanitize confidence
2302
  confidence = 0.0 if math.isnan(confidence) or math.isinf(confidence) else confidence
 
2326
  })
2327
 
2328
  # Return only metadata (matrices fetched on-demand via /matrix endpoint)
2329
+ head_entry = {
2330
  "head_idx": head_idx,
2331
  "entropy": entropy,
2332
  "avg_entropy": avg_entropy, # Averaged over all query positions
 
2334
  "has_matrices": attention_matrix is not None, # Flag for frontend
2335
  "pattern": {
2336
  "type": pattern_type,
2337
+ "confidence": round(confidence, 4),
2338
+ } if pattern_type else None,
2339
+ }
2340
+ # Secondary behaviour (if present and distinct from primary)
2341
+ if secondary_behaviour:
2342
+ head_entry["secondary_behaviour"] = {
2343
+ "type": secondary_behaviour[0],
2344
+ "score": round(secondary_behaviour[1], 4),
2345
+ }
2346
+ # Code cue (separate dimension from behaviour type)
2347
+ if primary_cue:
2348
+ head_entry["code_cue"] = {
2349
+ "type": primary_cue[0],
2350
+ "score": round(primary_cue[1], 4),
2351
+ "evidence": f"{round(primary_cue[1] * 100)}% attention on {primary_cue[0].replace('_', ' ')} tokens",
2352
+ }
2353
+ # Secondary code cue
2354
+ if len(sorted_cues) > 1:
2355
+ head_entry["secondary_cue"] = {
2356
+ "type": sorted_cues[1][0],
2357
+ "score": round(sorted_cues[1][1], 4),
2358
+ }
2359
+ critical_heads.append(head_entry)
2360
 
2361
  # Sort by max_weight (return all heads, frontend will decide how many to display)
2362
  critical_heads.sort(key=lambda h: h["max_weight"], reverse=True)