
Distribution Normalization for Debug Visualization

Executive Summary

Currently, probability distributions in the debug tab vary in position and shape based on the selected topic, making it difficult to assess the effectiveness of difficulty-based Gaussian targeting across different themes. This document proposes implementing distribution normalization to create consistent, topic-independent visualizations that clearly reveal algorithmic behavior.

Current Problem

Topic-Dependent Distribution Shifts

The current visualization shows probability distributions that vary significantly based on the input topic:

Topic: "animals"     β†’ Peak around position 60-80
Topic: "technology"  β†’ Peak around position 30-50  
Topic: "history"     β†’ Peak around position 40-70

This variation occurs because different topics produce different ranges of similarity scores:

  • High-similarity topics (e.g., "technology" β†’ "TECH") compress the distribution leftward
  • Lower-similarity topics spread the distribution more broadly
  • The Gaussian frequency targeting gets masked by these topic-specific effects

Visualization Challenges

  1. Inconsistent Baselines: Each topic creates a different baseline probability distribution
  2. Difficult Comparison: Cannot easily compare difficulty effectiveness across topics
  3. Masked Patterns: The intended Gaussian targeting patterns get obscured by topic bias
  4. Misleading Statistics: Mean (μ) and standard deviation (σ) positions vary dramatically between topics

Benefits of Normalization

1. Consistent Difficulty Targeting Visualization

With normalization, each difficulty level would show:

  • Easy Mode: Always peaks at the same visual position (the 90th percentile zone)
  • Medium Mode: Always centers around the 50th percentile zone
  • Hard Mode: Always concentrates in the 20th percentile zone
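
As a rough illustration of what "targeting" means here, a Gaussian weight over the normalized percentile axis could look like the sketch below. The centers and widths are assumptions chosen for illustration, not the service's actual parameters.

import math

# Assumed (mean, sigma) targets on a 0-1 percentile axis, per difficulty
DIFFICULTY_TARGETS = {
    "easy": (0.9, 0.10),
    "medium": (0.5, 0.15),
    "hard": (0.2, 0.10),
}

def gaussian_weight(percentile, difficulty):
    """Relative selection weight for a word sitting at `percentile` (0-1)."""
    mu, sigma = DIFFICULTY_TARGETS[difficulty]
    return math.exp(-((percentile - mu) ** 2) / (2 * sigma ** 2))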

2. Topic-Independent Analysis

Normalized View:
Easy (animals):     ░░░░░░░░░█████▌░░░░ (peak at 90%)
Easy (technology):  ░░░░░░░░░█████▌░░░░ (peak at 90%)
Easy (history):     ░░░░░░░░░█████▌░░░░ (peak at 90%)

All topics would produce visually identical patterns for the same difficulty level.

3. Enhanced Diagnostic Capability

  • Immediately spot when Gaussian targeting is failing
  • Compare algorithm performance across different topic domains
  • Validate that composite scoring weights are working correctly
  • Identify topics that produce unusual similarity score distributions

Implementation Strategies

Option 1: Min-Max Normalization (Recommended)

Formula:

normalized_probability = (probability - min_prob) / (max_prob - min_prob)

Benefits:

  • Preserves relative probability relationships
  • Maps all distributions to [0, 1] range
  • Simple to implement and understand
  • Maintains the shape of the original distribution

Implementation:

def normalize_probability_distribution(probabilities):
    """Min-max normalize in place, adding a 'normalized_probability' key to each item."""
    probs = [p["probability"] for p in probabilities]
    min_prob, max_prob = min(probs), max(probs)

    if max_prob == min_prob:  # Edge case: all probabilities equal
        for item in probabilities:
            item["normalized_probability"] = 1.0
        return probabilities

    for item in probabilities:
        item["normalized_probability"] = (
            item["probability"] - min_prob
        ) / (max_prob - min_prob)

    return probabilities
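
For illustration, applying this function to a small, made-up sample (the words and probabilities below are not real service output) keeps the ordering while mapping the values onto [0, 1]:

sample = [
    {"word": "CAT", "probability": 0.05},
    {"word": "DOG", "probability": 0.20},
    {"word": "FOX", "probability": 0.10},
]
normalize_probability_distribution(sample)
# normalized_probability: CAT -> 0.0, DOG -> 1.0, FOX -> ~0.33 (ordering preserved)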

Option 2: Z-Score Normalization

Formula:

normalized = (probability - mean_prob) / std_dev_prob

Benefits:

  • Centers all distributions around 0
  • Shows standard deviations from mean
  • Good for statistical analysis

Drawbacks:

  • Negative values can be confusing in the UI
  • Requires additional explanation for users
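
A minimal sketch of this option, following the same in-place convention as the min-max version above (the statistics module is part of the standard library):

import statistics

def zscore_normalize_distribution(probabilities):
    """Z-score normalize in place, adding a 'normalized_probability' key."""
    probs = [p["probability"] for p in probabilities]
    mean_prob = statistics.fmean(probs)
    std_prob = statistics.pstdev(probs)

    for item in probabilities:
        item["normalized_probability"] = (
            0.0 if std_prob == 0  # Edge case: all probabilities equal
            else (item["probability"] - mean_prob) / std_prob
        )
    return probabilities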

Option 3: Percentile Rank Normalization

Formula:

normalized = percentile_rank(probability, all_probabilities) / 100

Benefits:

  • Maps to [0, 1] range based on rank
  • Emphasizes relative positioning
  • Less sensitive to outliers

Drawbacks:

  • Loses information about absolute probability differences
  • Can flatten important distinctions
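
A minimal sketch, assuming rank is taken as the fraction of probabilities less than or equal to each value (ties share a rank; the function name is illustrative):

import bisect

def percentile_rank_normalize(probabilities):
    """Rank-based normalization in place, adding a 'normalized_probability' key."""
    sorted_probs = sorted(p["probability"] for p in probabilities)
    n = len(sorted_probs)

    for item in probabilities:
        # Count of probabilities <= this one, mapped into (0, 1]
        rank = bisect.bisect_right(sorted_probs, item["probability"])
        item["normalized_probability"] = rank / n
    return probabilities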

Visual Impact Examples

Before Normalization (Current State)

Animals Easy:     β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ (peak at position 60)
Tech Easy:        β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ (peak at position 30)
History Easy:     β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ (peak at position 45)

After Normalization (Proposed)

Animals Easy:     β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘ (normalized peak at 90%)
Tech Easy:        β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘ (normalized peak at 90%)
History Easy:     β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘ (normalized peak at 90%)

Recommended Implementation Approach

Phase 1: Data Collection Enhancement

Modify the backend to include normalization data:

# In thematic_word_service.py _softmax_weighted_selection()
prob_distribution = {
    "probabilities": probability_data,
    "raw_stats": {
        "min_probability": min_prob,
        "max_probability": max_prob, 
        "mean_probability": mean_prob,
        "std_probability": std_prob
    },
    "normalized_probabilities": normalized_data
}
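
The names min_prob, max_prob, mean_prob, std_prob, and normalized_data are not defined in the snippet above; a hedged sketch of how they could be derived from the existing probability_data list:

import statistics

probs = [p["probability"] for p in probability_data]
min_prob, max_prob = min(probs), max(probs)
mean_prob = statistics.fmean(probs)
std_prob = statistics.pstdev(probs)

prob_range = max_prob - min_prob
normalized_data = [
    {
        **item,
        "normalized_probability": (
            1.0 if prob_range == 0  # Edge case: all probabilities equal
            else (item["probability"] - min_prob) / prob_range
        ),
    }
    for item in probability_data
]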

Phase 2: Frontend Visualization Options

Add toggle buttons in the debug tab:

  • Raw Distribution: Current behavior (for debugging)
  • Normalized Distribution: New normalized view (for analysis)
  • Side-by-Side: Show both for comparison

Phase 3: Enhanced Statistical Markers

With normalization, the statistical markers (ΞΌ, Οƒ) become more meaningful:

  • ΞΌ should consistently align with difficulty targets (20%, 50%, 90%)
  • Οƒ should show consistent widths across topics for the same difficulty
  • Deviations from expected positions indicate algorithmic issues
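
As a rough illustration, a deviation check along these lines could flag targeting problems automatically (the target positions mirror the zones above; the tolerance value is an assumption):

DIFFICULTY_TARGET_MU = {"easy": 0.9, "medium": 0.5, "hard": 0.2}

def targeting_deviation(normalized_mu, difficulty, tolerance=0.1):
    """Return (deviation, within_tolerance) for a normalized distribution mean."""
    deviation = abs(normalized_mu - DIFFICULTY_TARGET_MU[difficulty])
    return deviation, deviation <= tolerance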

Expected Outcomes

Successful Implementation Indicators

  1. Visual Consistency: All easy mode distributions peak at the same normalized position
  2. Clear Difficulty Separation: Easy, Medium, Hard show distinct, predictable patterns
  3. Topic Independence: Changing topics doesn't change the distribution shape/position
  4. Diagnostic Power: Algorithm issues become immediately obvious

Validation Tests

# Test cases to validate normalization (helper functions and expected
# constants are assumed to be defined alongside the test)
test_cases = [
    ("animals", "easy"),
    ("technology", "easy"),
    ("history", "easy"),
    # Should all produce near-identical normalized distributions
]

for topic, difficulty in test_cases:
    distribution = generate_normalized_distribution(topic, difficulty)
    # Compare with a tolerance rather than exact equality: peaks are floats
    assert abs(peak_position(distribution) - EXPECTED_EASY_PEAK) <= PEAK_TOLERANCE
    assert abs(distribution_width(distribution) - EXPECTED_EASY_WIDTH) <= WIDTH_TOLERANCE

Implementation Timeline

Week 1: Backend Changes

  • Modify _softmax_weighted_selection() to compute normalization statistics
  • Add normalized probability calculation
  • Update debug data structure
  • Add unit tests

Week 2: Frontend Integration

  • Add normalization toggle to debug tab
  • Implement normalized chart rendering
  • Update statistical marker calculations
  • Add explanatory tooltips

Week 3: Testing & Validation

  • Test across multiple topics and difficulties
  • Validate that normalization reveals expected patterns
  • Document findings and create examples
  • Performance optimization if needed

Future Enhancements

Dynamic Normalization Scopes

  • Per-topic normalization: Normalize within each topic separately
  • Cross-topic normalization: Normalize across all topics globally
  • Per-difficulty normalization: Normalize within difficulty levels
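
A hedged sketch of how a normalization scope could be threaded through the existing min-max helper (assumes each record carries the key used for grouping, e.g. "topic" or "difficulty"):

from collections import defaultdict

def normalize_by_scope(records, scope_key=None):
    """Normalize globally (scope_key=None) or within groups sharing scope_key."""
    if scope_key is None:
        return normalize_probability_distribution(records)

    groups = defaultdict(list)
    for record in records:
        groups[record[scope_key]].append(record)

    for group in groups.values():
        normalize_probability_distribution(group)  # normalizes in place
    return records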

Advanced Statistical Views

  • Overlay comparisons: Show multiple topics/difficulties on same chart
  • Animation: Transition between raw and normalized views
  • Heatmap visualization: Show 2D difficultyΓ—topic probability landscapes

Risk Mitigation

Potential Issues

  1. Information Loss: Normalization might hide important absolute differences
  2. User Confusion: Additional complexity in the interface
  3. Performance: Extra computation for large datasets

Mitigation Strategies

  1. Always provide raw view option: Never remove the original visualization
  2. Clear labeling: Explicitly indicate when normalization is active
  3. Efficient algorithms: Use vectorized operations for normalization
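
For the performance point, a vectorized NumPy version of the min-max step might look like the following sketch (assuming NumPy is available in the backend):

import numpy as np

def normalize_vectorized(probabilities):
    """Vectorized min-max normalization; returns an array of values in [0, 1]."""
    probs = np.asarray([p["probability"] for p in probabilities], dtype=float)
    prob_range = probs.max() - probs.min()
    if prob_range == 0:  # Edge case: all probabilities equal
        return np.ones_like(probs)
    return (probs - probs.min()) / prob_range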

Conclusion

Distribution normalization will transform the debug visualization from a topic-specific diagnostic tool into a universal algorithm validation system. By removing topic-dependent bias, we can clearly see whether the Gaussian frequency targeting is working as designed, regardless of the input theme.

The recommended min-max normalization approach preserves the essential characteristics of the probability distributions while ensuring consistent, comparable visualizations across all topics and difficulties.

This enhancement will significantly improve the ability to:

  • Validate algorithm correctness
  • Debug difficulty-targeting issues
  • Compare performance across different domains
  • Demonstrate the effectiveness of the composite scoring system

This proposal builds on the successful percentile-sorted visualization implementation to create an even more powerful debugging and analysis tool.