
Distribution Normalization for Debug Visualization

Executive Summary

Currently, probability distributions in the debug tab vary in position and shape based on the selected topic, making it difficult to assess the effectiveness of difficulty-based Gaussian targeting across different themes. This document proposes implementing distribution normalization to create consistent, topic-independent visualizations that clearly reveal algorithmic behavior.

Current Problem

Topic-Dependent Distribution Shifts

The current visualization shows probability distributions that vary significantly based on the input topic:

Topic: "animals"     β†’ Peak around position 60-80
Topic: "technology"  β†’ Peak around position 30-50  
Topic: "history"     β†’ Peak around position 40-70

This variation occurs because different topics produce different ranges of similarity scores:

  • High-similarity topics (e.g., "technology" β†’ "TECH") compress the distribution leftward
  • Lower-similarity topics spread the distribution more broadly
  • The Gaussian frequency targeting gets masked by these topic-specific effects

Visualization Challenges

  1. Inconsistent Baselines: Each topic creates a different baseline probability distribution
  2. Difficult Comparison: Cannot easily compare difficulty effectiveness across topics
  3. Masked Patterns: The intended Gaussian targeting patterns get obscured by topic bias
  4. Misleading Statistics: Mean (μ) and standard deviation (σ) positions vary dramatically between topics

Benefits of Normalization

1. Consistent Difficulty Targeting Visualization

With normalization, each difficulty level would show:

  • Easy Mode: Always peaks at the same visual position (the 90th percentile zone)
  • Medium Mode: Always centers around the 50th percentile zone
  • Hard Mode: Always concentrates in the 20th percentile zone
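
As a rough illustration of what "targeting" means here, a Gaussian weight over the normalized percentile axis could look like the sketch below. The centers and widths are assumptions chosen for illustration, not the service's actual parameters.

import math

# Assumed (mean, sigma) targets on a 0-1 percentile axis, per difficulty
DIFFICULTY_TARGETS = {
    "easy": (0.9, 0.10),
    "medium": (0.5, 0.15),
    "hard": (0.2, 0.10),
}

def gaussian_weight(percentile, difficulty):
    """Relative selection weight for a word sitting at `percentile` (0-1)."""
    mu, sigma = DIFFICULTY_TARGETS[difficulty]
    return math.exp(-((percentile - mu) ** 2) / (2 * sigma ** 2))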

2. Topic-Independent Analysis

Normalized View:
Easy (animals):     ░░░░░░░░░█████▌░░░░ (peak at 90%)
Easy (technology):  ░░░░░░░░░█████▌░░░░ (peak at 90%)
Easy (history):     ░░░░░░░░░█████▌░░░░ (peak at 90%)

All topics would produce visually identical patterns for the same difficulty level.

3. Enhanced Diagnostic Capability

  • Immediately spot when Gaussian targeting is failing
  • Compare algorithm performance across different topic domains
  • Validate that composite scoring weights are working correctly
  • Identify topics that produce unusual similarity score distributions

Implementation Strategies

Option 1: Min-Max Normalization (Recommended)

Formula:

normalized_probability = (probability - min_prob) / (max_prob - min_prob)

Benefits:

  • Preserves relative probability relationships
  • Maps all distributions to [0, 1] range
  • Simple to implement and understand
  • Maintains the shape of the original distribution

Implementation:

def normalize_probability_distribution(probabilities):
    """Min-max normalize in place, adding a 'normalized_probability' key to each item."""
    probs = [p["probability"] for p in probabilities]
    min_prob, max_prob = min(probs), max(probs)

    if max_prob == min_prob:  # Edge case: all probabilities equal
        for item in probabilities:
            item["normalized_probability"] = 1.0
        return probabilities

    for item in probabilities:
        item["normalized_probability"] = (
            item["probability"] - min_prob
        ) / (max_prob - min_prob)

    return probabilities
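
For illustration, applying this function to a small, made-up sample (the words and probabilities below are not real service output) keeps the ordering while mapping the values onto [0, 1]:

sample = [
    {"word": "CAT", "probability": 0.05},
    {"word": "DOG", "probability": 0.20},
    {"word": "FOX", "probability": 0.10},
]
normalize_probability_distribution(sample)
# normalized_probability: CAT -> 0.0, DOG -> 1.0, FOX -> ~0.33 (ordering preserved)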

Option 2: Z-Score Normalization

Formula:

normalized = (probability - mean_prob) / std_dev_prob

Benefits:

  • Centers all distributions around 0
  • Shows standard deviations from mean
  • Good for statistical analysis

Drawbacks:

  • Negative values can be confusing in the UI
  • Requires additional explanation for users
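
A minimal sketch of this option, following the same in-place convention as the min-max version above (the statistics module is part of the standard library):

import statistics

def zscore_normalize_distribution(probabilities):
    """Z-score normalize in place, adding a 'normalized_probability' key."""
    probs = [p["probability"] for p in probabilities]
    mean_prob = statistics.fmean(probs)
    std_prob = statistics.pstdev(probs)

    for item in probabilities:
        item["normalized_probability"] = (
            0.0 if std_prob == 0  # Edge case: all probabilities equal
            else (item["probability"] - mean_prob) / std_prob
        )
    return probabilities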

Option 3: Percentile Rank Normalization

Formula:

normalized = percentile_rank(probability, all_probabilities) / 100

Benefits:

  • Maps to [0, 1] range based on rank
  • Emphasizes relative positioning
  • Less sensitive to outliers

Drawbacks:

  • Loses information about absolute probability differences
  • Can flatten important distinctions
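
A minimal sketch, assuming rank is taken as the fraction of probabilities less than or equal to each value (ties share a rank; the function name is illustrative):

import bisect

def percentile_rank_normalize(probabilities):
    """Rank-based normalization in place, adding a 'normalized_probability' key."""
    sorted_probs = sorted(p["probability"] for p in probabilities)
    n = len(sorted_probs)

    for item in probabilities:
        # Count of probabilities <= this one, mapped into (0, 1]
        rank = bisect.bisect_right(sorted_probs, item["probability"])
        item["normalized_probability"] = rank / n
    return probabilities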

Visual Impact Examples

Before Normalization (Current State)

Animals Easy:     β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ (peak at position 60)
Tech Easy:        β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ (peak at position 30)
History Easy:     β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘ (peak at position 45)

After Normalization (Proposed)

Animals Easy:     β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘ (normalized peak at 90%)
Tech Easy:        β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘ (normalized peak at 90%)
History Easy:     β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œβ–‘β–‘β–‘β–‘ (normalized peak at 90%)

Recommended Implementation Approach

Phase 1: Data Collection Enhancement

Modify the backend to include normalization data:

# In thematic_word_service.py _softmax_weighted_selection()
prob_distribution = {
    "probabilities": probability_data,
    "raw_stats": {
        "min_probability": min_prob,
        "max_probability": max_prob, 
        "mean_probability": mean_prob,
        "std_probability": std_prob
    },
    "normalized_probabilities": normalized_data
}
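
The names min_prob, max_prob, mean_prob, std_prob, and normalized_data are not defined in the snippet above; a hedged sketch of how they could be derived from the existing probability_data list:

import statistics

probs = [p["probability"] for p in probability_data]
min_prob, max_prob = min(probs), max(probs)
mean_prob = statistics.fmean(probs)
std_prob = statistics.pstdev(probs)

prob_range = max_prob - min_prob
normalized_data = [
    {
        **item,
        "normalized_probability": (
            1.0 if prob_range == 0  # Edge case: all probabilities equal
            else (item["probability"] - min_prob) / prob_range
        ),
    }
    for item in probability_data
]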

Phase 2: Frontend Visualization Options

Add toggle buttons in the debug tab:

  • Raw Distribution: Current behavior (for debugging)
  • Normalized Distribution: New normalized view (for analysis)
  • Side-by-Side: Show both for comparison

Phase 3: Enhanced Statistical Markers

With normalization, the statistical markers (ΞΌ, Οƒ) become more meaningful:

  • ΞΌ should consistently align with difficulty targets (20%, 50%, 90%)
  • Οƒ should show consistent widths across topics for the same difficulty
  • Deviations from expected positions indicate algorithmic issues
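
As a rough illustration, a deviation check along these lines could flag targeting problems automatically (the target positions mirror the zones above; the tolerance value is an assumption):

DIFFICULTY_TARGET_MU = {"easy": 0.9, "medium": 0.5, "hard": 0.2}

def targeting_deviation(normalized_mu, difficulty, tolerance=0.1):
    """Return (deviation, within_tolerance) for a normalized distribution mean."""
    deviation = abs(normalized_mu - DIFFICULTY_TARGET_MU[difficulty])
    return deviation, deviation <= tolerance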

Expected Outcomes

Successful Implementation Indicators

  1. Visual Consistency: All easy mode distributions peak at the same normalized position
  2. Clear Difficulty Separation: Easy, Medium, Hard show distinct, predictable patterns
  3. Topic Independence: Changing topics doesn't change the distribution shape/position
  4. Diagnostic Power: Algorithm issues become immediately obvious

Validation Tests

# Test cases to validate normalization (helper functions and expected
# constants are assumed to be defined alongside the test)
test_cases = [
    ("animals", "easy"),
    ("technology", "easy"),
    ("history", "easy"),
    # Should all produce near-identical normalized distributions
]

for topic, difficulty in test_cases:
    distribution = generate_normalized_distribution(topic, difficulty)
    # Compare with a tolerance rather than exact equality: peaks are floats
    assert abs(peak_position(distribution) - EXPECTED_EASY_PEAK) <= PEAK_TOLERANCE
    assert abs(distribution_width(distribution) - EXPECTED_EASY_WIDTH) <= WIDTH_TOLERANCE

Implementation Timeline

Week 1: Backend Changes

  • Modify _softmax_weighted_selection() to compute normalization statistics
  • Add normalized probability calculation
  • Update debug data structure
  • Add unit tests

Week 2: Frontend Integration

  • Add normalization toggle to debug tab
  • Implement normalized chart rendering
  • Update statistical marker calculations
  • Add explanatory tooltips

Week 3: Testing & Validation

  • Test across multiple topics and difficulties
  • Validate that normalization reveals expected patterns
  • Document findings and create examples
  • Performance optimization if needed

Future Enhancements

Dynamic Normalization Scopes

  • Per-topic normalization: Normalize within each topic separately
  • Cross-topic normalization: Normalize across all topics globally
  • Per-difficulty normalization: Normalize within difficulty levels
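
A hedged sketch of how a normalization scope could be threaded through the existing min-max helper (assumes each record carries the key used for grouping, e.g. "topic" or "difficulty"):

from collections import defaultdict

def normalize_by_scope(records, scope_key=None):
    """Normalize globally (scope_key=None) or within groups sharing scope_key."""
    if scope_key is None:
        return normalize_probability_distribution(records)

    groups = defaultdict(list)
    for record in records:
        groups[record[scope_key]].append(record)

    for group in groups.values():
        normalize_probability_distribution(group)  # normalizes in place
    return records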

Advanced Statistical Views

  • Overlay comparisons: Show multiple topics/difficulties on same chart
  • Animation: Transition between raw and normalized views
  • Heatmap visualization: Show 2D difficultyΓ—topic probability landscapes

Risk Mitigation

Potential Issues

  1. Information Loss: Normalization might hide important absolute differences
  2. User Confusion: Additional complexity in the interface
  3. Performance: Extra computation for large datasets

Mitigation Strategies

  1. Always provide raw view option: Never remove the original visualization
  2. Clear labeling: Explicitly indicate when normalization is active
  3. Efficient algorithms: Use vectorized operations for normalization
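
For the performance point, a vectorized NumPy version of the min-max step might look like the following sketch (assuming NumPy is available in the backend):

import numpy as np

def normalize_vectorized(probabilities):
    """Vectorized min-max normalization; returns an array of values in [0, 1]."""
    probs = np.asarray([p["probability"] for p in probabilities], dtype=float)
    prob_range = probs.max() - probs.min()
    if prob_range == 0:  # Edge case: all probabilities equal
        return np.ones_like(probs)
    return (probs - probs.min()) / prob_range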

Conclusion

Distribution normalization will transform the debug visualization from a topic-specific diagnostic tool into a universal algorithm validation system. By removing topic-dependent bias, we can clearly see whether the Gaussian frequency targeting is working as designed, regardless of the input theme.

The recommended min-max normalization approach preserves the essential characteristics of the probability distributions while ensuring consistent, comparable visualizations across all topics and difficulties.

This enhancement will significantly improve the ability to:

  • Validate algorithm correctness
  • Debug difficulty-targeting issues
  • Compare performance across different domains
  • Demonstrate the effectiveness of the composite scoring system

This proposal builds on the successful percentile-sorted visualization implementation to create an even more powerful debugging and analysis tool.