Distribution Normalization for Debug Visualization
Executive Summary
Currently, probability distributions in the debug tab vary in position and shape based on the selected topic, making it difficult to assess the effectiveness of difficulty-based Gaussian targeting across different themes. This document proposes implementing distribution normalization to create consistent, topic-independent visualizations that clearly reveal algorithmic behavior.
Current Problem
Topic-Dependent Distribution Shifts
The current visualization shows probability distributions that vary significantly based on the input topic:
Topic: "animals" → Peak around position 60-80
Topic: "technology" → Peak around position 30-50
Topic: "history" → Peak around position 40-70
This variation occurs because different topics produce different ranges of similarity scores:
- High-similarity topics (e.g., "technology" → "TECH") compress the distribution leftward
- Lower-similarity topics spread the distribution more broadly
- The Gaussian frequency targeting gets masked by these topic-specific effects
Visualization Challenges
- Inconsistent Baselines: Each topic creates a different baseline probability distribution
- Difficult Comparison: Cannot easily compare difficulty effectiveness across topics
- Masked Patterns: The intended Gaussian targeting patterns get obscured by topic bias
- Misleading Statistics: Mean (μ) and standard deviation (σ) positions vary dramatically between topics
Benefits of Normalization
1. Consistent Difficulty Targeting Visualization
With normalization, each difficulty level would show:
- Easy Mode: Always peaks at the same visual position (90th percentile zone)
- Medium Mode: Always centers around 50th percentile zone
- Hard Mode: Always concentrates in 20th percentile zone
2. Topic-Independent Analysis
Normalized View:
Easy (animals):    █████████████████ (peak at 90%)
Easy (technology): █████████████████ (peak at 90%)
Easy (history):    █████████████████ (peak at 90%)
All topics would produce visually identical patterns for the same difficulty level.
3. Enhanced Diagnostic Capability
- Immediately spot when Gaussian targeting is failing
- Compare algorithm performance across different topic domains
- Validate that composite scoring weights are working correctly
- Identify topics that produce unusual similarity score distributions
Implementation Strategies
Option 1: Min-Max Normalization (Recommended)
Formula:
normalized_probability = (probability - min_prob) / (max_prob - min_prob)
Benefits:
- Preserves relative probability relationships
- Maps all distributions to [0, 1] range
- Simple to implement and understand
- Maintains the shape of the original distribution
Implementation:
def normalize_probability_distribution(probabilities):
    """Min-max normalize the 'probability' field of each entry in place."""
    probs = [p["probability"] for p in probabilities]
    min_prob, max_prob = min(probs), max(probs)
    if max_prob == min_prob:  # Edge case: flat distribution
        for item in probabilities:
            item["normalized_probability"] = 0.0  # Still define the field for downstream consumers
        return probabilities
    for item in probabilities:
        item["normalized_probability"] = (
            item["probability"] - min_prob
        ) / (max_prob - min_prob)
    return probabilities
Option 2: Z-Score Normalization
Formula:
normalized = (probability - mean_prob) / std_dev_prob
Benefits:
- Centers all distributions around 0
- Shows standard deviations from mean
- Good for statistical analysis
Drawbacks:
- Negative values can be confusing in UI
- Requires additional explanation for users
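For comparison, the z-score formula above can be sketched in a few lines using the standard library; the function name is illustrative and not part of the existing service:

```python
from statistics import mean, pstdev

def z_score_normalize(probabilities):
    """Z-score normalize a list of probability values (illustrative sketch)."""
    mu = mean(probabilities)
    sigma = pstdev(probabilities)  # population standard deviation
    if sigma == 0:  # All values identical; avoid division by zero
        return [0.0 for _ in probabilities]
    return [(p - mu) / sigma for p in probabilities]
```

Note how the output is centered on 0 with negative values for below-mean probabilities, which is exactly the UI-confusion drawback listed above.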
Option 3: Percentile Rank Normalization
Formula:
normalized = percentile_rank(probability, all_probabilities) / 100
Benefits:
- Maps to [0, 1] range based on rank
- Emphasizes relative positioning
- Less sensitive to outliers
Drawbacks:
- Loses information about absolute probability differences
- Can flatten important distinctions
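A minimal sketch of the percentile-rank option, again with an illustrative function name:

```python
def percentile_rank_normalize(probabilities):
    """Map each value to its rank-based percentile in [0, 1] (illustrative sketch)."""
    n = len(probabilities)
    if n < 2:
        return [0.5] * n  # A single value has no meaningful rank
    ranked = sorted(probabilities)
    # ranked.index(p) is the index of p's first occurrence in sorted order,
    # so ties collapse to the lowest shared rank
    return [ranked.index(p) / (n - 1) for p in probabilities]
```

The outlier insensitivity and the information loss are two sides of the same behavior: `[0.1, 0.2, 10.0]` normalizes to the same `[0.0, 0.5, 1.0]` as the evenly spaced `[0.1, 0.2, 0.3]`.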
Visual Impact Examples
Before Normalization (Current State)
Animals Easy: ████████████████████ (peak at position 60)
Tech Easy:    ████████████████████ (peak at position 30)
History Easy: ████████████████████ (peak at position 45)
After Normalization (Proposed)
Animals Easy: ████████████████████ (normalized peak at 90%)
Tech Easy:    ████████████████████ (normalized peak at 90%)
History Easy: ████████████████████ (normalized peak at 90%)
Recommended Implementation Approach
Phase 1: Data Collection Enhancement
Modify the backend to include normalization data:
# In thematic_word_service.py _softmax_weighted_selection()
prob_distribution = {
    "probabilities": probability_data,
    "raw_stats": {
        "min_probability": min_prob,
        "max_probability": max_prob,
        "mean_probability": mean_prob,
        "std_probability": std_prob,
    },
    "normalized_probabilities": normalized_data,
}
Phase 2: Frontend Visualization Options
Add toggle buttons in the debug tab:
- Raw Distribution: Current behavior (for debugging)
- Normalized Distribution: New normalized view (for analysis)
- Side-by-Side: Show both for comparison
Phase 3: Enhanced Statistical Markers
With normalization, the statistical markers (μ, σ) become more meaningful:
- μ should consistently align with difficulty targets (20%, 50%, 90%)
- σ should show consistent widths across topics for the same difficulty
- Deviations from expected positions indicate algorithmic issues
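The marker check described above could be automated along these lines. The target percentiles (20%, 50%, 90%) come from this document; the function, its inputs, and the tolerance are hypothetical:

```python
from math import sqrt

# Difficulty targets from this proposal: hard → 20th, medium → 50th, easy → 90th percentile
DIFFICULTY_TARGETS = {"easy": 0.90, "medium": 0.50, "hard": 0.20}

def check_difficulty_targeting(positions, weights, difficulty, tolerance=0.05):
    """Return (mu, sigma, on_target) for a probability-weighted distribution
    over normalized positions in [0, 1]. Illustrative sketch."""
    total = sum(weights)
    mu = sum(p * w for p, w in zip(positions, weights)) / total
    sigma = sqrt(sum(w * (p - mu) ** 2 for p, w in zip(positions, weights)) / total)
    on_target = abs(mu - DIFFICULTY_TARGETS[difficulty]) <= tolerance
    return mu, sigma, on_target
```

A distribution whose weighted mean drifts outside the tolerance band for its difficulty would flag an algorithmic issue immediately.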
Expected Outcomes
Successful Implementation Indicators
- Visual Consistency: All easy mode distributions peak at the same normalized position
- Clear Difficulty Separation: Easy, Medium, Hard show distinct, predictable patterns
- Topic Independence: Changing topics doesn't change the distribution shape/position
- Diagnostic Power: Algorithm issues become immediately obvious
Validation Tests
# Test cases to validate normalization
test_cases = [
    ("animals", "easy"),
    ("technology", "easy"),
    ("history", "easy"),
    # Should all produce near-identical normalized distributions
]

TOLERANCE = 0.05  # Allow small variation between topics; exact float equality would be brittle

for topic, difficulty in test_cases:
    distribution = generate_normalized_distribution(topic, difficulty)
    assert abs(peak_position(distribution) - EXPECTED_EASY_PEAK) <= TOLERANCE
    assert abs(distribution_width(distribution) - EXPECTED_EASY_WIDTH) <= TOLERANCE
Implementation Timeline
Week 1: Backend Changes
- Modify _softmax_weighted_selection() to compute normalization statistics
- Add normalized probability calculation
- Update debug data structure
- Add unit tests
Week 2: Frontend Integration
- Add normalization toggle to debug tab
- Implement normalized chart rendering
- Update statistical marker calculations
- Add explanatory tooltips
Week 3: Testing & Validation
- Test across multiple topics and difficulties
- Validate that normalization reveals expected patterns
- Document findings and create examples
- Performance optimization if needed
Future Enhancements
Dynamic Normalization Scopes
- Per-topic normalization: Normalize within each topic separately
- Cross-topic normalization: Normalize across all topics globally
- Per-difficulty normalization: Normalize within difficulty levels
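The three scopes above differ only in how records are grouped before normalizing. A hedged sketch, where the record fields (`topic`, `probability`, `normalized_probability`) are assumptions about the debug data shape:

```python
from collections import defaultdict

def normalize_by_scope(records, scope_key):
    """Min-max normalize 'probability' within groups defined by scope_key,
    e.g. 'topic' for per-topic or 'difficulty' for per-difficulty scope.
    Illustrative sketch; record fields are assumptions."""
    groups = defaultdict(list)
    for r in records:
        groups[r[scope_key]].append(r)
    for members in groups.values():
        lo = min(r["probability"] for r in members)
        hi = max(r["probability"] for r in members)
        span = hi - lo
        for r in members:
            r["normalized_probability"] = (r["probability"] - lo) / span if span else 0.0
    return records
```

Cross-topic (global) normalization is the degenerate case where every record shares one group.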
Advanced Statistical Views
- Overlay comparisons: Show multiple topics/difficulties on same chart
- Animation: Transition between raw and normalized views
- Heatmap visualization: Show 2D difficulty×topic probability landscapes
Risk Mitigation
Potential Issues
- Information Loss: Normalization might hide important absolute differences
- User Confusion: Additional complexity in the interface
- Performance: Extra computation for large datasets
Mitigation Strategies
- Always provide raw view option: Never remove the original visualization
- Clear labeling: Explicitly indicate when normalization is active
- Efficient algorithms: Use vectorized operations for normalization
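The vectorized-operations mitigation might look like the following, assuming NumPy is available in the backend; the function name is illustrative:

```python
import numpy as np

def normalize_vectorized(probabilities: np.ndarray) -> np.ndarray:
    """Vectorized min-max normalization; returns zeros for a flat distribution."""
    lo, hi = probabilities.min(), probabilities.max()
    if hi == lo:  # Degenerate case: all probabilities equal
        return np.zeros_like(probabilities)
    return (probabilities - lo) / (hi - lo)
```

This replaces the per-item Python loop with two array passes, which matters only for very large candidate pools.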
Conclusion
Distribution normalization will transform the debug visualization from a topic-specific diagnostic tool into a universal algorithm validation system. By removing topic-dependent bias, we can clearly see whether the Gaussian frequency targeting is working as designed, regardless of the input theme.
The recommended min-max normalization approach preserves the essential characteristics of the probability distributions while ensuring consistent, comparable visualizations across all topics and difficulties.
This enhancement will significantly improve the ability to:
- Validate algorithm correctness
- Debug difficulty-targeting issues
- Compare performance across different domains
- Demonstrate the effectiveness of the composite scoring system
This proposal builds on the successful percentile-sorted visualization implementation to create an even more powerful debugging and analysis tool.