MLM Probability Fix - Complete Documentation
Issue Identified
The user correctly observed that changing the MLM probability did not affect the results at all in the encoder model visualization. This was a significant bug in how the MLM probability parameter was being used.
Root Cause Analysis
What Was Wrong
The MLM probability setting had two separate effects that were not properly connected:
Average Perplexity Calculation ✅ (Working correctly)
- Used random masking with the specified MLM probability
- Affected the summary statistic shown to the user
Per-Token Visualization ❌ (Bug was here)
- Always masked each token individually
- Completely ignored the MLM probability setting
- This meant changing MLM probability had no visual effect
The Disconnect
# OLD CODE - MLM probability was ignored for visualization
for i in range(len(tokens)):
    if not special_token:
        # ALWAYS calculated detailed perplexity for every token
        masked_input[0, i] = tokenizer.mask_token_id
        # ... calculate perplexity
The Fix
1. Made MLM Probability Affect Visualization
Now the MLM probability controls which tokens get detailed analysis:
# NEW CODE - MLM probability affects visualization
for i in range(len(tokens)):
    if not special_token:
        if torch.rand(1).item() < mlm_probability:  # Now respects MLM prob
            # Calculate detailed perplexity for this token
            masked_input[0, i] = tokenizer.mask_token_id
            # ... calculate detailed perplexity
        else:
            # Use baseline perplexity for non-analyzed tokens
            token_perplexities.append(2.0)  # Neutral baseline
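For context, a more complete, runnable sketch of the same idea, assuming a Hugging Face AutoModelForMaskedLM and its tokenizer; the function and variable names (compute_token_perplexities, analyzed_flags, BASELINE_PERPLEXITY) are illustrative, not the app's exact code:

import torch

BASELINE_PERPLEXITY = 2.0  # neutral value assigned to tokens that are not analyzed

def compute_token_perplexities(text, model, tokenizer, mlm_probability=0.15):
    """Mask randomly selected tokens one at a time and record the model's
    perplexity for the true token at each masked position."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    special_ids = set(tokenizer.all_special_ids)

    token_perplexities, analyzed_flags = [], []
    for i in range(input_ids.shape[1]):
        token_id = input_ids[0, i].item()
        # Skip special tokens and tokens not selected by the MLM probability
        if token_id in special_ids or torch.rand(1).item() >= mlm_probability:
            token_perplexities.append(BASELINE_PERPLEXITY)
            analyzed_flags.append(False)
            continue

        masked_input = input_ids.clone()
        masked_input[0, i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked_input).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        # Single-token perplexity = exp(negative log-likelihood of the true token)
        token_perplexities.append(torch.exp(-log_probs[token_id]).item())
        analyzed_flags.append(True)

    return token_perplexities, analyzed_flags

Returning the per-token flags alongside the perplexities is what lets the visualization color analyzed tokens and gray out the rest.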
2. Visual Distinction
- Analyzed tokens: Colored by actual perplexity (green/yellow/red)
- Non-analyzed tokens: Gray color with baseline perplexity
- Tooltip: Shows whether token was analyzed or not
3. Clear User Feedback
- Summary now shows: "MLM Probability: 0.15 (3/8 tokens analyzed in detail)" (see the sketch below)
- Legend updated: 🟢 Low · 🟡 Medium · 🔴 High · ⚫ Not analyzed
- Improved help text: "Probability of detailed analysis per token"
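As a rough, self-contained illustration (the flag values below are hypothetical), the summary line can be assembled from the per-token analyzed flags:

mlm_probability = 0.15
analyzed_flags = [False, True, False, True, True, False, False, False]  # hypothetical per-token flags
summary = f"MLM Probability: {mlm_probability} ({sum(analyzed_flags)}/{len(analyzed_flags)} tokens analyzed in detail)"
print(summary)  # MLM Probability: 0.15 (3/8 tokens analyzed in detail)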
How It Works Now
Low MLM Probability (0.15)
Input: "The capital of France is Paris"
Result: Only ~15% of tokens get detailed analysis
Visualization: Mostly gray tokens with a few colored ones
Effect: Fast analysis, matches BERT training conditions
High MLM Probability (0.5)
Input: "The capital of France is Paris"
Result: ~50% of tokens get detailed analysis
Visualization: More colored tokens, fewer gray ones
Effect: More comprehensive but slower analysis
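Because each token is selected independently, the analyzed count varies between runs; a quick expected-value check (the token count of 6 for the example sentence is an assumption about its tokenization):

non_special_tokens = 6  # assumed token count for "The capital of France is Paris"
for p in (0.15, 0.5):
    print(f"mlm_probability={p}: about {non_special_tokens * p:.1f} tokens analyzed on average")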
User Experience Improvements
Before the Fix
- User changes MLM probability from 0.15 → 0.5
- No visual change in token colors
- Only summary statistic changed (confusing!)
After the Fix
- User changes MLM probability from 0.15 → 0.5
- More tokens become colored (analyzed)
- Fewer tokens remain gray (non-analyzed)
- Summary shows token count: "(3/8 tokens analyzed)"
- Clear visual feedback of the parameter's effect
Testing the Fix
1. Quick Test
Try the same text with different MLM probabilities (a scripted version of this check follows the list):
- Text: "Machine learning algorithms require computational resources"
- MLM 0.2: Few colored tokens
- MLM 0.8: Most tokens colored
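A self-contained sketch of that quick test, assuming the transformers library and a BERT-style tokenizer (bert-base-uncased here is an assumption, not necessarily the model the app loads); it only counts how many tokens would be selected, which is what drives the visual difference:

from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Machine learning algorithms require computational resources"
ids = tokenizer(text, return_tensors="pt")["input_ids"][0].tolist()
special_ids = set(tokenizer.all_special_ids)

torch.manual_seed(0)  # fixed seed so the two runs are comparable
for p in (0.2, 0.8):
    selected = sum(1 for t in ids if t not in special_ids and torch.rand(1).item() < p)
    print(f"mlm_probability={p}: {selected} of {len(ids)} tokens selected for detailed analysis")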
2. Demo Script
python mlm_demo.py
Shows exactly how MLM probability affects analysis.
3. Visual Examples
The app now includes example pairs:
- Same text with MLM 0.2 vs 0.8
- Shows clear visual difference
Technical Details
Randomness Handling
- Uses torch.rand() for consistency with PyTorch
- Each token gets an independent random chance
- Reproducible with manual seeds for testing (sketched below)
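A minimal sketch of the reproducibility point: fixing the seed makes the per-token coin flips, and therefore the set of analyzed tokens, identical across runs (the seed value is arbitrary):

import torch

def selected_tokens(mlm_probability, num_tokens, seed):
    torch.manual_seed(seed)
    return [torch.rand(1).item() < mlm_probability for _ in range(num_tokens)]

# Same seed -> same selection; different seeds generally differ
print(selected_tokens(0.15, 8, seed=42) == selected_tokens(0.15, 8, seed=42))  # True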
Baseline Perplexity
- Non-analyzed tokens get perplexity = 2.0
- This represents "neutral" confidence (see the note below)
- Avoids misleading very low/high values
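As a sanity check on 2.0 as the "neutral" value (assuming, as in the sketch earlier, that per-token perplexity is the exponential of the negative log-likelihood): a perplexity of 2.0 corresponds to the model giving the token a probability of 0.5, i.e. coin-flip confidence.

import math

baseline_perplexity = 2.0
implied_probability = math.exp(-math.log(baseline_perplexity))  # equals 1 / perplexity
print(implied_probability)  # 0.5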
Color Mapping
- Analyzed tokens: Full color spectrum based on actual perplexity
- Non-analyzed tokens: Gray (rgb(200, 200, 200))
- Tooltips distinguish: "Perplexity: 5.2" vs "Not analyzed" (the color mapping is sketched below)
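A minimal sketch of the mapping described above; the perplexity thresholds (5 and 20) are illustrative guesses, not the app's actual cutoffs:

def perplexity_to_color(perplexity, analyzed):
    """Map a token's perplexity and analyzed flag to an RGB string for the HTML view."""
    if not analyzed:
        return "rgb(200, 200, 200)"  # gray: token was not analyzed
    if perplexity < 5:
        return "rgb(0, 170, 0)"      # green: low perplexity, model is confident
    if perplexity < 20:
        return "rgb(220, 180, 0)"    # yellow: medium perplexity
    return "rgb(220, 0, 0)"          # red: high perplexity, model is uncertain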
Performance Implications
Lower MLM Probability (0.15)
- Pros: Faster, matches BERT training, realistic
- Cons: Sparse analysis, some tokens not evaluated
Higher MLM Probability (0.8)
- Pros: Comprehensive analysis, more visual information
- Cons: Slower computation, unrealistic for MLM
Recommendation
- Default 0.15: Standard BERT-like analysis
- Increase to 0.3-0.5: For more detailed exploration
- Avoid >0.8: Diminishing returns, very slow
Impact on Model Types
Decoder Models (GPT, etc.)
- No change: MLM probability only affects encoder models
- Always analyze all tokens for next-token prediction
Encoder Models (BERT, etc.)
- Major improvement: MLM probability now has clear visual effect
- Users can explore different analysis depths
- Better understanding of model confidence patterns
User Guidance
When to Use Different MLM Probabilities
0.15 (Standard)
- Quick analysis
- Matches BERT training
- Good for initial exploration
0.3-0.4 (Detailed)
- More comprehensive view
- Better for understanding difficult texts
- Reasonable computation time
0.5+ (Comprehensive)
- Maximum detail
- Research/analysis purposes
- Slower but thorough
Future Enhancements
Possible Improvements
- Adaptive MLM: Adjust probability based on text difficulty
- Token importance: Prioritize content words over function words
- Interactive selection: Let users click tokens to analyze
- Batch analysis: Process multiple MLM probabilities simultaneously
Configuration Options
The fix is fully configurable via config.py (see the sketch after this list):
- Default MLM probability
- Min/max ranges
- Baseline perplexity value
- Color scheme for non-analyzed tokens
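A hedged sketch of what the relevant entries in config.py might look like; the names and values below are assumptions, not the file's actual contents:

# config.py (illustrative excerpt)
MLM_PROBABILITY_DEFAULT = 0.15             # standard BERT-style masking rate
MLM_PROBABILITY_MIN = 0.05                 # slider lower bound
MLM_PROBABILITY_MAX = 0.8                  # slider upper bound
BASELINE_PERPLEXITY = 2.0                  # neutral perplexity for non-analyzed tokens
NON_ANALYZED_COLOR = "rgb(200, 200, 200)"  # gray used for tokens that were skipped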
Conclusion
This fix transforms the MLM probability from a "hidden parameter" that only affected summary statistics into a visible, interactive control that directly impacts the visualization. Users now get immediate visual feedback when adjusting MLM probability, making the parameter's purpose clear and the analysis more engaging.
The fix maintains backward compatibility while significantly improving the user experience for encoder model analysis.