# Cache System Migration Guide
## 🎯 TL;DR - What Changed

**Auto-versioning is now ENABLED BY DEFAULT.** Your cache will automatically invalidate when function code changes, which prevents stale cache bugs.

Most users need to do nothing - just update and enjoy automatic cache invalidation.

Only opt out if:

- the function takes hours/days to compute, AND
- the function is stable and won't change, AND
- you understand the risk of stale results.
## Overview of Changes

- **`auto_versioning=True` by default**: cache keys include a hash of the function source
- **One decorator to rule them all**: `@cacheable()` replaces multiple decorators
- **Removed `smart_cacheable`**: now redundant (built into the default behavior)
- **Selective cleaner refocused**: now a maintenance tool for orphaned caches
## Quick Migration Table

| Old Code | New Code | Notes |
|---|---|---|
| `@robust_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@time_aware_cacheable` | `@cacheable(time_aware=True)` | Now has auto-versioning by default |
| `@cv_cacheable` | `@cacheable()` | Now has auto-versioning by default |
| `@smart_cacheable` | `@cacheable()` | REMOVED - now default behavior |
| `@cacheable()` (old) | `@cacheable(auto_versioning=False)` | Only if you need the old behavior |
## What is Auto-Versioning?

### The Problem It Solves

```python
# Without auto-versioning
@cacheable(auto_versioning=False)
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores result
calculate_returns(df)  # Cache hit ✓

# Developer fixes a bug
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache HIT - WRONG RESULT! ❌
# Returns OLD buggy result from cache
```
### With Auto-Versioning (Now the Default)

```python
# With auto-versioning (NEW DEFAULT)
@cacheable()  # auto_versioning=True by default
def calculate_returns(prices):
    return prices.pct_change()

calculate_returns(df)  # Cache miss, stores at key "v_abc123..."
calculate_returns(df)  # Cache hit ✓

# Developer fixes the bug
def calculate_returns(prices):
    return prices.pct_change().fillna(0)  # Bug fix!

calculate_returns(df)  # Cache MISS - new key "v_def456..." ✓
# Computes with the NEW correct code
```
## Migration Steps

### Step 1: Update `smart_cacheable` (REQUIRED)

Old code:

```python
from afml.cache import smart_cacheable

@smart_cacheable
def my_function(data):
    return data.mean()
```

New code:

```python
from afml.cache import cacheable

@cacheable()  # That's it! auto_versioning is now the default
def my_function(data):
    return data.mean()
```
### Step 2: Review Expensive Functions (OPTIONAL)

If you have functions that take hours to compute and rarely change:

```python
@cacheable(auto_versioning=False)  # Explicit opt-out
def train_huge_model(data):
    """Takes 48 hours, changes once per year"""
    return expensive_training(data)
```

⚠️ **Warning**: with `auto_versioning=False`, source changes will NOT invalidate the cache - not even real fixes:

```python
@cacheable(auto_versioning=False)
def train_huge_model(data):
    """Added this docstring"""  # THIS CHANGE WON'T INVALIDATE CACHE
    return expensive_training(data)  # May return stale result!
```
### Step 3: Clean Up Old Caches (RECOMMENDED)

After migration, clean up orphaned caches:

```python
from afml.cache import cache_maintenance

# One-time cleanup after migration
cache_maintenance(
    clean_orphaned=True,
    max_cache_size_mb=1000,
    max_age_days=30
)
```
## Understanding Auto-Versioning Behavior

### How Cache Keys Work

Without auto-versioning:

```
cache_key = md5("module.function_name" + "arg_hashes")
          = "a1b2c3d4..."
```

With auto-versioning (default):

```
cache_key = md5("module.function_name" + "v_abc123" + "arg_hashes")
                                          ^^^^^^^^^^
                                          function source hash
          = "e5f6g7h8..."  # Different key!
```
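The version-tag derivation can be sketched in plain Python. This is an illustration of the idea only, not the library's actual implementation; `source_version` is a hypothetical helper:

```python
import hashlib

def source_version(src: str) -> str:
    # Hypothetical helper: derive a short version tag from function source text
    return "v_" + hashlib.md5(src.encode()).hexdigest()[:8]

v1 = source_version("def f(p):\n    return p.pct_change()\n")
v2 = source_version("def f(p):\n    return p.pct_change().fillna(0)\n")
assert v1 != v2  # the bug fix yields a new version tag, hence a new cache key
```

Because the tag is part of the cache key, any change to the hashed source routes lookups to a fresh key, leaving the old entries behind as orphans.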
### When Cache Invalidates

The cache invalidates when:

- ✅ Function body changes
- ✅ Function name changes
- ✅ Default parameters change
- ✅ Decorators change
- ❌ Comments change (graceful: uses file mtime as fallback)
- ❌ Docstrings change (graceful: uses file mtime as fallback)
### Graceful Fallback

For built-in/dynamic functions where source is unavailable:

```python
# Can't get source for built-ins
import numpy as np

@cacheable()  # Gracefully falls back to file mtime
def use_builtin(data):
    return np.mean(data)  # np.mean has no source

# A warning is logged, but nothing crashes
```
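The fallback chain can be sketched roughly like this. This is a hedged illustration of the behavior described above, not the library's internal code; `version_tag` is a hypothetical helper:

```python
import hashlib
import inspect
import os

def version_tag(func):
    """Illustrative sketch: hash the source; fall back to file mtime; never crash."""
    try:
        src = inspect.getsource(func)  # raises for built-ins/dynamic functions
        return "v_" + hashlib.md5(src.encode()).hexdigest()[:8]
    except (OSError, TypeError):
        module = inspect.getmodule(func)
        path = getattr(module, "__file__", None)
        if path and os.path.exists(path):
            # Graceful fallback: the defining file's modification time
            return f"mtime_{int(os.path.getmtime(path))}"
        return "unversioned"  # last resort: no versioning at all

print(version_tag(len))  # built-in: no source, no module file
```

The mtime fallback is coarser than a source hash (any save of the file bumps it), which is why the warning in the logs is worth noticing.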
## Common Scenarios

### Scenario 1: Development (Default - No Changes Needed)

```python
from afml.cache import cacheable

@cacheable()  # Just use the defaults!
def my_feature(data, window):
    """Feature under active development"""
    return data.rolling(window).mean()

# Work normally - the cache auto-invalidates on changes
result1 = my_feature(df, 20)
result2 = my_feature(df, 20)  # Cache hit
# ... modify my_feature ...
result3 = my_feature(df, 20)  # Cache miss (automatic!)
```
### Scenario 2: Expensive Computation (Explicit Opt-Out)

```python
from afml.cache import cacheable

@cacheable(auto_versioning=False)  # Explicit opt-out
def train_production_model(data):
    """Takes 24 hours, changes rarely, want to preserve cache"""
    return expensive_training(data)
```
### Scenario 3: Bulk Opt-Out for Stable Functions

```python
from afml.cache import disable_auto_versioning

# Create a custom decorator without versioning
cacheable_stable = disable_auto_versioning()

@cacheable_stable()
def stable_func_1(data): ...

@cacheable_stable()
def stable_func_2(data): ...

@cacheable_stable(time_aware=True)  # Can combine with other options
def stable_func_3(data): ...
```
### Scenario 4: Mixed Strategy

```python
from afml.cache import cacheable

# Under development - auto-versioning
@cacheable()
def experimental_feature(data):
    return data.ewm(span=20).mean()

# Production stable - opt-out
@cacheable(auto_versioning=False)
def load_data(symbol, start, end):
    return expensive_data_load(symbol, start, end)
```
## Maintenance & Cleanup

### Periodic Cleanup (Recommended)

Set up weekly/monthly cleanup:

```python
from afml.cache import cache_maintenance

# Run weekly via cron/scheduler
cache_maintenance(
    clean_orphaned=True,      # Remove old function versions
    max_cache_size_mb=2000,   # Enforce size limit
    max_age_days=90,          # Remove very old caches
    min_orphan_age_hours=48   # Keep recent orphans (grace period)
)
```
### Analyze Cache Fragmentation

Check whether auto-versioning is creating too many versions:

```python
from afml.cache import print_version_analysis

print_version_analysis()
# Output:
# ========================================
# CACHE VERSION ANALYSIS
# ========================================
# Functions with versions: 12
# Total versions: 34
# Total size: 1.2 GB
#
# Top fragmented functions:
#   1. calculate_feature
#      Versions: 8
#      Size: 450 MB
```

If fragmentation is high, consider opting out for those functions.
## Performance Implications

### Overhead of Auto-Versioning

Minimal overhead - the hash is computed once, when the decorator is applied:

```python
# Old smart_cacheable: 0.5ms PER CALL
@smart_cacheable  # Read source + hash on EVERY call
def fast_func(x):
    return x + 1

# New auto_versioning: ~0ms per call
@cacheable()  # Hash computed ONCE at import time
def fast_func(x):
    return x + 1
```
### Storage Implications

With auto-versioning, multiple versions can coexist temporarily:

```
cache/
  my_module/
    my_function/
      v_abc123_args_xyz/   # Version 1 (orphaned)
      v_def456_args_xyz/   # Version 2 (current)
      v_ghi789_args_xyz/   # Version 3 (current)
```

**Mitigation**: Run `cache_maintenance()` periodically to clean orphans.
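Orphan detection can be approximated by listing version directories and keeping only the newest few. This is a sketch under the directory layout shown above; `find_stale_versions` is a hypothetical helper, not part of the library:

```python
from pathlib import Path

def find_stale_versions(func_dir: Path, keep: int = 1) -> list:
    # Hypothetical helper: version dirs older than the `keep` most recent ones
    versions = sorted(
        (d for d in func_dir.iterdir() if d.is_dir() and d.name.startswith("v_")),
        key=lambda d: d.stat().st_mtime,  # newest first after reverse sort
        reverse=True,
    )
    return versions[keep:]
```

In practice, prefer `cache_maintenance(clean_orphaned=True)`; the sketch only shows why a grace period (`min_orphan_age_hours`) matters - the most recent versions must survive cleanup.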
## Testing Your Migration

### 1. Check for `smart_cacheable` usage

```bash
# This should find zero results after migration
grep -r "smart_cacheable" your_project/
```

### 2. Test auto-versioning behavior

```python
from afml.cache import cacheable

@cacheable()
def test_func(x):
    return x * 2

# First call
result1 = test_func(5)  # Cache miss

# Second call (should hit)
result2 = test_func(5)  # Cache hit

# Change the function (keep the decorator so it stays cached)
@cacheable()
def test_func(x):
    return x * 3  # Changed!

# Third call (should miss due to version change)
result3 = test_func(5)  # Cache miss (automatic!)
assert result3 == 15  # New result
```

### 3. Verify cleanup works

```python
from afml.cache import find_orphaned_caches

orphans = find_orphaned_caches()
print(f"Found {orphans['orphaned_count']} orphaned caches")
print(f"Total size: {orphans['total_size_mb']} MB")
```
## Troubleshooting

### Issue: Cache not invalidating on changes

**Cause**: Function source unavailable (built-in/dynamic)

**Solution**: Check the logs for warnings:

```
# Look for:
# "Cannot hash source for my_func, using file mtime for versioning"
```

If the file mtime fallback also fails, explicitly use `auto_versioning=False` and manage invalidation manually.

### Issue: Too many cache versions

**Cause**: Rapid development with many changes

**Solution**: Run cleanup more frequently:

```python
from afml.cache import cache_maintenance

cache_maintenance(
    clean_orphaned=True,
    min_orphan_age_hours=12  # More aggressive
)
```

### Issue: Expensive function cache lost

**Cause**: Auto-versioning invalidated the cache on a minor change

**Solution**: Opt out for that specific function:

```python
@cacheable(auto_versioning=False)
def expensive_stable_function(data):
    return days_of_computation(data)
```
## Backward Compatibility

### Old Decorator Aliases

These still work (no changes needed):

```python
from afml.cache import (
    robust_cacheable,      # = cacheable()
    time_aware_cacheable,  # = cacheable(time_aware=True)
    cv_cacheable,          # = cacheable()
)
# All now have auto_versioning=True by default
```

### Disabling Auto-Versioning Globally

If you want the old behavior everywhere (not recommended):

```python
# In your __init__.py or main module
from afml.cache import disable_auto_versioning

# Use this instead of cacheable
cacheable = disable_auto_versioning()

# Now all @cacheable() calls have auto_versioning=False
```
## Getting Help

### Check Cache Health

```python
from afml.cache import print_cache_report

print_cache_report()
```

### Debug a Specific Function

```python
from afml.cache import debug_function_cache

debug_function_cache("afml.features.my_func")
```

### Analyze Version Fragmentation

```python
from afml.cache import analyze_cache_versions, print_version_analysis

analysis = analyze_cache_versions()
print_version_analysis()
```
## Summary

✅ **What You Need to Do:**

- Replace `@smart_cacheable` with `@cacheable()` (required)
- Review expensive functions and opt out if needed (optional)
- Set up periodic cache maintenance (recommended)

✅ **What's Better Now:**

- Automatic cache invalidation on code changes (correctness)
- No per-call overhead (performance)
- Complete invalidation for all args (reliability)
- Simpler mental model (clarity)

✅ **The Default is Correct:**

- `auto_versioning=True` prevents stale cache bugs
- Only opt out for specific expensive, stable functions
- When in doubt, use the default