Prompt Attention Caching

What It Does

Caches CLIP text embeddings for prompts you've already encoded. When you reuse a prompt (or parts of it), the embedding is retrieved from cache instead of being recomputed.
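The lookup-or-encode pattern can be sketched as a small memoizing wrapper. This is a hypothetical illustration and not the `prompt_cache` module's code. `PromptEmbeddingCache`, `get_or_encode`, and `fake_encode` are made-up names. The key combines a prompt hash with an encoder identity, following the "keyed by prompt hash and CLIP identity" note under Cache Settings. SHA-256 is an assumed choice of hash.

```python
import hashlib


class PromptEmbeddingCache:
    """Hypothetical memoizing wrapper: one entry per (prompt hash, encoder identity)."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get_or_encode(self, prompt, encoder_id, encode):
        # Assumed key scheme: hash of the prompt text plus an identifier for the CLIP model.
        key = (hashlib.sha256(prompt.encode("utf-8")).hexdigest(), encoder_id)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        embedding = encode(prompt)  # the expensive text-encoder call runs only on a miss
        self._entries[key] = embedding
        return embedding


calls = []


def fake_encode(prompt):
    # Placeholder for a real CLIP text encoder; records each call so misses can be counted.
    calls.append(prompt)
    return [float(len(prompt))]


cache = PromptEmbeddingCache()
cache.get_or_encode("a red fox", "clip-l", fake_encode)  # miss: encodes
cache.get_or_encode("a red fox", "clip-l", fake_encode)  # hit: served from cache
cache.get_or_encode("a red fox", "clip-g", fake_encode)  # miss: different encoder identity
print(cache.hits, cache.misses, len(calls))  # 1 2 2
```

Because the encoder identity is part of the key, the same prompt encoded by a different CLIP model is a separate entry rather than a stale hit.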

When It Helps Most

  • Batch generation with same prompt
  • Testing different seeds
  • Incremental prompt refinement
  • Generation sessions with repeated themes
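Seed sweeps benefit because the text conditioning depends only on the prompt, not on the seed. The sketch below uses Python's built-in `functools.lru_cache` as a stand-in for the project's cache; it is not how `prompt_cache` is implemented. `encode_prompt` and `generate` are hypothetical placeholders.

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def encode_prompt(prompt: str) -> tuple:
    # Stand-in for the CLIP text encoder; returns a dummy "embedding".
    return tuple(ord(c) % 7 for c in prompt)


def generate(prompt: str, seed: int) -> tuple:
    # Conditioning depends only on the prompt; the seed would only affect the sampler's noise.
    cond = encode_prompt(prompt)
    return cond, seed


for seed in range(8):
    generate("a lighthouse at dusk", seed)

print(encode_prompt.cache_info())
# CacheInfo(hits=7, misses=1, maxsize=256, currsize=1)
```

Eight generations trigger a single encode; the other seven lookups are cache hits.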

Configuration

Enable/Disable (default: enabled):

from src.Utilities import prompt_cache

# Enable (default)
prompt_cache.enable_prompt_cache(True)

# Disable
prompt_cache.enable_prompt_cache(False)

# Check status
stats = prompt_cache.get_cache_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")

Cache Settings:

  • Maximum entries: 256 prompts before pruning
  • Cache structure: global dict keyed by prompt hash and CLIP identity
  • Memory usage: workload-dependent, estimated from cached embedding tensors
  • Cache cleared on: restart, disable, or manual clear
  • Automatic pruning: removes the oldest 25% of entries when the cache exceeds its limit
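The settings above can be sketched as a dict-backed cache with a size limit and fractional pruning. This is a hypothetical illustration. `BoundedPromptCache`, `put`, `get`, and `set_enabled` are made-up names. Two details are assumptions:

- "Oldest" means insertion order rather than last access.
- The number of entries to remove is rounded down.

```python
class BoundedPromptCache:
    """Hypothetical sketch: dict-backed cache that prunes its oldest 25% past a size limit."""

    def __init__(self, max_entries=256, prune_fraction=0.25):
        self.max_entries = max_entries
        self.prune_fraction = prune_fraction
        self.enabled = True
        self._entries = {}  # dicts keep insertion order, so the first keys are the oldest

    def put(self, key, embedding):
        if not self.enabled:
            return
        self._entries[key] = embedding
        if len(self._entries) > self.max_entries:
            self._prune()

    def get(self, key):
        return self._entries.get(key) if self.enabled else None

    def _prune(self):
        # Assumed rounding: floor of 25% of the current size, removing at least one entry.
        n_remove = max(1, int(len(self._entries) * self.prune_fraction))
        for key in list(self._entries)[:n_remove]:
            del self._entries[key]

    def set_enabled(self, enabled):
        self.enabled = enabled
        if not enabled:
            self._entries.clear()  # disabling drops every cached embedding

    def __len__(self):
        return len(self._entries)


cache = BoundedPromptCache()
for i in range(257):
    cache.put(("prompt-%d" % i, "clip-l"), [i])
print(len(cache))  # 193: the 257th insert pruned the 64 oldest entries
```

Pruning a quarter at once, rather than one entry per insert, means the cache can absorb many new prompts before the next pruning pass.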

Viewing Cache Stats

from src.Utilities import prompt_cache

# Print statistics
prompt_cache.print_cache_stats()

# Output:
# ============================================================
# Prompt Cache Statistics
# ============================================================
#   Status: Enabled
#   Entries: 42
#   Size: ~85.3 MB
#   Requests: 150 (hits: 108, misses: 42)
#   Hit Rate: 72.0%
# ============================================================
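The figures in the sample output follow simple arithmetic: hit rate = hits / requests, so 108 / 150 = 72.0%, and each miss adds one entry, so 42 entries. The sketch below reproduces the rate and shows one way a size estimate could be derived from tensor shapes (element count times bytes per element). `hit_rate` and `estimate_bytes` are hypothetical helpers, and the (1, 77, 768) fp16 shape is an assumed example, not taken from this project.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache (0.0 when there were no lookups)."""
    requests = hits + misses
    return hits / requests if requests else 0.0


def estimate_bytes(shapes, itemsize: int) -> int:
    """Approximate memory held by cached tensors: element count times bytes per element."""
    total = 0
    for shape in shapes:
        numel = 1
        for dim in shape:
            numel *= dim
        total += numel * itemsize
    return total


# Reproduce the hit rate from the sample output: 108 hits, 42 misses.
print(f"Hit Rate: {hit_rate(108, 42):.1%}")  # Hit Rate: 72.0%

# One hypothetical (1, 77, 768) fp16 embedding at 2 bytes per element.
print(estimate_bytes([(1, 77, 768)], itemsize=2))  # 118272
```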

Best Practices

  1. Leave it enabled: a hash lookup costs far less than a CLIP text-encoder pass, and every hit skips that pass entirely
  2. Monitor the hit rate: it should exceed 50% in typical workflows
  3. Clear the cache when switching models or after major prompt changes, to free memory held by embeddings that are unlikely to be reused
  4. Batch similar prompts to maximize cache hits
  5. Expect global behavior: the cache is shared across all prompt encodes in the process rather than scoped to a single generation session
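Prompt order matters mainly once the number of distinct prompts exceeds the cache limit, because pruned entries must be re-encoded. The sketch below uses a deliberately tiny capacity of 2. It also uses a simplified evict-one-oldest policy instead of the 25% pruning described above. `simulate` is a hypothetical helper for illustration only.

```python
from collections import OrderedDict


def simulate(prompts, capacity):
    """Count hits/misses for a FIFO cache that evicts the oldest entry when over capacity."""
    cache = OrderedDict()
    hits = misses = 0
    for p in prompts:
        if p in cache:
            hits += 1
        else:
            misses += 1
            cache[p] = True  # stand-in for the encoded embedding
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the oldest insertion
    return hits, misses


interleaved = ["A", "B", "C", "A", "B", "C"]
grouped = ["A", "A", "B", "B", "C", "C"]
print(simulate(interleaved, capacity=2))  # (0, 6)
print(simulate(grouped, capacity=2))      # (3, 3)
```

Interleaving three prompts through a two-entry cache evicts each prompt before it is reused. Grouping repeats together turns half the lookups into hits.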