Prompt Attention Caching

What It Does

Caches CLIP text embeddings for prompts you've already encoded. When you reuse a prompt (or parts of it), the embedding is retrieved from the cache instead of being recomputed.

When It Helps Most

Batch generation with the same prompt
Testing different seeds
Incremental prompt refinement
Generation sessions with repeated themes

Configuration

Enable/Disable (default: enabled):

from src.Utilities import prompt_cache

# Enable (default)
prompt_cache.enable_prompt_cache(True)

# Disable
prompt_cache.enable_prompt_cache(False)

# Check cache statistics (e.g. hit rate)
stats = prompt_cache.get_cache_stats()
print(f"Hit rate: {stats['hit_rate']:.1%}")

Cache Settings:

Maximum entries: 256 prompts before pruning
Cache structure: global dict keyed by prompt hash and CLIP identity
Memory usage: workload-dependent, estimated from cached embedding tensors
Cache cleared on: restart, disable, or manual clear
Automatic pruning: removes the oldest 25% of entries when the cache exceeds its limit

Viewing Cache Stats

from src.Utilities import prompt_cache

# Print statistics
prompt_cache.print_cache_stats()

# Output:
# ============================================================
# Prompt Cache Statistics
# ============================================================
#   Status: Enabled
#   Entries: 42
#   Size: ~85.3 MB
#   Requests: 150 (hits: 108, misses: 42)
#   Hit Rate: 72.0%
# ============================================================
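
The hit rate in the sample output is consistent with hits divided by total requests. A small worked check, using a hypothetical helper `hit_rate` that is not part of `prompt_cache`:

```python
def hit_rate(hits, misses):
    """Fraction of requests served from the cache; 0.0 when there were no requests."""
    total = hits + misses
    return hits / total if total else 0.0


rate = hit_rate(hits=108, misses=42)
print(f"Hit Rate: {rate:.1%}")  # Hit Rate: 72.0%
```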

Best Practices

Leave it enabled - negligible overhead, significant gains
Monitor hit rate - should be >50% in typical workflows
Clear the cache when switching models or after major prompt changes
Batch similar prompts to maximize cache hits
Expect global behavior: the cache is shared across repeated prompt encodes and is not scoped to a single generation session