NLProxy Cache Module Reference
This reference documents the module cache/semantic_cache.py.
Purpose
SemanticLLMCache provides a Redis-backed semantic cache for LLM prompt-response pairs. It stores response metadata alongside embedding vectors, so a response cached for a semantically similar prior prompt can be returned instead of regenerating it.
Key Class
SemanticLLMCache
Responsibilities
- Normalize and store embedding vectors in RedisVL.
- Search cached vectors based on cosine similarity.
- Enforce TTL-based expiration and domain isolation.
- Maintain hit/miss statistics.
Constructor
SemanticLLMCache(
    redis_url: str = "redis://localhost:6379",
    similarity_threshold: float = 0.92,
    default_ttl: int = 3600,
    dimension: int = 384,
    index_name: str = "prompt_cache",
    prefix: str = "cache:",
    max_connections: int = 50,
    socket_timeout: float = 5.0,
)
Important Methods
_normalize(embedding: np.ndarray) -> List[float]
- Converts raw embeddings into L2-normalized Python lists.
- Complexity: O(d), where d is the embedding dimension.
store(query_embedding, response_text, metadata, domain)
- Stores a cached entry in a RedisVL vector index.
- Writes both vector and metadata fields.
search(query_embedding, domain=None) -> Optional[Dict[str, Any]]
- Performs vector similarity search with threshold filtering.
- Complexity: O(N · d) for a flat scan over N indexed entries; the actual cost depends on the index algorithm configured in RedisVL.
clear(domain: Optional[str] = None)
- Deletes cached entries globally or within a domain.
get_stats() -> Dict[str, int]
- Returns hit/miss counters.
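The normalization and flat-scan search described above can be sketched with NumPy alone. This is an in-memory illustration under stated assumptions, not the module's RedisVL code: the function flat_search, the entry layout (vector, domain, response keys), and the handling of zero vectors are all hypothetical.

```python
from typing import Any, Dict, List, Optional

import numpy as np


def normalize(embedding: np.ndarray) -> List[float]:
    """Return the embedding scaled to unit L2 norm, as a plain Python list."""
    vec = np.asarray(embedding, dtype=np.float64)
    norm = np.linalg.norm(vec)
    if norm == 0.0:
        # Assumption: a zero vector is returned unchanged rather than raising.
        return vec.tolist()
    return (vec / norm).tolist()


def flat_search(
    query: np.ndarray,
    entries: List[Dict[str, Any]],
    threshold: float = 0.92,
    domain: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Scan every entry (O(N * d)) and return the best match at or above threshold."""
    q = np.asarray(normalize(query))
    best, best_score = None, threshold
    for entry in entries:
        # Domain isolation: skip entries stored under a different domain.
        if domain is not None and entry["domain"] != domain:
            continue
        # Both vectors are unit length, so the dot product is the cosine similarity.
        score = float(np.dot(q, np.asarray(entry["vector"])))
        if score >= best_score:
            best, best_score = entry, score
    return best
```

Because stored vectors and the query are both normalized, cosine similarity reduces to a single dot product per entry, which is where the O(N · d) flat-scan cost comes from.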
Dependencies
- redis / redis-py
- redisvl for vector search index management
- numpy
Performance Characteristics
- Embedding normalization is linear in embedding dimension.
- Search cost scales with number of indexed entries and vector size.
- RedisVL reduces query latency compared to raw key scans, but the module remains CPU-bound for large indexes.
Scalability Considerations
- Default Redis connection pool size is 50. This is configurable via max_connections.
- socket_timeout ensures network faults fail fast.
- For high-volume deployments, Redis clustering or an approximate nearest neighbor store is recommended.
Operational Guidelines
- Ensure dimension matches the embedding model output size.
- Configure similarity_threshold carefully; values near 1.0 reduce false positives but also lower hit rate.
- Monitor hit/miss ratios and eviction trends.
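The threshold trade-off and hit/miss monitoring can be explored offline with two small helpers. This is a sketch only: the counter keys "hits" and "misses" are an assumption about what get_stats() returns, and hit_rate_at is a hypothetical helper that expects the best similarity score recorded for each past query.

```python
from typing import Dict, Sequence


def hit_ratio(stats: Dict[str, int]) -> float:
    """Fraction of lookups served from the cache."""
    # Assumption: counters are exposed under the keys "hits" and "misses".
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


def hit_rate_at(best_scores: Sequence[float], threshold: float) -> float:
    """Fraction of queries whose best cached match meets the threshold."""
    if not best_scores:
        return 0.0
    return sum(s >= threshold for s in best_scores) / len(best_scores)


# Illustrative scores, not measured data.
scores = [0.99, 0.95, 0.93, 0.80]
for t in (0.92, 0.98):
    print(f"threshold={t}: hit rate {hit_rate_at(scores, t):.2f}")
```

Replaying logged similarity scores against several candidate thresholds in this way shows how much hit rate is given up for each step toward 1.0 before the setting is changed in production.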
Edge Cases
- The cache treats a missing Redis connection as a hard failure during initialization.
- Index creation fails if the schema is mismatched or the vector dimension is incompatible.
- Entries whose TTL has elapsed are not necessarily removed at the instant of expiry; they may persist until they are next read or a cleanup operation runs.
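The expiry behavior in the last point can be illustrated with a small in-memory stand-in. This is not how the module or Redis implements TTLs; LazyTTLStore and its methods are hypothetical names, and the injectable clock exists only so the example can be run deterministically.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LazyTTLStore:
    """In-memory stand-in where expiry is enforced only on read or sweep."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        # Record the absolute expiry time next to the value.
        self._items[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            # Stale entry: removed now, at read time, not at the moment it expired.
            del self._items[key]
            return None
        return value

    def sweep(self) -> int:
        """Remove all stale entries and return how many were dropped."""
        now = self._clock()
        stale = [k for k, (exp, _) in self._items.items() if now >= exp]
        for k in stale:
            del self._items[k]
        return len(stale)

    def __len__(self) -> int:
        return len(self._items)
```

Between expiry and the next read or sweep, a stale entry still occupies memory, which is why eviction trends are worth monitoring alongside hit/miss ratios.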