NLProxy Cache Module Reference

This page documents the cache/semantic_cache.py module.

Purpose

SemanticLLMCache provides a Redis-backed semantic cache for LLM prompt-response pairs. It stores each response together with its metadata and the prompt's embedding vector, so that a response cached for a semantically similar earlier prompt can be returned instead of being regenerated.

Key Class

SemanticLLMCache

Responsibilities

  • Normalize embedding vectors and store them in a RedisVL-managed vector index.
  • Search cached vectors based on cosine similarity.
  • Enforce TTL-based expiration and domain isolation.
  • Maintain hit/miss statistics.

Constructor

SemanticLLMCache(
    redis_url: str = "redis://localhost:6379",
    similarity_threshold: float = 0.92,
    default_ttl: int = 3600,
    dimension: int = 384,
    index_name: str = "prompt_cache",
    prefix: str = "cache:",
    max_connections: int = 50,
    socket_timeout: float = 5.0,
)

Important Methods

  • _normalize(embedding: np.ndarray) -> List[float]

    • Converts raw embeddings into L2-normalized Python lists.
    • Complexity: O(d).
  • store(query_embedding, response_text, metadata, domain)

    • Stores a cached entry in a RedisVL vector index.
    • Writes both vector and metadata fields.
  • search(query_embedding, domain=None) -> Optional[Dict[str, Any]]

    • Performs vector similarity search with threshold filtering.
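    • Complexity: O(N · d) for a flat scan, where N is the number of indexed entries and d is the embedding dimension; the actual cost depends on the RedisVL index type in use.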
  • clear(domain: Optional[str] = None)

    • Deletes cached entries globally or within a domain.
  • get_stats() -> Dict[str, int]

    • Returns hit/miss counters.
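
The method behavior listed above can be illustrated with a small in-memory stand-in. This is a minimal sketch, not the module's implementation: InMemorySemanticCache, the free function normalize, the "hits"/"misses" key names, and the shape of the returned dictionary are all assumptions made for illustration, and plain Python lists stand in for both np.ndarray and the RedisVL index.

```python
import math
from typing import Any, Dict, List, Optional, Sequence


def normalize(embedding: Sequence[float]) -> List[float]:
    """L2-normalize a vector in O(d); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        return [float(x) for x in embedding]
    return [float(x) / norm for x in embedding]


class InMemorySemanticCache:
    """Hypothetical stand-in for SemanticLLMCache that keeps entries in a list."""

    def __init__(self, similarity_threshold: float = 0.92) -> None:
        self.similarity_threshold = similarity_threshold
        self._entries: List[Dict[str, Any]] = []
        self._stats = {"hits": 0, "misses": 0}  # key names are assumed

    def store(self, query_embedding, response_text, metadata, domain) -> None:
        # Both the vector and the metadata fields are written together.
        self._entries.append({
            "vector": normalize(query_embedding),
            "response": response_text,
            "metadata": metadata,
            "domain": domain,
        })

    def search(self, query_embedding, domain: Optional[str] = None) -> Optional[Dict[str, Any]]:
        query = normalize(query_embedding)
        best, best_score = None, -1.0
        for entry in self._entries:  # flat scan: O(N * d)
            if domain is not None and entry["domain"] != domain:
                continue  # domain isolation
            # For unit vectors the dot product equals the cosine similarity.
            score = sum(a * b for a, b in zip(query, entry["vector"]))
            if score > best_score:
                best, best_score = entry, score
        if best is None or best_score < self.similarity_threshold:
            self._stats["misses"] += 1
            return None
        self._stats["hits"] += 1
        return {"response": best["response"], "metadata": best["metadata"],
                "similarity": best_score}

    def clear(self, domain: Optional[str] = None) -> None:
        # With no domain everything is dropped; otherwise other domains are kept.
        self._entries = [e for e in self._entries
                         if domain is not None and e["domain"] != domain]

    def get_stats(self) -> Dict[str, int]:
        return dict(self._stats)
```

In this sketch, a query that points in the same direction as a stored vector is a hit, an orthogonal query is a miss, and a query against a different domain never sees the entry.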

Dependencies

  • redis / redis-py
  • redisvl for vector search index management
  • numpy

Performance Characteristics

  • Embedding normalization is linear in embedding dimension.
  • Search cost scales with number of indexed entries and vector size.
  • RedisVL's vector index reduces query latency compared to raw key scans, but search remains CPU-bound when the index is large.
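
To make the scaling above concrete, the following back-of-the-envelope sketch estimates the work and vector memory of one flat-scan search. The helper flat_scan_cost is hypothetical, and the 4 bytes per value (float32 storage) is an assumption; this page does not state the storage type the module uses.

```python
def flat_scan_cost(num_entries: int, dimension: int = 384, bytes_per_value: int = 4) -> dict:
    """Estimate per-query work and total vector memory for a brute-force search."""
    return {
        # One multiply-add per vector component per indexed entry.
        "multiply_adds": num_entries * dimension,
        # Raw vector payload only; index and metadata overhead are ignored.
        "vector_bytes": num_entries * dimension * bytes_per_value,
    }


for n in (1_000, 100_000, 1_000_000):
    cost = flat_scan_cost(n)
    mib = cost["vector_bytes"] / 2**20
    print(f"{n:>9} entries: {cost['multiply_adds']:>11} multiply-adds, {mib:.1f} MiB of vectors")
```

Under these assumptions, 100,000 entries at the default dimension of 384 cost 38.4 million multiply-adds per query.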

Scalability Considerations

  • The Redis connection pool defaults to 50 connections and is configurable via max_connections.
  • socket_timeout (default 5.0) bounds how long a Redis socket operation can block, so network faults fail fast instead of hanging.
  • For high-volume deployments, Redis clustering or an approximate nearest neighbor store is recommended.

Operational Guidelines

  • Ensure dimension matches the embedding model output size.
  • Configure similarity_threshold carefully: values near 1.0 reduce false positives but also lower the hit rate.
  • Monitor hit/miss ratios and eviction trends.
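
The guidelines above can be checked with a few small helpers. This is a sketch under stated assumptions: check_dimension, cosine_similarity, and hit_ratio are hypothetical helpers, not part of the module, and the "hits"/"misses" keys are an assumed shape for the dictionary returned by get_stats().

```python
import math
from typing import Dict, Sequence


def check_dimension(embedding: Sequence[float], expected: int = 384) -> None:
    """Raise early if the embedding model output does not match the index dimension."""
    if len(embedding) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(embedding)}")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def hit_ratio(stats: Dict[str, int]) -> float:
    """Fraction of lookups served from cache; 0.0 when nothing was looked up."""
    hits = stats.get("hits", 0)
    total = hits + stats.get("misses", 0)
    return hits / total if total else 0.0


# The same pair of vectors is a hit at 0.92 but a miss at a stricter 0.98.
score = cosine_similarity([1.0, 0.0], [1.0, 0.3])
for threshold in (0.92, 0.98):
    print(f"threshold={threshold}: {'hit' if score >= threshold else 'miss'}")
```

The pair in the example has a cosine similarity of roughly 0.958, which shows how raising the threshold turns a borderline match into a miss.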

Edge Cases

  • The cache treats a missing Redis connection as a hard failure during initialization.
  • Index creation fails when the vector index schema does not match or the vector dimension is incompatible.
  • Expired entries are not necessarily removed the instant their TTL elapses; they can persist until a read or cleanup operation occurs.
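
The TTL behavior described in the last point can be illustrated with a toy store that expires entries lazily. This is a sketch of the general pattern only, not of how Redis or the module implements expiration: LazyTTLStore and its method names are hypothetical, and the injectable clock exists only so the example can be run without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LazyTTLStore:
    """Toy key-value store whose expired entries linger until read or cleaned up."""

    def __init__(self, default_ttl: int = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.default_ttl = default_ttl
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        lifetime = self.default_ttl if ttl is None else ttl
        self._data[key] = (value, self._clock() + lifetime)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # removed only now, on read
            return None
        return value

    def cleanup(self) -> int:
        """Remove every expired entry and return how many were dropped."""
        now = self._clock()
        expired = [k for k, (_, expires_at) in self._data.items() if now >= expires_at]
        for k in expired:
            del self._data[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._data)


# Usage with a fake clock: both entries outlive their TTL until touched.
now = [0.0]
store = LazyTTLStore(default_ttl=10, clock=lambda: now[0])
store.set("a", "first")
store.set("b", "second")
now[0] = 11.0                  # both TTLs have elapsed
print(len(store))              # still 2: nothing has been removed yet
print(store.get("a"))          # None, and "a" is dropped by the read
print(store.cleanup())         # 1: the cleanup pass removes "b"
```

The practical consequence is that entry counts or memory usage observed between reads and cleanup passes may include entries that are already expired.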