Luiserb

first commit

2129c29 11 days ago

NLProxy Cache Module Reference

This document describes the module cache/semantic_cache.py.

Purpose

SemanticLLMCache provides a Redis-backed semantic cache for LLM prompt-response pairs. It stores each response with its metadata and the prompt's embedding vector, so that a semantically similar later prompt can reuse the cached response instead of triggering a new generation.

Key Class

`SemanticLLMCache`

Responsibilities

- Normalize embedding vectors and store them in a RedisVL vector index.
- Search cached vectors by cosine similarity.
- Enforce TTL-based expiration and domain isolation.
- Maintain hit/miss statistics.

Constructor

SemanticLLMCache(
    redis_url: str = "redis://localhost:6379",
    similarity_threshold: float = 0.92,
    default_ttl: int = 3600,
    dimension: int = 384,
    index_name: str = "prompt_cache",
    prefix: str = "cache:",
    max_connections: int = 50,
    socket_timeout: float = 5.0,
)

Important Methods

_normalize(embedding: np.ndarray) -> List[float]
- Converts raw embeddings into L2-normalized Python lists.
- Complexity: O(d), where d is the embedding dimension.

store(query_embedding, response_text, metadata, domain)
- Stores a cached entry in a RedisVL vector index.
- Writes both vector and metadata fields.

search(query_embedding, domain=None) -> Optional[Dict[str, Any]]
- Performs vector similarity search with threshold filtering.
- Complexity: O(N · d) for a flat scan over N entries of dimension d; the actual cost depends on the index algorithm RedisVL is configured with.

clear(domain: Optional[str] = None)
- Deletes cached entries globally or within a domain.

get_stats() -> Dict[str, int]
- Returns hit/miss counters.
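
The method contracts above can be illustrated with a minimal in-memory sketch. It is not the module's implementation: `InMemorySemanticCache` is a hypothetical stand-in that keeps entries in a Python list instead of a RedisVL index, uses plain lists rather than NumPy arrays to stay dependency-free, and assumes the stats keys are named `hits` and `misses` and that a hit returns a dict with `response`, `metadata`, and `similarity`. TTL handling is left out.

```python
import math
from typing import Any, Dict, List, Optional


def normalize(embedding: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm in O(d); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        return list(embedding)
    return [x / norm for x in embedding]


class InMemorySemanticCache:
    """Hypothetical stand-in for SemanticLLMCache; entries live in a list, not Redis."""

    def __init__(self, similarity_threshold: float = 0.92, dimension: int = 384):
        self.similarity_threshold = similarity_threshold
        self.dimension = dimension
        self._entries: List[Dict[str, Any]] = []
        self._stats = {"hits": 0, "misses": 0}  # key names are an assumption

    def store(self, query_embedding: List[float], response_text: str,
              metadata: Optional[Dict[str, Any]] = None, domain: str = "default") -> None:
        if len(query_embedding) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(query_embedding)}"
            )
        self._entries.append({
            "vector": normalize(query_embedding),
            "response": response_text,
            "metadata": metadata or {},
            "domain": domain,
        })

    def search(self, query_embedding: List[float],
               domain: Optional[str] = None) -> Optional[Dict[str, Any]]:
        query = normalize(query_embedding)
        best, best_score = None, -1.0
        for entry in self._entries:  # flat scan: O(N * d)
            if domain is not None and entry["domain"] != domain:
                continue  # domain isolation
            # On unit vectors, the dot product equals cosine similarity.
            score = sum(a * b for a, b in zip(query, entry["vector"]))
            if score > best_score:
                best, best_score = entry, score
        if best is not None and best_score >= self.similarity_threshold:
            self._stats["hits"] += 1
            return {"response": best["response"],
                    "metadata": best["metadata"],
                    "similarity": best_score}
        self._stats["misses"] += 1
        return None

    def clear(self, domain: Optional[str] = None) -> None:
        if domain is None:
            self._entries.clear()
        else:
            self._entries = [e for e in self._entries if e["domain"] != domain]

    def get_stats(self) -> Dict[str, int]:
        return dict(self._stats)
```

Because vectors are normalized once at store time and once per query, the similarity check reduces to a dot product, which is why normalization is listed as a separate O(d) step.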

Dependencies

- redis / redis-py
- redisvl for vector search index management
- numpy

Performance Characteristics

- Embedding normalization is linear in the embedding dimension.
- Search cost scales with the number of indexed entries and the vector size.
- RedisVL's vector index reduces query latency compared to scanning raw keys, but the module remains CPU-bound for large indexes.
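
As a rough worked example of the scaling described above, the helper below counts the multiply-add operations a flat scan needs per query. The function name is hypothetical, and the count assumes an unoptimized brute-force scan, so it is an upper-bound intuition rather than a measurement of the module.

```python
def flat_scan_multiply_adds(num_entries: int, dimension: int = 384) -> int:
    """Multiply-adds for one query against num_entries vectors of the given dimension."""
    return num_entries * dimension


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        print(f"{n:,} entries -> {flat_scan_multiply_adds(n):,} multiply-adds per query")
```

At the default dimension of 384, one million cached entries come to 384 million multiply-adds per query under this model.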

Scalability Considerations

- The default Redis connection pool size is 50, configurable via max_connections.
- socket_timeout (default 5.0) bounds how long a socket operation may block, so network faults fail fast.
- For high-volume deployments, Redis clustering or an approximate nearest neighbor (ANN) store is recommended.

Operational Guidelines

- Ensure dimension matches the embedding model output size.
- Configure similarity_threshold carefully; values near 1.0 reduce false positives but also lower the hit rate.
- Monitor hit/miss ratios and eviction trends.
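
The threshold trade-off and the hit-ratio metric can be explored offline with a sketch like the following. Both helpers are hypothetical, the stats key names `hits` and `misses` are assumed, and the `(score, is_correct_match)` pairs are synthetic values chosen only to illustrate the effect; in practice they would come from a labeled sample of real lookups.

```python
from typing import Dict, List, Tuple


def hit_ratio(stats: Dict[str, int]) -> float:
    """Fraction of lookups served from cache; assumes 'hits' and 'misses' keys."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


def sweep_threshold(scored: List[Tuple[float, bool]],
                    thresholds: List[float]) -> List[Dict[str, float]]:
    """For each threshold, report the hit rate and how many accepted hits were wrong."""
    rows = []
    for t in thresholds:
        accepted = [is_match for score, is_match in scored if score >= t]
        rows.append({
            "threshold": t,
            "hit_rate": len(accepted) / len(scored) if scored else 0.0,
            "false_positives": accepted.count(False),
        })
    return rows


if __name__ == "__main__":
    # Synthetic (similarity, is_correct_match) pairs, for illustration only.
    sample = [(0.99, True), (0.95, True), (0.93, True),
              (0.91, False), (0.90, True), (0.85, False)]
    for row in sweep_threshold(sample, [0.85, 0.92, 0.97]):
        print(row)
```

On this synthetic sample, raising the threshold from 0.85 to 0.92 removes both false positives but halves the hit rate, which is the trade-off the guideline warns about.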

Edge Cases

- An unreachable Redis server is treated as a hard failure during initialization.
- A vector index with a mismatched schema or an incompatible dimension will fail to create.
- Expired entries are not removed the instant their TTL elapses; they persist until a read or a cleanup operation touches them.
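
The last edge case can be pictured with a small in-memory model of lazy expiration. `LazyTTLStore` is a hypothetical class written for this illustration; whether the real module relies on Redis key expiry or on its own bookkeeping is not stated here, so treat this as a sketch of the described behavior only. The clock is injectable so the behavior can be checked without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LazyTTLStore:
    """Hypothetical store where expired entries linger until a read or cleanup."""

    def __init__(self, default_ttl: int = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.default_ttl = default_ttl
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        lifetime = self.default_ttl if ttl is None else ttl
        self._data[key] = (value, self._clock() + lifetime)

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # the stale entry is only dropped here, on read
            return None
        return value

    def cleanup(self) -> int:
        """Remove every expired entry and return how many were dropped."""
        now = self._clock()
        expired = [k for k, (_, exp) in self._data.items() if now >= exp]
        for k in expired:
            del self._data[k]
        return len(expired)

    def __len__(self) -> int:
        return len(self._data)
```

In this model, memory use can exceed the number of live entries between cleanups, which is one reason to watch eviction trends alongside hit/miss ratios.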