# PineconeMemory Documentation
The `PineconeMemory` class provides an interface for building Retrieval-Augmented Generation (RAG) systems on top of a Pinecone index. It adds documents to the index and queries the index for similar documents. The class accepts custom embedding models, embedding functions, and preprocessing and postprocessing functions to suit different use cases.
#### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `api_key` | `str` | - | Pinecone API key. |
| `environment` | `str` | - | Pinecone environment. |
| `index_name` | `str` | - | Name of the Pinecone index to use. |
| `dimension` | `int` | `768` | Dimension of the document embeddings. Must match the embedding model's output size (the default `all-MiniLM-L6-v2` model produces 384-dimensional vectors). |
| `embedding_model` | `Optional[Any]` | `None` | Custom embedding model. If `None`, `SentenceTransformer('all-MiniLM-L6-v2')` is used. |
| `embedding_function` | `Optional[Callable[[str], List[float]]]` | `None` | Custom embedding function. If `None`, `_default_embedding_function` is used. |
| `preprocess_function` | `Optional[Callable[[str], str]]` | `None` | Custom preprocessing function. If `None`, `_default_preprocess_function` is used. |
| `postprocess_function` | `Optional[Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]]` | `None` | Custom postprocessing function. If `None`, `_default_postprocess_function` is used. |
| `metric` | `str` | `'cosine'` | Distance metric for the Pinecone index. |
| `pod_type` | `str` | `'p1'` | Pinecone pod type. |
| `namespace` | `str` | `''` | Pinecone namespace. |
| `logger_config` | `Optional[Dict[str, Any]]` | `None` | Configuration for the logger. If `None`, logs go to `rag_wrapper.log` and the console. |

### Methods
#### `_setup_logger`
```python
def _setup_logger(self, config: Optional[Dict[str, Any]] = None)
```
Sets up the logger with the given configuration.
#### `_default_embedding_function`
```python
def _default_embedding_function(self, text: str) -> List[float]
```
Generates embeddings using the default SentenceTransformer model.
#### `_default_preprocess_function`
```python
def _default_preprocess_function(self, text: str) -> str
```
Preprocesses the input text by stripping whitespace.
#### `_default_postprocess_function`
```python
def _default_postprocess_function(self, results: List[Dict[str, Any]]) -> List[Dict[str, Any]]
```
Postprocesses the query results.
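
What the default postprocessing does is not spelled out here, so the following is only a sketch of what a custom `postprocess_function` could look like. It assumes each result is a dict with a numeric `score` key, which is an assumption about the result shape rather than something this page documents. The names `drop_low_scores` and `min_score` are hypothetical.

```python
from typing import Any, Dict, List


def drop_low_scores(
    results: List[Dict[str, Any]], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep results scoring at least min_score, ordered best first."""
    # Assumption: each result dict carries a numeric "score" key.
    # Results without a score are treated as scoring 0.0.
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r.get("score", 0.0), reverse=True)
```

Because `min_score` has a default, the function still matches the one-argument `Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]` signature expected by `postprocess_function`.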
#### `add`
Adds a document to the Pinecone index.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `doc` | `str` | - | The document to be added. |
| `metadata` | `Optional[Dict[str, Any]]` | `None` | Additional metadata for the document. |

#### `query`
Queries the Pinecone index for similar documents.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | `str` | - | The query string. |
| `top_k` | `int` | `5` | The number of top results to return. |
| `filter` | `Optional[Dict[str, Any]]` | `None` | Metadata filter for the query. |

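
Pinecone metadata filters are plain dictionaries built from operators such as `$eq`, `$in`, and `$gte`. The sketch below builds one and, to show the intended semantics without a network call, checks it against sample metadata with a small local helper. `matches_filter` is a hypothetical helper written for this illustration; it handles only those three operators and is not part of Pinecone or `PineconeMemory`. The sketch also assumes that `query` forwards `filter` to Pinecone unchanged.

```python
from typing import Any, Dict

# Pinecone-style filter: top-level fields are combined with a logical AND.
metadata_filter: Dict[str, Dict[str, Any]] = {
    "author": {"$eq": "John Doe"},
    "year": {"$gte": 2024},
}


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Dict[str, Any]]) -> bool:
    """Local illustration of filter semantics (hypothetical helper)."""
    for field, condition in flt.items():
        value = metadata.get(field)
        for op, expected in condition.items():
            if op == "$eq":
                ok = value == expected
            elif op == "$in":
                ok = value in expected
            elif op == "$gte":
                ok = value is not None and value >= expected
            else:
                raise ValueError(f"operator not covered by this sketch: {op}")
            if not ok:
                return False
    return True
```

Under the forwarding assumption above, the same dictionary would be passed as `memory.query("some query", top_k=5, filter=metadata_filter)`.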
## Usage
The `PineconeMemory` class is initialized with the parameters needed to configure Pinecone and the embedding model. All other parameters are optional and fall back to the defaults listed in the parameter table.
#### Example
```python
from swarms_memory import PineconeMemory

# Initialize PineconeMemory
memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    dimension=768,
)
```
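
Hardcoding the API key is fine for a quick test, but in practice it is usually read from the environment. The following is a minimal sketch of that pattern. The variable names `PINECONE_API_KEY`, `PINECONE_ENVIRONMENT`, and `PINECONE_INDEX_NAME` and the helper `pinecone_settings` are assumptions made for this example, not names that `PineconeMemory` reads on its own.

```python
import os
from typing import Dict, Mapping, Optional


def pinecone_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect constructor arguments from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("PINECONE_API_KEY")  # assumed variable name
    if not api_key:
        raise RuntimeError("PINECONE_API_KEY is not set")
    return {
        "api_key": api_key,
        # Fall back to the placeholder values used in the example above.
        "environment": env.get("PINECONE_ENVIRONMENT", "us-west1-gcp"),
        "index_name": env.get("PINECONE_INDEX_NAME", "example-index"),
    }
```

The result can then be unpacked into the constructor, for example `PineconeMemory(**pinecone_settings(), dimension=768)`.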
### Adding Documents
Documents are added to the Pinecone index with the `add` method, which accepts a document string and optional metadata.
#### Example
```python
doc = "This is a sample document to be added to the Pinecone index."
metadata = {"author": "John Doe", "date": "2024-07-08"}
memory.add(doc, metadata)
```
### Querying Documents
The `query` method searches the Pinecone index for documents similar to a query string and returns the `top_k` most similar documents.
#### Example
```python
query = "Sample query to find similar documents."
results = memory.query(query, top_k=5)
for result in results:
    print(result)
```
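
To make "most similar" concrete, here is a small in-memory sketch of top-k retrieval under the default `'cosine'` metric. It is a pure-Python illustration of the ranking idea, not how Pinecone implements search. The names `cosine_similarity`, `top_k_matches`, and `toy_index` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_matches(
    query_vec: List[float], index: Dict[str, List[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy two-dimensional "index": document id -> embedding.
toy_index = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
print(top_k_matches([1.0, 0.1], toy_index, k=2))
```

With this toy data the query vector points almost along `doc-a`, so `doc-a` ranks first and `doc-c` second.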
## Additional Information and Tips
### Custom Embedding and Preprocessing Functions
Custom embedding and preprocessing functions can be passed at initialization to tailor document processing to specific requirements.
#### Example
```python
from typing import List

from swarms_memory import PineconeMemory


def custom_embedding_function(text: str) -> List[float]:
    # Custom embedding logic goes here. This placeholder returns only three
    # values; a real function must return `dimension` floats.
    return [0.1, 0.2, 0.3]


def custom_preprocess_function(text: str) -> str:
    # Custom preprocessing logic: lowercase the text before embedding.
    return text.lower()


memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    embedding_function=custom_embedding_function,
    preprocess_function=custom_preprocess_function,
)
```
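
Pinecone rejects vectors whose length differs from the index dimension, so it can help to wrap a custom embedding function in a length check. The sketch below shows one way to do that. `with_dimension_check` and `toy_embedding` are hypothetical names, and `toy_embedding` is a deterministic hash-based stand-in with no semantic meaning, used only so the example runs without a model.

```python
import hashlib
from typing import Callable, List


def toy_embedding(text: str, dimension: int = 768) -> List[float]:
    """Deterministic stand-in embedding; NOT semantically meaningful."""
    values: List[float] = []
    counter = 0
    while len(values) < dimension:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(byte / 255.0 for byte in digest)  # 32 floats per digest
        counter += 1
    return values[:dimension]


def with_dimension_check(
    embed: Callable[[str], List[float]], dimension: int
) -> Callable[[str], List[float]]:
    """Wrap an embedding function so wrong-sized vectors fail early."""

    def checked(text: str) -> List[float]:
        vector = embed(text)
        if len(vector) != dimension:
            raise ValueError(
                f"expected {dimension} values, got {len(vector)}"
            )
        return vector

    return checked


safe_embedding = with_dimension_check(toy_embedding, 768)
```

A wrapped function such as `safe_embedding` still has the `Callable[[str], List[float]]` shape that `embedding_function` expects.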
### Logger Configuration
The logger can be configured through `logger_config`. The default configuration logs to `rag_wrapper.log` and the console.
#### Example
```python
from swarms_memory import PineconeMemory

logger_config = {
    "handlers": [
        # Write to a file and rotate it once it reaches 1 MB.
        {"sink": "custom_log.log", "rotation": "1 MB"},
        # Also echo each message to the console.
        {"sink": lambda msg: print(msg, end="")},
    ]
}

memory = PineconeMemory(
    api_key="your-api-key",
    environment="us-west1-gcp",
    index_name="example-index",
    logger_config=logger_config,
)
```
## References and Resources
- [Pinecone Documentation](https://docs.pinecone.io/)
- [SentenceTransformers Documentation](https://www.sbert.net/)
- [Loguru Documentation](https://loguru.readthedocs.io/en/stable/)

For further examples, refer to the official documentation provided by Pinecone, SentenceTransformers, and Loguru.

This concludes the documentation for the `PineconeMemory` class. The class offers a flexible interface for using Pinecone in retrieval-augmented generation systems. Because it supports custom embedding, preprocessing, and postprocessing functions, it can be tailored to a wide range of applications.