Pinecone metadata question

#1
by joeyanuff - opened

I'm working on a clone of this HF Space and have a question about it, and about setting metadata in Pinecone in general.

I'd like to attach similar domain metadata to my own Pinecone index, but I can't find any documentation explaining this procedure in Pinecone's docs (or in LangChain's, whose DirectoryLoader utility I'm currently using to help ingest my knowledge base).

Is there a doc I'm missing that details best practices for writing metadata? Could it be done at the chunking stage? Can it be derived from filenames or from information inside the files? Could it be set after indexing by selectively querying and updating vectors, or must the metadata payload be included at insertion?
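For anyone landing here with the same question: one common approach is to attach metadata at the chunking stage, deriving it from the file path. The sketch below is a minimal, library-free illustration under assumptions: `chunk_text`, `metadata_from_path`, and `build_chunks` are hypothetical helpers, the character-window splitter stands in for a real text splitter, and the `category`/`title` fields are an invented convention (folder name and file stem), not the schema this Space uses.

```python
from pathlib import PurePosixPath


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (a simple stand-in for a real splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks


def metadata_from_path(path: str) -> dict:
    """Derive file-level metadata from a path; the field names are a hypothetical convention."""
    p = PurePosixPath(path)
    return {
        "source": str(p),
        "category": p.parent.name,  # e.g. the folder the file lives in
        "title": p.stem,            # file name without extension
    }


def build_chunks(path: str, text: str) -> list[dict]:
    """Copy the file-level metadata onto every chunk and add the chunk's position."""
    base = metadata_from_path(path)
    return [
        {"id": f"{path}#{i}", "text": chunk, "metadata": {**base, "chunk": i}}
        for i, chunk in enumerate(chunk_text(text))
    ]


chunks = build_chunks("kb/transformers/pipelines.md", "a" * 450)
```

As far as I know, LangChain's file loaders already record the file path under `metadata["source"]` on each `Document`, so in practice the path-derived fields could be computed from that value instead of re-reading the path. Pinecone's client also has an `update` operation that can set metadata on an existing vector by ID, so adding metadata after indexing looks possible; check the docs for your client version.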

In the Metadata Filtering for Vector Search example, the SQuAD JSON is reformatted to include a metadata object, then the context string is transformed into a vector array, and only then is the index created or upserted.
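The order of steps described above (reshape each record to carry a metadata object, encode its context, then upsert) can be sketched as follows. This is a hedged illustration, not the example's actual code: `fake_embed` is a hypothetical stand-in for a real sentence encoder (it hashes the text into a small pseudo-vector), the `title` metadata field is illustrative, and only the `{"id", "values", "metadata"}` record shape is meant to mirror what Pinecone's `upsert` accepts.

```python
import hashlib


def fake_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a real encoder: deterministic pseudo-vector from a SHA-256 hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def to_upsert_batches(records: list[dict], batch_size: int = 100) -> list[list[dict]]:
    """Encode each record's context and pair the vector with its metadata, in batches."""
    vectors = [
        {
            "id": rec["id"],
            "values": fake_embed(rec["context"]),
            "metadata": rec["metadata"],  # metadata travels with the vector at insertion
        }
        for rec in records
    ]
    return [vectors[i:i + batch_size] for i in range(0, len(vectors), batch_size)]


# Illustrative SQuAD-like records, already reshaped to include a metadata object.
records = [
    {"id": str(i), "context": f"context passage {i}", "metadata": {"title": f"article-{i % 5}"}}
    for i in range(250)
]
batches = to_upsert_batches(records)
```

With a real index, each batch would then be passed to `index.upsert(vectors=batch)`, and the stored fields become usable in query filters such as `{"title": {"$eq": "article-3"}}`. Batching keeps each request small; the batch size of 100 here is arbitrary.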

Did you similarly pre-process all the vectors from the four ML libraries in this QA demo? (And at the same time set the metadata fields for docs, category, thread, and href?)

[Edit: Answered with information from James's Stack Overflow answer:
https://stackoverflow.com/questions/71617889/how-to-use-metadata-for-document-retrieval-using-sentence-transformers]

joeyanuff changed discussion status to closed
