Is there a way to ingest my knowledge base and perform RAG using the Hugging Face serverless or dedicated endpoints?
#51 by namantjeaswi - opened
I have been able to build RAG applications using quantized, GGUF-format versions of Llama 2, and I now want to test the full, non-quantized Llama 3 for RAG. Like most of us, my personal computer does not have enough memory to run non-quantized models. Is there a way to ingest my knowledge base, or pass my vector store to an API endpoint (either serverless or dedicated), and perform RAG? My understanding is that I could do this by running a Space on better hardware and paying by the hour, but I would like to know whether RAG is possible with the Hugging Face API endpoints themselves, not just simple inference, in a manner similar to the OpenAI API.
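For what it's worth, one common pattern is to keep retrieval entirely client-side (your vector store never leaves your machine) and send only the augmented prompt to the endpoint via `huggingface_hub.InferenceClient`. The sketch below illustrates that shape; the toy keyword retriever, the sample documents, and the model name are illustrative assumptions, not a definitive implementation — in practice you would swap in your own vector store and embeddings.

```python
# Minimal RAG sketch: retrieve locally, then send the augmented prompt
# to a Hugging Face Inference endpoint (serverless or dedicated).
# The retriever and documents below are placeholders for a real vector store.

DOCS = [
    "Llama 3 was released by Meta in 8B and 70B parameter variants.",
    "GGUF is a file format for quantized models used by llama.cpp.",
    "Vector stores index document embeddings for similarity search.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank docs by word overlap with the question.
    In a real app, this would be a similarity query against your vector store."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Stuff the retrieved passages into the prompt sent to the endpoint."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        f"Answer using only this context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer(question: str, token: str) -> str:
    """Send the augmented prompt to the Inference API (requires an HF token).
    The model id here is an assumption; use whichever endpoint you deploy."""
    from huggingface_hub import InferenceClient  # pip install huggingface_hub
    client = InferenceClient(
        model="meta-llama/Meta-Llama-3-8B-Instruct", token=token
    )
    prompt = build_prompt(question, retrieve(question, DOCS))
    return client.text_generation(prompt, max_new_tokens=200)
```

The point is that the endpoint only ever sees plain text generation requests, so RAG works with both serverless and dedicated endpoints as long as retrieval and prompt assembly happen on your side.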