How the Search web feature works

#339
by Saugatkafley - opened

Hello, I wanted to learn about the Search web feature.

  • Is this related to some advanced form of RAG?
  • How is the most recent information captured?
  • Which sites does it look at?


Hugging Chat org

Hi! You can look at the entry point for the web search on GitHub here. To summarize:

  1. We generate a Google query based on the conversation history.
  2. That query is passed to a search results provider (we support 4 options, but in production we use serper.dev).
  3. We fetch the content of the top webpages returned by the search results provider.
  4. We extract the text from those webpages, split it into chunks, and use transformers.js to generate embeddings for the chunks.
  5. We fetch the most relevant chunks using similarity search.
  6. We inject the most relevant chunks into the conversation context before generating the answer.
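
To make steps 4 to 6 more concrete, here is a rough Python sketch of the chunk, embed, rank, and inject flow. It is only an illustration and not the actual implementation: the real code uses transformers.js to compute embeddings, whereas this sketch swaps in a toy bag-of-words vector so it runs without any dependencies. The function names (`chunk_text`, `embed`, `top_chunks`, `build_prompt`), the chunk size, and the prompt layout are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split page text into chunks of at most max_words words (assumed size)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query: str, pages: list[str], k: int = 2) -> list[str]:
    """Chunk every page, then return the k chunks most similar to the query."""
    query_vec = embed(query)
    chunks = [chunk for page in pages for chunk in chunk_text(page)]
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Inject the selected chunks ahead of the question (hypothetical layout)."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Web search results:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    pages = [
        "The Eiffel Tower is in Paris and is 330 metres tall.",
        "Bananas are rich in potassium.",
    ]
    question = "How tall is the Eiffel Tower?"
    print(build_prompt(question, top_chunks(question, pages, k=1)))
```

With a real embedding model, only `embed` and `cosine` would change (dense vectors instead of word counts); the chunking, ranking, and injection steps keep the same shape.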

Hope that answers your question. Again, all the code is available on GitHub, so feel free to have a look there yourself if anything is still unclear.
