Improve output quality

#3
by thomasmz1 - opened

Even though the output is quite good already, there are a few issues:

  1. Often, when we ask a general question like "what is the file about?", it answers "found nothing". Presumably this is caused by the prompting, and by semantic search failing to return similar chunks because the question is so general.

  2. Sometimes the chunks lack shared context, so it might make sense to increase the chunk size during search and then generate a shortened summary for each chunk, which can then be added to the prompt.

  3. For some PDFs, especially lecture slides, runs of characters like "\uf0a7 \uf0a7 \uf0a7 \uf02d \uf0a7 \uf02d \uf0a7 \uf0a7 \uf0a7 \uf0a7 \uf0a7 \uf0a7 \uf0a7" are not filtered out. These are Unicode Private Use Area code points, most likely bullet glyphs from the slides' symbol fonts. They could be removed either with a regex or as a side effect of the summaries mentioned above.
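
For point 3, the regex route could look roughly like the sketch below. It assumes the extracted text contains the actual Private Use Area code points (U+E000 to U+F8FF) rather than the literal six-character escape text, and the function name `strip_pua` is made up for illustration, not something from the existing code.

```python
import re

# Private Use Area code points (U+E000-U+F8FF). Characters such as "\uf0a7"
# and "\uf02d" fall in this range. Assumption: nothing in this range is
# meaningful text for the app, so all of it can be dropped.
_PUA_RUN = re.compile(r"[\ue000-\uf8ff]+")
_SPACES = re.compile(r"[ \t]{2,}")


def strip_pua(text: str) -> str:
    """Remove Private Use Area characters and collapse the leftover spacing."""
    # Replace each run with a space so neighbouring words are not glued together.
    cleaned = _PUA_RUN.sub(" ", text)
    # Collapse the repeated spaces/tabs left behind by removed runs.
    cleaned = _SPACES.sub(" ", cleaned)
    # Trim each line, keeping the original line breaks.
    return "\n".join(line.strip() for line in cleaned.split("\n"))
```

This would probably be applied to each page's text before chunking, so the noise never reaches the embeddings or the prompt. If the characters show up as literal backslash escapes instead, the text would need decoding first.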
