requests
beautifulsoup4
pdfminer.six
PyMuPDF
pdf2image
pytesseract
unstructured
gradio
faiss-cpu
langchain
tiktoken
poppler-utils
openai