PyPDF2
pdf2image
pytesseract
streamlit
langchain
langchain-community
langchain-core
deeplake
assemblyai
sentence-transformers
youtube-transcript-api
langchain-google-genai