langchain
streamlit
cassio
datasets
openai
tiktoken
pdfplumber
PyPDF2