Transformers SetFit pypdf2 openpyxl pdf2image poppler-utils pytesseract