torch
transformers
datasets
pytesseract
opencv-python
pdf2image
pypdf
langdetect
gradio