torch
transformers
datasets
pytesseract
opencv-python
pypdfium2
pypdf
langdetect
gradio