numpy
torch
spacy
scikit-learn
transformers
streamlit
sentencepiece
beautifulsoup4
nltk
PyPDF2
docx2txt
rouge
altair==4.0