torch
torchvision
transformers
fake_http_header
beautifulsoup4
h5py
scipy
scikit-learn
gensim
tqdm