pyarrow
pandas
numpy
arxiv
sentence_transformers
regex
scikit-learn
streamlit