Model Description

arXivBERT is a series of models, each trained on a time-based unit of the arXiv corpus. For the best performance on scientific corpora, use the 2020 model directly.

Why arXivBERT?

  1. Specialized in Scientific Content: Trained on a large dataset of arXiv papers, ensuring high familiarity with scientific terminology and concepts.
  2. Versatile in Applications: Suitable for a range of NLP tasks, including but not limited to text classification, keyword extraction, summarization of scientific papers, and citation prediction.
  3. Evolutionary Insights: Continuous pre-training captures the long-term relationships and changes within the corpus.

How to Use?

from transformers import AutoTokenizer, AutoModel

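# Replace "folderPath" with your local copy of the repository and "year" with the checkpoint you want (e.g. 2020).
tokenizer = AutoTokenizer.from_pretrained("folderPath/wholewordtokenizer")
model = AutoModel.from_pretrained("folderPath/year")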
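
Since the models come as a series, it can help to wrap the loading step in a small helper that picks one checkpoint. The following is a minimal sketch, not an official API: the function names `checkpoint_dir`, `load_arxivbert`, and `embed` are hypothetical, and it assumes that each checkpoint sits in a subfolder named after its year and that the tokenizer sits in the `wholewordtokenizer` subfolder. Adjust the paths if your copy is laid out differently.

```python
def checkpoint_dir(folder_path: str, year: int) -> str:
    """Build the path of one yearly checkpoint (assumed layout: <folder_path>/<year>)."""
    return f"{folder_path.rstrip('/')}/{year}"


def load_arxivbert(folder_path: str, year: int = 2020):
    """Load the tokenizer and the checkpoint for `year` from a local folder."""
    # Imported lazily so the path helper above works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the tokenizer is stored in the "wholewordtokenizer" subfolder.
    tokenizer = AutoTokenizer.from_pretrained(
        f"{folder_path.rstrip('/')}/wholewordtokenizer"
    )
    model = AutoModel.from_pretrained(checkpoint_dir(folder_path, year))
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Return the final-layer vector of the first token for `text`."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # last_hidden_state has shape (batch, tokens, hidden); index 0 is the first token.
    return outputs.last_hidden_state[0, 0]
```

With a local copy in place, `tokenizer, model = load_arxivbert("folderPath")` followed by `embed("We study graph neural networks.", tokenizer, model)` would give one vector per input text, which can then feed a downstream classifier or similarity search.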
