Dear AllenAI,

I am trying to use SciBERT (scivocab) as a pre-trained model for topic modelling on scientific papers.
Unfortunately, I cannot load it with SentenceTransformers, and transformers.pipelines won't work either, since no pipeline type is specified for the model.

How do you suggest using it in Python?

It's available from here, and there's a .sh script to get you started. Something like...

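from transformers import AutoModel, AutoTokenizer, pipeline

# Generic sentiment pipeline with the library's default model (does not use SciBERT)
classifier = pipeline("sentiment-analysis")

# Load the SciBERT tokenizer and encoder directly
tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased')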

Thanks. It also works if I skip the pipeline, i.e.

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased')
embed_model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased')

However, I am unsure whether BERTopic is actually using it or just falling back to a default model. When I run

from bertopic import BERTopic

topic_model = BERTopic(embedding_model=embed_model, language="english", nr_topics="auto", verbose=True)
topics, probs = topic_model.fit_transform(docs)

the verbose output is:

This might of course be an issue in the BERTopic package.
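
One way to be sure SciBERT is what BERTopic actually sees is to compute the document embeddings yourself and hand them over through the embeddings argument of fit_transform. Below is a rough sketch of that idea, not something tested on your data. Mean pooling over the token vectors is my assumption (as far as I know SciBERT does not prescribe a pooling method), and mean_pool, embed_with_scibert and fit_topics are just names I made up for illustration.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per document, ignoring padding positions.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    token_embeddings = np.asarray(token_embeddings, dtype=np.float32)
    mask = np.asarray(attention_mask, dtype=np.float32)[..., None]
    summed = (token_embeddings * mask).sum(axis=1)
    # Guard against division by zero for rows that contain only padding.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def embed_with_scibert(docs, batch_size=16, max_length=512):
    """Return one SciBERT vector per document (assumption: mean pooling)."""
    # Heavy imports are kept local so mean_pool stays usable on its own.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "allenai/scibert_scivocab_uncased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    model.eval()

    chunks = []
    with torch.no_grad():
        for start in range(0, len(docs), batch_size):
            batch = docs[start:start + batch_size]
            enc = tokenizer(batch, padding=True, truncation=True,
                            max_length=max_length, return_tensors="pt")
            out = model(**enc)
            chunks.append(mean_pool(out.last_hidden_state.numpy(),
                                    enc["attention_mask"].numpy()))
    return np.vstack(chunks)


def fit_topics(docs):
    """Fit BERTopic on precomputed SciBERT embeddings."""
    from bertopic import BERTopic

    embeddings = embed_with_scibert(docs)
    topic_model = BERTopic(nr_topics="auto", verbose=True)
    # With precomputed embeddings, BERTopic skips its own document-embedding step.
    topics, probs = topic_model.fit_transform(docs, embeddings=embeddings)
    return topic_model, topics, probs
```

Calling fit_topics(docs) with docs as a list of strings should then cluster on the SciBERT vectors. If I remember the BERTopic docs correctly, a Hugging Face "feature-extraction" pipeline can also be passed as embedding_model, which might be a simpler route than a bare AutoModel.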