---
license: apache-2.0
datasets:
  - AyoubChLin/CNN_News_Articles_2011-2022
language:
  - en
tags:
  - topic modeling
  - BERT
  - CNN news articles
---

BERTopic Model for CNN News Articles

This is a BERTopic model trained on CNN news articles (the AyoubChLin/CNN_News_Articles_2011-2022 dataset). It uses the sentence transformer model "all-MiniLM-L6-v2" to embed the documents and UMAP for dimensionality reduction.
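
As an illustration of how these two components fit together, here is a minimal sketch of configuring a comparable BERTopic model. It is not the training script used for this model: the UMAP values are simply the ones shown in the Usage section of this card, and the names `UMAP_PARAMS` and `build_topic_model` are hypothetical. The heavy imports sit inside the function so the configuration can be inspected without loading any model.

```python
# Assumed UMAP settings, copied from the Usage section of this card.
UMAP_PARAMS = {
    "n_neighbors": 15,
    "n_components": 5,
    "min_dist": 0.0,
    "metric": "cosine",
    "random_state": 42,  # fixed seed for reproducible reductions
}


def build_topic_model(umap_params=UMAP_PARAMS):
    """Return an unfitted BERTopic model (hypothetical helper)."""
    # Imported lazily because these packages are slow to load.
    from sentence_transformers import SentenceTransformer
    from umap import UMAP
    from bertopic import BERTopic

    embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
    umap_model = UMAP(**umap_params)
    return BERTopic(embedding_model=embedding_model, umap_model=umap_model)
```

Calling `build_topic_model().fit_transform(documents)` on a list of strings would then fit the topics, where `documents` stands for your own corpus.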

Usage

First, install the required packages:

pip install sentence_transformers umap-learn bertopic

Then, load the model and encode your documents:

```python
from sentence_transformers import SentenceTransformer
from umap import UMAP
from bertopic import BERTopic

# Load the sentence transformer model
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")

# Fix the UMAP random state so that results are reproducible.
# This configuration only matters when fitting a new model,
# e.g. BERTopic(umap_model=umap_model); it is not passed to load().
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine", random_state=42)

# Load the BERTopic model (replace the path with the location of your model file)
my_model = BERTopic.load("from/path/model.bin")

# Encode your documents (placeholder list; substitute your own texts)
documents = ["first news article text", "second news article text"]
document_embeddings = sentence_model.encode(documents)

# Predict the topic of a new sentence
sentence = "my sentence"
embeddings = sentence_model.encode([sentence])
topics, _ = my_model.transform([sentence], embeddings)
```
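
The predicted topics are integer ids, and BERTopic uses `-1` for documents it treats as outliers. As a rough sketch of turning those ids into readable labels, the helper below joins the top words of each topic. `label_topics` and the sample data are hypothetical; the word lists are assumed to have the shape returned by `my_model.get_topic(topic_id)`, that is, a list of `(word, score)` pairs, or `False` when the topic does not exist.

```python
OUTLIER_TOPIC = -1  # BERTopic assigns -1 to documents that fit no topic


def label_topics(topic_ids, topic_words, top_n=3):
    """Map topic ids to short labels built from each topic's top words."""
    labels = []
    for topic_id in topic_ids:
        if topic_id == OUTLIER_TOPIC:
            labels.append("outlier")
            continue
        # A missing topic or a False entry both fall back to an empty list.
        pairs = topic_words.get(topic_id) or []
        words = [word for word, _score in pairs[:top_n]]
        labels.append(", ".join(words) if words else f"topic {topic_id}")
    return labels


# Made-up example data, not actual topics from this model.
example_words = {0: [("election", 0.05), ("vote", 0.04), ("senate", 0.03)]}
print(label_topics([0, -1, 7], example_words))
# prints ['election, vote, senate', 'outlier', 'topic 7']
```

With a loaded model, the dictionary could be built as `{t: my_model.get_topic(t) for t in set(topics)}` before calling `label_topics(topics, ...)`.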

For more information on how to use the BERTopic model, see the [BERTopic documentation](https://maartengr.github.io/BERTopic/index.html).