---
license: apache-2.0
datasets:
- AyoubChLin/CNN_News_Articles_2011-2022
language:
- en
tags:
- topic modeling
- BERT
- CNN news articles
---
# BERTopic Model for CNN News Articles
This is a BERTopic model trained on CNN news articles (the `AyoubChLin/CNN_News_Articles_2011-2022` dataset). It uses the sentence transformer model `all-MiniLM-L6-v2` to embed the documents and UMAP to reduce the dimensionality of the embeddings.
## Usage
First, install the required packages:
```console
pip install sentence_transformers umap-learn bertopic
```
Then, load the model and encode your documents:
```python
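from sentence_transformers import SentenceTransformer
from umap import UMAP
from bertopic import BERTopic

# Load the sentence transformer model used to embed the documents
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")

# UMAP settings with a fixed random state to prevent stochastic behavior.
# Note: this object is not passed to the loaded model below;
# it would only be used when fitting a new BERTopic model.
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine", random_state=42)

# Load the BERTopic model (replace the placeholder path with your local model file)
my_model = BERTopic.load("from/path/model.bin")

# Encode your documents (replace this placeholder list with your own texts)
documents = ["my first document", "my second document"]
document_embeddings = sentence_model.encode(documents)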
```
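The `umap_model` defined above is not consumed when loading a saved model. As a minimal sketch of where it would fit, the hypothetical helper below shows how a new BERTopic model could be fitted with the same embedding model and UMAP settings. This is an illustration under stated assumptions, not necessarily the procedure used to train this model; the function name `fit_topic_model` is made up for this example.

```python
def fit_topic_model(documents, sentence_model, umap_model):
    """Fit a new BERTopic model on `documents` (hypothetical helper).

    `sentence_model` is expected to expose `encode(list_of_str)` and
    `umap_model` is passed to BERTopic for dimensionality reduction.
    """
    # Imported inside the function so the helper can be defined
    # without loading the library up front.
    from bertopic import BERTopic

    # Pre-compute the embeddings once so they can be reused later
    embeddings = sentence_model.encode(documents)

    # Plug the embedding model and the UMAP settings into BERTopic
    topic_model = BERTopic(embedding_model=sentence_model, umap_model=umap_model)

    # fit_transform returns one topic id per document plus probabilities
    topics, _ = topic_model.fit_transform(documents, embeddings)
    return topic_model, topics
```

With the objects from the snippet above, this could be called as `fit_topic_model(documents, sentence_model, umap_model)`. Fitting generally needs a reasonably large corpus; a handful of documents is not enough.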
## Predict
```python
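# Predict the topic of a single sentence
sentence = "my sentence"
embeddings = sentence_model.encode([sentence])
# transform returns one topic id per input document, plus probabilities
topics, _ = my_model.transform([sentence], embeddings)
topic = topics[0]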
```
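To predict topics for several texts at once and see what each topic id stands for, the steps above can be wrapped in a small helper. The sketch below is illustrative: `predict_topics` and its `top_n` parameter are hypothetical names, and it only assumes that the embedding model exposes `encode` and that the topic model exposes `transform` and `get_topic`, as BERTopic does.

```python
def predict_topics(texts, sentence_model, topic_model, top_n=5):
    """Return one (text, topic_id, keywords) tuple per input text."""
    # Embed all texts in one batch
    embeddings = sentence_model.encode(texts)

    # One topic id per text; BERTopic uses -1 for outliers
    topics, _ = topic_model.transform(texts, embeddings)

    results = []
    for text, topic_id in zip(texts, topics):
        # get_topic returns (word, score) pairs, or False for an unknown id
        words = topic_model.get_topic(topic_id) or []
        keywords = [word for word, _ in words[:top_n]]
        results.append((text, int(topic_id), keywords))
    return results
```

With the objects loaded earlier, a call could look like `predict_topics(["first text", "second text"], sentence_model, my_model)`.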
For more information on how to use the BERTopic model, see the [BERTopic documentation](https://maartengr.github.io/BERTopic/index.html).