---
license: apache-2.0
datasets:
- AyoubChLin/CNN_News_Articles_2011-2022
language:
- en
tags:
- topic modeling
- BERT
- CNN news articles
---

# BERTopic Model for CNN News Articles

This is a BERTopic model trained on CNN news articles. It uses the sentence transformer model "all-MiniLM-L6-v2" to embed the documents and UMAP for dimensionality reduction.

## Usage

First, install the required packages:

```console
pip install sentence_transformers umap-learn bertopic
```

Then, load the model and encode your documents:

```python
from sentence_transformers import SentenceTransformer
from umap import UMAP
from bertopic import BERTopic

# Load the sentence transformer model
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")

# Set the random state in the UMAP model to prevent stochastic behavior
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine", random_state=42)

# Load the BERTopic model
my_model = BERTopic.load("from/path/model.bin")

# Encode your documents (replace this placeholder list with your own texts)
documents = ["my first document", "my second document"]
document_embeddings = sentence_model.encode(documents)
```

## Prediction

```python
sentence = "my sentence"

# Embed the sentence, then assign it to a topic
embeddings = sentence_model.encode([sentence])
topic, _ = my_model.transform([sentence], embeddings)
```

For more information on how to use the BERTopic model, see the [BERTopic documentation](https://maartengr.github.io/BERTopic/index.html).
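
The topic ids returned by `transform` are integers, so it can help to map them back to keywords. The following is a minimal sketch, not part of the original model card: the helper names `top_words` and `describe_sentences` are hypothetical, and `"from/path/model.bin"` is the same placeholder path used above. It assumes BERTopic's `get_topic`, which returns a list of `(word, score)` pairs for a known topic and `False` otherwise.

```python
def top_words(topic_model, topic_ids, n_words=5):
    """Map each distinct topic id to its highest-ranked words.

    `topic_model` is assumed to be a fitted BERTopic instance.
    Topic -1 is BERTopic's outlier topic.
    """
    result = {}
    for topic_id in set(int(t) for t in topic_ids):
        words = topic_model.get_topic(topic_id)  # list of (word, score) or False
        result[topic_id] = [word for word, _ in words[:n_words]] if words else []
    return result


def describe_sentences(model_path, sentences):
    """Hypothetical end-to-end helper: load, embed, predict, describe."""
    # Imports are local so that `top_words` can be used without these packages.
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
    topic_model = BERTopic.load(model_path)

    embeddings = sentence_model.encode(sentences)
    topics, _ = topic_model.transform(sentences, embeddings)
    return topics, top_words(topic_model, topics)
```

For example, `describe_sentences("from/path/model.bin", ["my sentence"])` would return the predicted topic ids together with a dictionary of keywords per topic, provided the path points to a saved model.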