---
license: apache-2.0
datasets:
- AyoubChLin/CNN_News_Articles_2011-2022
language:
- en
tags:
- topic modeling
- BERT
- CNN news articles
---
# BERTopic Model for CNN News Articles

This is a BERTopic model trained on CNN news articles. It uses the sentence transformer model "all-MiniLM-L6-v2" to embed the documents and UMAP for dimensionality reduction.

## Usage

First, install the required packages:

```console
pip install sentence_transformers umap-learn bertopic
```

Then, load the model and encode your documents:

```python
from sentence_transformers import SentenceTransformer
from umap import UMAP
from bertopic import BERTopic

# Placeholder input: replace with your own list of article texts
documents = ["first article text", "second article text"]

# Load the sentence transformer model
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")

# Set the random state in the UMAP model to prevent stochastic behavior.
# This object is not used when loading a saved model.
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine", random_state=42)

# Load the BERTopic model
my_model = BERTopic.load("from/path/model.bin")

# Encode your documents
document_embeddings = sentence_model.encode(documents)
```


## Predict


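```python
sentence = "my sentence"

# Embed the sentence with the same sentence transformer used above
embeddings = sentence_model.encode([sentence])

# transform returns the predicted topic ids and their probabilities
topics, _ = my_model.transform([sentence], embeddings)
```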
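
The value returned by `transform` is a list of integer topic ids, which are not very readable on their own. Below is a minimal sketch of one way to turn an id into a short keyword label. The helper name `describe_topic` and its `top_n` parameter are hypothetical and not part of BERTopic. The sketch only assumes that the model exposes `get_topic`, which in BERTopic returns a list of `(word, score)` pairs, or `False` for an unknown topic id, and that topic `-1` is the outlier topic.

```python
def describe_topic(model, topic_id, top_n=5):
    """Return the top_n keywords of a topic as a comma-separated string.

    `model` is expected to behave like a loaded BERTopic model, i.e. to
    provide get_topic(topic_id) returning (word, score) pairs or False.
    """
    words = model.get_topic(topic_id)
    if not words:
        # get_topic returns False when the topic id does not exist
        return "(unknown topic)"
    label = ", ".join(word for word, _score in words[:top_n])
    if topic_id == -1:
        # BERTopic assigns -1 to documents it treats as outliers
        label += " (outlier topic)"
    return label
```

To use it, pass the loaded model and the first topic id returned by `transform`.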


For more information on how to use the BERTopic model, see the [BERTopic documentation](https://maartengr.github.io/BERTopic/index.html).