AyoubChLin committed on
Commit
a20a891
1 Parent(s): b788199

Update README.md

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -1,3 +1,61 @@
  ---
  license: apache-2.0
+ datasets:
+ - AyoubChLin/CNN_News_Articles_2011-2022
+ language:
+ - en
+ tags:
+ - topic modeling
+ - BERT
+ - CNN news articles
  ---
+ # BERTopic Model for CNN News Articles
+
+ This is a BERTopic model trained on CNN news articles. It uses the sentence transformer model "all-MiniLM-L6-v2" to embed the documents and UMAP for dimensionality reduction.
+
+ ## Usage
+
+ First, install the required packages:
+
+ ```console
+ pip install sentence_transformers umap-learn bertopic
+ ```
+
+ Then, load the model and encode your documents:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from umap import UMAP
+ from bertopic import BERTopic
+
+ # Load the sentence transformer model
+ sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
+
+ # UMAP settings with a fixed random_state to prevent stochastic behavior
+ # (defined for reference; the loading code below does not use it)
+ umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric='cosine', random_state=42)
+
+ # Load the BERTopic model
+ my_model = BERTopic.load("from/path/model.bin")
+
+ # Encode your documents (replace the placeholder list with your own texts)
+ documents = ["my first document", "my second document"]
+ document_embeddings = sentence_model.encode(documents)
+ ```
+
+ ## Predict
+
+ ```python
+ # Embed a single sentence and predict its topic
+ sentence = "my sentence"
+ embeddings = sentence_model.encode([sentence])
+ topics, _ = my_model.transform([sentence], embeddings)
+ ```
+
+ For more information on how to use the BERTopic model, see the [BERTopic documentation](https://maartengr.github.io/BERTopic/index.html).
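
The predict step in the README returns one topic id per input document. As a rough sketch of how such an id might be turned into readable keywords, the helper below assumes the loaded model exposes `get_topic`, which in BERTopic returns a list of `(word, score)` pairs for a known topic and `False` for an unknown id. The `top_words` function and the `FakeModel` stand-in are hypothetical and not part of this commit; the stand-in only lets the snippet run without the model file, and its topic words are made up.

```python
from typing import List


def top_words(model, topic_id: int, n: int = 5) -> List[str]:
    """Return up to n keywords for topic_id (hypothetical helper)."""
    words = model.get_topic(topic_id)
    if not words:  # get_topic returns False when the topic id is unknown
        return []
    return [word for word, _score in words[:n]]


class FakeModel:
    """Stand-in for a loaded BERTopic model; the topic below is invented."""

    def __init__(self):
        self._topics = {0: [("election", 0.12), ("vote", 0.09), ("senate", 0.05)]}

    def get_topic(self, topic_id):
        return self._topics.get(topic_id, False)


keywords = top_words(FakeModel(), 0, n=2)
missing = top_words(FakeModel(), 7)
```

With the real model, the same helper could be called with the loaded BERTopic object and the first id returned by `transform`. Topic id `-1` is BERTopic's outlier topic, so its keywords are usually not meaningful.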