import streamlit as st

from utilities import initialization

st.set_page_config(page_title="Top2Vec", layout="wide")
initialization()

# Visitor-counter badge, rendered as a markdown image at the bottom of the page.
vb_link = 'https://visitor-badge.glitch.me/badge?page_id=demo-org.Top2Vec&left_color=gray&right_color=blue'
visitor_badge = f"![Total Visitors]({vb_link})"
st.markdown(
        f"""
        # Introduction
        This is a [space](https://huggingface.co/spaces) dedicated to demonstrating [top2vec](https://github.com/ddangelov/Top2Vec) and the features it offers for semantic search and topic modeling.
        Please check out this [readme](https://github.com/ddangelov/Top2Vec#how-does-it-work) to better understand how it works.
        
        > Top2Vec is an algorithm for **topic modeling** and **semantic search**. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.
    
        
        # Setup
        I fit a `top2vec` model on the [20 NewsGroups](https://huggingface.co/datasets/SetFit/20_newsgroups) dataset and reduced the discovered topics to 20.
        The topics come from top2vec itself, not from the dataset's labels.
        No comparison between these 20 topics and the labels is provided.
        
        # Usage
        Check out:
        - The [Topic Explorer](/Topic_Explorer) page to see which topics were detected
        - The [Document Explorer](/Document_Explorer) page to visually explore documents
        - The [Semantic Search](/Semantic_Search) page to search by meaning

        {visitor_badge}
        """
        )