short-arxiv-bertopic

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic

# Load the trained topic model from the Hugging Face Hub.
topic_model = BERTopic.load("etanios/short-arxiv-bertopic")

# Return a table with one row per topic.
topic_model.get_topic_info()
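
A loaded model can also label unseen text. The sketch below is a minimal illustration that assumes the embedding model reference was saved together with the topic model; if it was not, `BERTopic.load` accepts an `embedding_model` argument. The helper name `assign_topics` and the sample sentences are made up for this example and are not part of the original card.

```python
from typing import List, Tuple


def assign_topics(topic_model, docs: List[str]) -> List[Tuple[str, int]]:
    """Pair each document with its predicted topic ID (-1 marks an outlier)."""
    # transform() returns (topic IDs, probabilities); only the IDs are used here.
    topics, _ = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def main() -> None:
    # Imported here so the helper above stays usable without BERTopic installed.
    from bertopic import BERTopic

    # Downloads the model from the Hugging Face Hub on first use.
    topic_model = BERTopic.load("etanios/short-arxiv-bertopic")

    # Made-up example inputs, not taken from the training data.
    docs = [
        "We prove regret bounds for a contextual bandit algorithm.",
        "A variational inference scheme for Gaussian process models.",
    ]
    for doc, topic_id in assign_topics(topic_model, docs):
        print(topic_id, doc)
```

Calling `main()` prints one topic ID per document. Each ID corresponds to a row of `topic_model.get_topic_info()`.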

Topic overview

  • Number of topics: 38
  • Number of training documents: 9999
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | data - learning - model - based - algorithm | 50 | Machine Learning and Data Analysis |
| 0 | deep - networks - neural - training - network | 2896 | Advances in Deep Neural Networks for Computer Vision |
| 1 | neural - word - model - language - models | 1036 | Neural Language Models |
| 2 | regret - bandit - online - algorithm - problem | 755 | Optimization of regret in multi-armed bandit problems |
| 3 | policy - reinforcement - reinforcement learning - learning - control | 552 | Reinforcement Learning Policies |
| 4 | clustering - clusters - data - means - cluster | 504 | Clustering algorithms and techniques |
| 5 | classification - classifiers - class - classifier - ensemble | 463 | Machine Learning Ensembles |
| 6 | gradient - stochastic - convex - convergence - optimization | 293 | Optimization techniques for non-convex problems |
| 7 | learning - epsilon - distribution - complexity - bounds | 271 | Machine Learning and Complexity |
| 8 | matrix - rank - low rank - low - completion | 257 | Matrix Completion and Robust Matrix Completion |
| 9 | sparse - dictionary - signal - signals - sensing | 218 | Sparse Coding and Dictionary Learning |
| 10 | kernel - kernels - learning - kernel learning - mkl | 185 | Kernel Learning and Multiple Kernels |
| 11 | topic - topics - lda - model - topic models | 173 | Topic Modeling and LDA |
| 12 | bayesian - structure - bayesian networks - bayesian network - network | 163 | Structure learning of Bayesian networks |
| 13 | users - user - recommendation - items - collaborative | 162 | Recommendation Systems: Collaborative Filtering |
| 14 | inference - posterior - variational - mcmc - carlo | 157 | Bayesian Inference Techniques |
| 15 | feature - selection - feature selection - data - classification | 145 | Data Preparation for Cancer Classification |
| 16 | active - active learning - learning - optimization - bayesian optimization | 144 | Active Learning and Optimization |
| 17 | lasso - sparse - group - sparsity - regression | 137 | High-dimensional regression with sparsity |
| 18 | distributed - ml - communication - machine - data | 126 | Distributed Machine Learning |
| 19 | privacy - private - differential privacy - differential - differentially | 117 | Privacy and Data Mining |
| 20 | anomaly - detection - anomaly detection - data - anomalies | 99 | Anomaly Detection in Data Sets |
| 21 | ranking - rank - items - pairwise - comparisons | 92 | Ranking and Preference Learning |
| 22 | metric - metric learning - distance - learning - similarity | 87 | Metric Learning |
| 23 | svm - support - support vector - svms - vector | 79 | Efficient and Fast SVM Algorithms |
| 24 | hashing - hash - binary - codes - bit | 76 | Large-scale search and indexing using hashing methods |
| 25 | graph - graphs - nodes - relational - kernels | 75 | Graph-based Semi-supervised Learning |
| 26 | manifold - dimensional - data - manifold learning - embedding | 74 | Manifold Learning and Dimensionality Reduction |
| 27 | tensor - decomposition - tensors - rank - tensor decomposition | 74 | Tensor Decomposition and Rank |
| 28 | bethe - belief propagation - belief - propagation - bp | 73 | Inference in Graphical Models |
| 29 | image - semantic - images - visual - shot | 65 | Zero-shot learning for image recognition |
| 30 | gp - gaussian - gaussian process - process - covariance | 65 | Gaussian Processes for Large Data |
| 31 | domain - adaptation - domain adaptation - target - source | 64 | Domain Adaptation |
| 32 | crowdsourcing - workers - labels - crowd - worker | 56 | Crowdsourced Labeling and Task Assignment |
| 33 | causal - variables - data - discovery - cause | 55 | Causal Discovery and Inference |
| 34 | label - multi label - multi - labels - multi label classification | 55 | Multi-Label Classification |
| 35 | protein - proteins - prediction - structure - amino | 54 | Protein structure prediction and sequence analysis |
| 36 | nmf - nonnegative - matrix - nonnegative matrix - factorization | 52 | Nonnegative Matrix Factorization (NMF) |
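
As a quick consistency check on the overview, the per-topic frequencies can be summed and compared with the reported totals. The sketch below copies the frequencies from the table by hand. The names `TOPIC_FREQUENCY` and `share` are illustrative, not part of BERTopic. The count of 38 topics includes the outlier topic -1.

```python
# Topic ID -> number of training documents, copied from the table above.
TOPIC_FREQUENCY = {
    -1: 50, 0: 2896, 1: 1036, 2: 755, 3: 552, 4: 504, 5: 463, 6: 293,
    7: 271, 8: 257, 9: 218, 10: 185, 11: 173, 12: 163, 13: 162, 14: 157,
    15: 145, 16: 144, 17: 137, 18: 126, 19: 117, 20: 99, 21: 92, 22: 87,
    23: 79, 24: 76, 25: 75, 26: 74, 27: 74, 28: 73, 29: 65, 30: 65,
    31: 64, 32: 56, 33: 55, 34: 55, 35: 54, 36: 52,
}

total_documents = sum(TOPIC_FREQUENCY.values())


def share(topic_id: int) -> float:
    """Fraction of training documents assigned to the given topic."""
    return TOPIC_FREQUENCY[topic_id] / total_documents


print(f"topics: {len(TOPIC_FREQUENCY)}")
print(f"documents: {total_documents}")
print(f"largest topic (0): {share(0):.1%}")
print(f"outliers (-1): {share(-1):.1%}")
```

The sum matches the 9999 training documents reported above, and the share of topic 0 shows how strongly the deep learning cluster dominates the corpus.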

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
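
The settings above correspond to keyword arguments of the `BERTopic` constructor. The following is a minimal sketch of how an untrained model with the same settings might be created. It is not the author's training script: the training corpus, embedding model, and UMAP/HDBSCAN settings are not stated in this card, and `HYPERPARAMETERS` and `build_model` are hypothetical names.

```python
# Settings copied from the list above. `language` is left out on purpose:
# a value of None likely means a custom embedding model was supplied, and
# this card does not say which one.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_model(embedding_model=None):
    """Create an untrained BERTopic instance with the settings above."""
    # Imported lazily so the settings can be inspected without BERTopic installed.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMETERS)
```

Calling `build_model().fit_transform(docs)` on a list of strings would train a new model. Its topics should not be expected to match this card exactly, since the dimensionality reduction step is stochastic and the inputs listed as unknown above may differ.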

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.4
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.33.2
  • Numba: 0.56.4
  • Plotly: 5.15.0
  • Python: 3.10.12
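
To compare a local environment against the versions above, something like the following can be used. It is a sketch using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP and `scikit-learn` for Scikit-Learn) are assumptions, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.4",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.33.2",
    "numba": "0.56.4",
    "plotly": "5.15.0",
}


def version_mismatches(pinned):
    """Return {package: (expected, installed or None)} for every difference."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # Package is not installed at all.
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


for package, (expected, installed) in version_mismatches(PINNED).items():
    print(f"{package}: expected {expected}, found {installed}")
```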