metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

transformers_issues_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("mark230271/transformers_issues_topics")

topic_model.get_topic_info()
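
Beyond listing the topics, a loaded model can assign topics to unseen text with `transform`. The following is a minimal sketch rather than an official recipe: `load_model` and `assign_topics` are hypothetical helper names, the sample issue title in the trailing comment is made up, and it assumes the embedding model referenced by the saved model can be fetched when the model is loaded.

```python
def load_model(repo_id="mark230271/transformers_issues_topics"):
    """Load the model from the Hugging Face Hub (needs `pip install -U bertopic`)."""
    # Imported lazily so that assign_topics below has no hard dependency on bertopic.
    from bertopic import BERTopic
    return BERTopic.load(repo_id)


def assign_topics(topic_model, docs):
    """Return (document, topic_id, label) triples for new documents."""
    # transform() returns the predicted topic ids and, optionally, probabilities.
    topics, _probs = topic_model.transform(docs)
    # topic_labels_ maps each topic id to its label, e.g. "1_cuda_gpt2_gpt_gpus".
    labels = topic_model.topic_labels_
    return [
        (doc, int(topic), labels.get(int(topic), "unknown"))
        for doc, topic in zip(docs, topics)
    ]


# Example usage (downloads the model, so it is left commented out):
# topic_model = load_model()
# for doc, topic_id, label in assign_topics(topic_model, ["Tokenizer fails on empty input"]):
#     print(topic_id, label, doc)
```

A topic id of -1 means the document was treated as an outlier rather than matched to one of the regular topics.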

Topic overview

  • Number of topics: 30 (topics 0 to 28, plus the outlier topic -1)
  • Number of training documents: 9000
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | tokenizer - bert - tokenizers - pytorch - tensorflow | 11 | -1_tokenizer_bert_tokenizers_pytorch |
| 0 | tokenizer - tokenizers - tokenization - berttokenizer - bart | 2376 | 0_tokenizer_tokenizers_tokenization_berttokenizer |
| 1 | cuda - gpt2 - gpt - gpus - gpu | 1879 | 1_cuda_gpt2_gpt_gpus |
| 2 | modelcard - modelcards - card - model - models | 735 | 2_modelcard_modelcards_card_model |
| 3 | transformerscli - transformers - transformer - transformerxl - importerror | 412 | 3_transformerscli_transformers_transformer_transformerxl |
| 4 | typeerror - attributeerror - valueerror - error - errors | 385 | 4_typeerror_attributeerror_valueerror_error |
| 5 | trainertrain - trainer - trainerevaluate - trainers - training | 330 | 5_trainertrain_trainer_trainerevaluate_trainers |
| 6 | seq2seq - seq2seqtrainer - s2s - runseq2seq - seq2seqdataset | 319 | 6_seq2seq_seq2seqtrainer_s2s_runseq2seq |
| 7 | typos - typo - fix - correction - fixed | 306 | 7_typos_typo_fix_correction |
| 8 | ci - testing - test - tests - circleci | 282 | 8_ci_testing_test_tests |
| 9 | readmemd - readmetxt - readme - file - camembertbasereadmemd | 255 | 9_readmemd_readmetxt_readme_file |
| 10 | t5 - t5model - tf - t5base - t5large | 255 | 10_t5_t5model_tf_t5base |
| 11 | generationbeamsearchpy - beamsearch - groupbeamsearch - beam - search | 218 | 11_generationbeamsearchpy_beamsearch_groupbeamsearch_beam |
| 12 | flax - distilbertmodel - flaubert - deberta - model | 185 | 12_flax_distilbertmodel_flaubert_deberta |
| 13 | ner - pipeline - pipelines - nerpipeline - fillmaskpipeline | 177 | 13_ner_pipeline_pipelines_nerpipeline |
| 14 | questionansweringpipeline - tfalbertforquestionanswering - questionanswering - distilbertforquestionanswering - answering | 161 | 14_questionansweringpipeline_tfalbertforquestionanswering_questionanswering_distilbertforquestionanswering |
| 15 | huggingfacetransformers - huggingface - hugging - gluepy - gluebenchmarkcom | 133 | 15_huggingfacetransformers_huggingface_hugging_gluepy |
| 16 | onnx - onnxonnxruntime - onnxexport - 04onnxexport - 04onnxexportipynb | 130 | 16_onnx_onnxonnxruntime_onnxexport_04onnxexport |
| 17 | labelsmoothednllloss - labelsmoothingfactor - label - labels - labelsmoothing | 96 | 17_labelsmoothednllloss_labelsmoothingfactor_label_labels |
| 18 | longformer - longformers - longform - longformerlayer - longformermodel | 73 | 18_longformer_longformers_longform_longformerlayer |
| 19 | configpath - configs - config - configuration - modelconfigs | 59 | 19_configpath_configs_config_configuration |
| 20 | wandbproject - wandb - sagemaker - sagemakertrainer - wandbcallback | 45 | 20_wandbproject_wandb_sagemaker_sagemakertrainer |
| 21 | cachedir - cache - cachedpath - caching - cached | 33 | 21_cachedir_cache_cachedpath_caching |
| 22 | notebook - notebooks - community - colab - t5 | 33 | 22_notebook_notebooks_community_colab |
| 23 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 30 | 23_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice |
| 24 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 24 | 24_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
| 25 | isort - blackisortflake8 - github - repo - version | 18 | 25_isort_blackisortflake8_github_repo |
| 26 | pplm - pr - deprecated - variable - ppl | 14 | 26_pplm_pr_deprecated_variable |
| 27 | indexerror - index - missingindex - indices - runtimeerror | 14 | 27_indexerror_index_missingindex_indices |
| 28 | ga - fork - forks - forked - push | 12 | 28_ga_fork_forks_forked |
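
Each label in the table appears to be the topic ID followed by the first four keywords, joined by underscores, and each frequency can be read as a share of the 9000 training documents. The sketch below illustrates both readings on a few rows copied from the table; `parse_label` and `share` are hypothetical helper names, not part of BERTopic.

```python
TRAINING_DOCS = 9000  # "Number of training documents" from the overview above

# (topic_id, frequency, label) for a few rows copied from the table.
ROWS = [
    (-1, 11, "-1_tokenizer_bert_tokenizers_pytorch"),
    (0, 2376, "0_tokenizer_tokenizers_tokenization_berttokenizer"),
    (1, 1879, "1_cuda_gpt2_gpt_gpus"),
    (2, 735, "2_modelcard_modelcards_card_model"),
]


def parse_label(label):
    """Split a label such as '1_cuda_gpt2_gpt_gpus' into (1, ['cuda', ...])."""
    topic_id, _, keywords = label.partition("_")
    return int(topic_id), keywords.split("_")


def share(frequency, total=TRAINING_DOCS):
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


for topic_id, frequency, label in ROWS:
    _, keywords = parse_label(label)
    print(f"topic {topic_id:>2}: {share(frequency):6.1%}  {', '.join(keywords)}")
```

Read this way, the first two regular topics (tokenizers and CUDA/GPT) together cover nearly half of the training documents, while the outlier topic -1 holds only 11 of them.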

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 30
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
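
The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch of how a comparable model might be configured, the settings can be collected in a dictionary and passed through. `build_model` is a hypothetical helper, the embedding, UMAP, and HDBSCAN sub-models are left at BERTopic's defaults because the card does not list them, and the `zeroshot_*` arguments assume a BERTopic release that supports zero-shot topic modeling.

```python
# Values copied from the "Training hyperparameters" list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model(**overrides):
    """Create an untrained BERTopic model with the settings listed on this card."""
    # Imported lazily so the dictionary above can be inspected without bertopic installed.
    from bertopic import BERTopic
    return BERTopic(**{**HYPERPARAMETERS, **overrides})


# Example usage, assuming `docs` is your own list of strings:
# topic_model = build_model()
# topics, probs = topic_model.fit_transform(docs)
```

Topic assignments depend on the training documents and on stochastic components such as UMAP, so refitting with these settings is not expected to reproduce the published topics exactly.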

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.6.1
  • Transformers: 4.38.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
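
If the model fails to load or behaves differently in your environment, comparing your installed packages with the versions listed above is a reasonable first check. The sketch below uses only the standard library. `version_mismatches` is a hypothetical helper, and the distribution names (for example `umap-learn` for UMAP) are assumed to be the PyPI names of the listed libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the "Framework versions" list above, keyed by PyPI name.
EXPECTED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches().items():
    print(f"{package}: card lists {wanted}, installed {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will break; it only narrows down where to look.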