---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# transformers_issues_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```sh
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic

topic_model = BERTopic.load("asoria/transformers_issues_topics")

topic_model.get_topic_info()
```
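Once loaded, the model can also assign topics to new, unseen documents. A minimal sketch, assuming the standard BERTopic `transform` API; the two issue titles are hypothetical:

```python
# Hypothetical GitHub issue titles; any list of strings works.
docs = [
    "CUDA out of memory when calling trainer.train()",
    "Fix typo in the question answering pipeline docstring",
]

# transform() returns one topic ID per document, plus probabilities
# (which may be None here, since the model was trained with
# calculate_probabilities=False).
topics, probs = topic_model.transform(docs)
print(topics)
```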

## Topic overview

* Number of topics: 30
* Number of training documents: 9000

<details>
  <summary>Click here for an overview of all topics.</summary>
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | pytorch - tensorflow - bert - tf - pretrained | 15 | -1_pytorch_tensorflow_bert_tf |
| 0 | bert - bertforsequenceclassification - berttokenizer - bart - batchencodeplus | 2321 | 0_bert_bertforsequenceclassification_berttokenizer_bart |
| 1 | cuda - memory - trainertrain - tensorflow - trainer | 1554 | 1_cuda_memory_trainertrain_tensorflow |
| 2 | transformerscli - transformers - transformer - importerror - transformerxl | 882 | 2_transformerscli_transformers_transformer_importerror |
| 3 | modelcard - modelcards - card - model - models | 490 | 3_modelcard_modelcards_card_model |
| 4 | gpt2 - gpt2tokenizer - gpt2xl - gpt2tokenizerfast - gpt2model | 462 | 4_gpt2_gpt2tokenizer_gpt2xl_gpt2tokenizerfast |
| 5 | attributeerror - typeerror - valueerror - runtimeerror - indexerror | 437 | 5_attributeerror_typeerror_valueerror_runtimeerror |
| 6 | typos - typo - doc - docstring - fix | 336 | 6_typos_typo_doc_docstring |
| 7 | t5 - t5model - t5base - tf - t5large | 298 | 7_t5_t5model_t5base_tf |
| 8 | readmemd - readmetxt - readme - modelcard - file | 270 | 8_readmemd_readmetxt_readme_modelcard |
| 9 | ci - testing - tests - test - speedup | 254 | 9_ci_testing_tests_test |
| 10 | s2s - s2sdistill - s2t - s2strainer - exampless2s | 245 | 10_s2s_s2sdistill_s2t_s2strainer |
| 11 | glue - gluepy - glueconvertexamplestofeatures - roberta - huggingfacetransformers | 214 | 11_glue_gluepy_glueconvertexamplestofeatures_roberta |
| 12 | ner - pipeline - pipelines - nerpipeline - fillmaskpipeline | 158 | 12_ner_pipeline_pipelines_nerpipeline |
| 13 | rag - ragtokenforgeneration - ragsequenceforgeneration - clean - tests | 153 | 13_rag_ragtokenforgeneration_ragsequenceforgeneration_clean |
| 14 | questionansweringpipeline - questionanswering - answering - tfalbertforquestionanswering - questionasnwering | 143 | 14_questionansweringpipeline_questionanswering_answering_tfalbertforquestionanswering |
| 15 | onnx - 04onnxexport - 04onnxexportipynb - aionnx - sphynx | 131 | 15_onnx_04onnxexport_04onnxexportipynb_aionnx |
| 16 | longformer - longformers - longform - longformerlayer - longformermodel | 104 | 16_longformer_longformers_longform_longformerlayer |
| 17 | labelsmoothednllloss - label - labelsmoothingfactor - labels - labelsmoothing | 76 | 17_labelsmoothednllloss_label_labelsmoothingfactor_labels |
| 18 | benchmark - benchmarking - benchmarks - accuracy - evaluation | 73 | 18_benchmark_benchmarking_benchmarks_accuracy |
| 19 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 67 | 19_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc |
| 20 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 51 | 20_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
| 21 | configpath - configs - config - configuration - modelconfigs | 49 | 21_configpath_configs_config_configuration |
| 22 | logging - logs - log - logger - loghistory | 40 | 22_logging_logs_log_logger |
| 23 | cachedir - cache - cachedpath - caching - cached | 38 | 23_cachedir_cache_cachedpath_caching |
| 24 | wandbproject - wandb - sagemaker - sagemakertrainer - wandbcallback | 36 | 24_wandbproject_wandb_sagemaker_sagemakertrainer |
| 25 | notebook - notebooks - community - colab - t5 | 33 | 25_notebook_notebooks_community_colab |
| 26 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 30 | 26_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice |
| 27 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 25 | 27_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
| 28 | pplm - pr - deprecated - variable - ppl | 15 | 28_pplm_pr_deprecated_variable |

</details>
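Each row of the table can also be inspected programmatically: `get_topic` returns a topic's top keywords with their c-TF-IDF weights. A short sketch, reusing the `topic_model` loaded in the Usage section:

```python
# Top keywords and c-TF-IDF weights for topic 0,
# the bert/berttokenizer cluster from the table above.
for word, weight in topic_model.get_topic(0):
    print(f"{word:45s} {weight:.4f}")
```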

## Training hyperparameters

* calculate_probabilities: False
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: 30
* seed_topic_list: None
* top_n_words: 10
* verbose: True
* zeroshot_min_similarity: 0.7
* zeroshot_topic_list: None
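These settings correspond to arguments of the `BERTopic` constructor. A hedged sketch of how a comparable model could be re-trained with them; `docs` is a placeholder list of strings, all sub-models (embeddings, UMAP, HDBSCAN) are left at their defaults, and the `zeroshot_*` arguments assume a BERTopic release that supports zero-shot topic modeling (>= 0.16):

```python
from bertopic import BERTopic

topic_model = BERTopic(
    calculate_probabilities=False,
    language="english",
    low_memory=False,
    min_topic_size=10,
    n_gram_range=(1, 1),
    nr_topics=30,
    seed_topic_list=None,
    top_n_words=10,
    verbose=True,
    zeroshot_min_similarity=0.7,
    zeroshot_topic_list=None,
)

# docs would be the ~9000 GitHub issue texts used for training.
topics, probs = topic_model.fit_transform(docs)
```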

## Framework versions

* Numpy: 1.26.4
* HDBSCAN: 0.8.38.post1
* UMAP: 0.5.6
* Pandas: 2.1.4
* Scikit-Learn: 1.5.2
* Sentence-transformers: 3.1.1
* Transformers: 4.44.2
* Numba: 0.60.0
* Plotly: 5.24.1
* Python: 3.10.12