metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

transformers_issues_topics

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

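from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("ruanwz/transformers_issues_topics")

# Return a DataFrame summarizing every topic in the model
topic_model.get_topic_info()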
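Beyond inspecting the topics, the loaded model can assign topics to new text. The following is a minimal sketch rather than an official recipe: the helper names `assign_topics` and `drop_outliers` are hypothetical, and it assumes that `bertopic` and its embedding backend are installed, that network access to the Hub is available, and that the embedding model reference was saved with the model (otherwise an `embedding_model` argument would need to be passed to `BERTopic.load`).

```python
def assign_topics(docs, repo_id="ruanwz/transformers_issues_topics"):
    """Return (document, topic_id) pairs for a list of strings.

    Hypothetical helper: BERTopic is imported lazily so that this module
    can be imported even where the library is not installed.
    """
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    # transform() returns the predicted topic IDs and, optionally, probabilities
    topics, _ = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def drop_outliers(assignments):
    """Remove documents assigned to topic -1, which BERTopic uses for outliers."""
    return [(doc, topic) for doc, topic in assignments if topic != -1]
```

A call such as `drop_outliers(assign_topics(["Tokenizer fails on empty strings"]))` would keep only documents that were matched to one of the topics listed in this card.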

Topic overview

  • Number of topics: 30
  • Number of training documents: 9000
Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | tensorflow - pytorch - tf - pretrained - gpu | 11 | -1_tensorflow_pytorch_tf_pretrained |
| 0 | tokenizer - tokenizers - tokenize - tokenization - token | 2089 | 0_tokenizer_tokenizers_tokenize_tokenization |
| 1 | gpt2 - gpt - gpt2doubleheadsmodel - gpt2lmheadmodel - distilgpt2 | 1471 | 1_gpt2_gpt_gpt2doubleheadsmodel_gpt2lmheadmodel |
| 2 | ner - seq2seqtrainer - seq2seq - runseq2seqpy - valueerror | 856 | 2_ner_seq2seqtrainer_seq2seq_runseq2seqpy |
| 3 | modelcard - modelcards - card - model - cards | 601 | 3_modelcard_modelcards_card_model |
| 4 | trainer - trainertrain - trainers - training - evaluateduringtraining | 500 | 4_trainer_trainertrain_trainers_training |
| 5 | longformer - longformers - longformerformultiplechoice - tf - longformertokenizerfast | 455 | 5_longformer_longformers_longformerformultiplechoice_tf |
| 6 | typos - typo - fix - correction - fixed | 439 | 6_typos_typo_fix_correction |
| 7 | albertbasev2 - albertforpretraining - albert - albertformaskedlm - xlnet | 407 | 7_albertbasev2_albertforpretraining_albert_albertformaskedlm |
| 8 | summarization - summaries - summary - text - nlp | 351 | 8_summarization_summaries_summary_text |
| 9 | readmemd - readmetxt - readme - modelcard - file | 333 | 9_readmemd_readmetxt_readme_modelcard |
| 10 | transformerscli - transformers - transformer - transformerxl - importerror | 259 | 10_transformerscli_transformers_transformer_transformerxl |
| 11 | ci - testing - tests - test - slow | 228 | 11_ci_testing_tests_test |
| 12 | questionansweringpipeline - questionanswering - answering - tfalbertforquestionanswering - questionasnwering | 156 | 12_questionansweringpipeline_questionanswering_answering_tfalbertforquestionanswering |
| 13 | pipeline - pipelines - pipelinespy - pipelineexception - fixpipeline | 137 | 13_pipeline_pipelines_pipelinespy_pipelineexception |
| 14 | onnxonnxruntime - onnx - onnxexport - 04onnxexport - 04onnxexportipynb | 113 | 14_onnxonnxruntime_onnx_onnxexport_04onnxexport |
| 15 | benchmark - benchmarks - accuracy - evaluation - metrics | 98 | 15_benchmark_benchmarks_accuracy_evaluation |
| 16 | huggingfacemaster - huggingfacetokenizers297 - huggingface - huggingfaces - huggingfacetransformers | 81 | 16_huggingfacemaster_huggingfacetokenizers297_huggingface_huggingfaces |
| 17 | generationbeamsearchpy - generatebeamsearch - generatebeamsearchoutputs - beamsearch - nonbeamsearch | 69 | 17_generationbeamsearchpy_generatebeamsearch_generatebeamsearchoutputs_beamsearch |
| 18 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 56 | 18_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc |
| 19 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 53 | 19_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
| 20 | cachedir - cache - cachedpath - cached - caching | 43 | 20_cachedir_cache_cachedpath_cached |
| 21 | notebook - notebooks - colab - community - t5 | 33 | 21_notebook_notebooks_colab_community |
| 22 | wandbproject - wandb - sagemaker - sagemakertrainer - wandbcallback | 32 | 22_wandbproject_wandb_sagemaker_sagemakertrainer |
| 23 | bigbird - py7zr - tapas - tres - v4 | 32 | 23_bigbird_py7zr_tapas_tres |
| 24 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 28 | 24_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice |
| 25 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 24 | 25_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
| 26 | isort - blackisortflake8 - github - repo - version | 18 | 26_isort_blackisortflake8_github_repo |
| 27 | pplm - pr - deprecated - variable - ppl | 14 | 27_pplm_pr_deprecated_variable |
| 28 | blenderbot - blenderbot3b - blenderbotforcausallm - chatbot - boto3 | 13 | 28_blenderbot_blenderbot3b_blenderbotforcausallm_chatbot |
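Each value in the Label column combines the topic ID with the topic's first four keywords, joined by underscores. As an illustration, a label can be split back into its parts with plain Python. The function name `parse_label` is hypothetical, and the sketch assumes that keywords never contain underscores themselves, which holds for every label listed above.

```python
def parse_label(label):
    """Split a label such as '0_tokenizer_tokenizers' into (topic_id, keywords)."""
    # Split only on the first underscore so that "-1" is kept intact as the ID
    topic_id, _, rest = label.partition("_")
    keywords = rest.split("_") if rest else []
    return int(topic_id), keywords


if __name__ == "__main__":
    print(parse_label("-1_tensorflow_pytorch_tf_pretrained"))
    print(parse_label("6_typos_typo_fix_correction"))
```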

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: 30
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
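The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows how they could be passed when fitting a comparable model. It is not the author's training script: the card does not list the embedding model, the UMAP or HDBSCAN settings, or the training documents, so this falls back to library defaults for those and is unlikely to reproduce the topics exactly. The names `HYPERPARAMETERS` and `build_topic_model` are hypothetical.

```python
# Values copied from the "Training hyperparameters" list in this card
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model():
    """Create an unfitted BERTopic instance with the hyperparameters above."""
    # Imported lazily so the dictionary can be inspected without BERTopic installed
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_topic_model().fit_transform(docs)` on a list of strings would then train a model with these settings.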

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.15.0
  • Python: 3.10.12
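Loading a saved model in an environment with different library versions can sometimes cause problems, so it may help to compare the installed packages against the list above. The following sketch uses only the standard library. The distribution names (for example `umap-learn` for UMAP) are assumptions about how these packages are published, and `find_version_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by assumed PyPI distribution name
EXPECTED_VERSIONS = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.15.0",
}


def find_version_mismatches(expected=EXPECTED_VERSIONS):
    """Return {name: (expected, installed)} for packages that differ.

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_version_mismatches().items():
        print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the check only flags differences worth investigating.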