	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	---

	# transformers_issues_topics

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

	## Usage

	To use this model, please install BERTopic:

	```
	pip install -U bertopic
	```

	You can use the model as follows:

	```python
	from bertopic import BERTopic
	topic_model = BERTopic.load("mark230271/transformers_issues_topics")

	topic_model.get_topic_info()
	```
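Once the model is loaded, a common next step is assigning topics to new documents. The sketch below wraps `BERTopic.transform`, which returns a pair of topic IDs and probabilities. The helper name `assign_topics` and the example document are illustrative assumptions, not part of this model card.

```python
from typing import Any, List, Tuple


def assign_topics(topic_model: Any, docs: List[str]) -> List[Tuple[str, int]]:
    """Pair each document with the topic ID predicted by a fitted topic model.

    `topic_model` is expected to expose `transform(docs)` returning
    (topics, probabilities), as a loaded BERTopic model does.
    Topic ID -1 marks documents treated as outliers.
    """
    topics, _probabilities = topic_model.transform(docs)
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


# Usage (requires bertopic installed and network access to the Hub):
#   from bertopic import BERTopic
#   topic_model = BERTopic.load("mark230271/transformers_issues_topics")
#   assign_topics(topic_model, ["Tokenizer crashes on empty input"])
```

The predicted IDs can be looked up in the `Topic` column of `topic_model.get_topic_info()` to find the matching keywords.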

	## Topic overview

	* Number of topics: 30 (29 topics plus the outlier topic -1)
	* Number of training documents: 9000

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | tokenizer - bert - tokenizers - pytorch - tensorflow | 11 | -1_tokenizer_bert_tokenizers_pytorch |
	| 0 | tokenizer - tokenizers - tokenization - berttokenizer - bart | 2376 | 0_tokenizer_tokenizers_tokenization_berttokenizer |
	| 1 | cuda - gpt2 - gpt - gpus - gpu | 1879 | 1_cuda_gpt2_gpt_gpus |
	| 2 | modelcard - modelcards - card - model - models | 735 | 2_modelcard_modelcards_card_model |
	| 3 | transformerscli - transformers - transformer - transformerxl - importerror | 412 | 3_transformerscli_transformers_transformer_transformerxl |
	| 4 | typeerror - attributeerror - valueerror - error - errors | 385 | 4_typeerror_attributeerror_valueerror_error |
	| 5 | trainertrain - trainer - trainerevaluate - trainers - training | 330 | 5_trainertrain_trainer_trainerevaluate_trainers |
	| 6 | seq2seq - seq2seqtrainer - s2s - runseq2seq - seq2seqdataset | 319 | 6_seq2seq_seq2seqtrainer_s2s_runseq2seq |
	| 7 | typos - typo - fix - correction - fixed | 306 | 7_typos_typo_fix_correction |
	| 8 | ci - testing - test - tests - circleci | 282 | 8_ci_testing_test_tests |
	| 9 | readmemd - readmetxt - readme - file - camembertbasereadmemd | 255 | 9_readmemd_readmetxt_readme_file |
	| 10 | t5 - t5model - tf - t5base - t5large | 255 | 10_t5_t5model_tf_t5base |
	| 11 | generationbeamsearchpy - beamsearch - groupbeamsearch - beam - search | 218 | 11_generationbeamsearchpy_beamsearch_groupbeamsearch_beam |
	| 12 | flax - distilbertmodel - flaubert - deberta - model | 185 | 12_flax_distilbertmodel_flaubert_deberta |
	| 13 | ner - pipeline - pipelines - nerpipeline - fillmaskpipeline | 177 | 13_ner_pipeline_pipelines_nerpipeline |
	| 14 | questionansweringpipeline - tfalbertforquestionanswering - questionanswering - distilbertforquestionanswering - answering | 161 | 14_questionansweringpipeline_tfalbertforquestionanswering_questionanswering_distilbertforquestionanswering |
	| 15 | huggingfacetransformers - huggingface - hugging - gluepy - gluebenchmarkcom | 133 | 15_huggingfacetransformers_huggingface_hugging_gluepy |
	| 16 | onnx - onnxonnxruntime - onnxexport - 04onnxexport - 04onnxexportipynb | 130 | 16_onnx_onnxonnxruntime_onnxexport_04onnxexport |
	| 17 | labelsmoothednllloss - labelsmoothingfactor - label - labels - labelsmoothing | 96 | 17_labelsmoothednllloss_labelsmoothingfactor_label_labels |
	| 18 | longformer - longformers - longform - longformerlayer - longformermodel | 73 | 18_longformer_longformers_longform_longformerlayer |
	| 19 | configpath - configs - config - configuration - modelconfigs | 59 | 19_configpath_configs_config_configuration |
	| 20 | wandbproject - wandb - sagemaker - sagemakertrainer - wandbcallback | 45 | 20_wandbproject_wandb_sagemaker_sagemakertrainer |
	| 21 | cachedir - cache - cachedpath - caching - cached | 33 | 21_cachedir_cache_cachedpath_caching |
	| 22 | notebook - notebooks - community - colab - t5 | 33 | 22_notebook_notebooks_community_colab |
	| 23 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 30 | 23_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice |
	| 24 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 24 | 24_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
	| 25 | isort - blackisortflake8 - github - repo - version | 18 | 25_isort_blackisortflake8_github_repo |
	| 26 | pplm - pr - deprecated - variable - ppl | 14 | 26_pplm_pr_deprecated_variable |
	| 27 | indexerror - index - missingindex - indices - runtimeerror | 14 | 27_indexerror_index_missingindex_indices |
	| 28 | ga - fork - forks - forked - push | 12 | 28_ga_fork_forks_forked |

	</details>
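The frequencies in the topic table are easier to compare as shares of the training set. The sketch below copies the counts from the table and normalises them. The variable names are illustrative, and the same counts should also be available from the `Count` column of `topic_model.get_topic_info()`.

```python
# Topic ID -> Topic Frequency, copied from the overview table (-1 is the outlier topic).
topic_counts = {
    -1: 11, 0: 2376, 1: 1879, 2: 735, 3: 412, 4: 385, 5: 330, 6: 319,
    7: 306, 8: 282, 9: 255, 10: 255, 11: 218, 12: 185, 13: 177, 14: 161,
    15: 133, 16: 130, 17: 96, 18: 73, 19: 59, 20: 45, 21: 33, 22: 33,
    23: 30, 24: 24, 25: 18, 26: 14, 27: 14, 28: 12,
}

# The counts should add up to the number of training documents.
total_documents = sum(topic_counts.values())

# Fraction of training documents assigned to each topic.
topic_shares = {
    topic_id: count / total_documents for topic_id, count in topic_counts.items()
}

# Print the five largest topics with their share of the corpus.
largest = sorted(topic_shares.items(), key=lambda item: item[1], reverse=True)[:5]
for topic_id, share in largest:
    print(f"topic {topic_id}: {share:.1%}")
```

The counts sum to the 9000 training documents listed above, and topics 0 and 1 together account for a little under half of them.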

	## Training hyperparameters

	* calculate_probabilities: False
	* language: english
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: 30
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
	* zeroshot_min_similarity: 0.7
	* zeroshot_topic_list: None
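
The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. The following is a minimal sketch of how they could be passed, assuming an installed BERTopic version that accepts the `zeroshot_*` arguments. The names `HYPERPARAMS` and `build_model` are illustrative. The card does not list the embedding model or the UMAP and HDBSCAN settings, so a model built this way uses library defaults for those and is not guaranteed to reproduce the topics shown here.

```python
# Hyperparameters copied from this model card.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model():
    """Create an unfitted BERTopic model configured with the card's settings."""
    # Imported here so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)


# Usage (requires bertopic; `docs` is a list of strings you supply):
#   topic_model = build_model()
#   topics, probabilities = topic_model.fit_transform(docs)
```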

	## Framework versions

	* Numpy: 1.25.2
	* HDBSCAN: 0.8.33
	* UMAP: 0.5.6
	* Pandas: 2.0.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.6.1
	* Transformers: 4.38.2
	* Numba: 0.58.1
	* Plotly: 5.15.0
	* Python: 3.10.12
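
Version drift between these libraries can change loading behaviour or topic assignments, so it may help to compare the local environment with the list above. The sketch below uses only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are my mapping from the names in the list, and the helper `version_mismatches` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# PyPI distribution name -> version listed in this model card.
CARD_VERSIONS = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def version_mismatches(
    expected: Dict[str, str],
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for package, (wanted, installed) in version_mismatches(CARD_VERSIONS).items():
    print(f"{package}: card lists {wanted}, installed {installed}")
```

An exact match is a strict check. Newer versions often work, so treat a mismatch as something to look into if loading fails, not as an error in itself.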