	---
	tags:
	- bertopic
	library_name: bertopic
	pipeline_tag: text-classification
	---

	# transformers_issues_topics

	This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
	BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

	## Usage

	To use this model, please install BERTopic:

	```bash
	pip install -U bertopic
	```

	You can use the model as follows:

	```python
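	from bertopic import BERTopic

	# Download the model from the Hugging Face Hub and load it
	topic_model = BERTopic.load("lbhjvh14/transformers_issues_topics")

	# Show one row per topic: ID, document count, and label
	topic_model.get_topic_info()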
	```
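
Beyond inspecting the topic table, a loaded model can also assign topics to new text. The following is a minimal sketch, assuming BERTopic's `transform` method returns a `(topics, probabilities)` pair. `assign_topics` and `main` are hypothetical helpers, not part of BERTopic, and the example issue titles are made up for illustration.

```python
from typing import List, Sequence, Tuple


def assign_topics(topic_model, docs: Sequence[str]) -> List[Tuple[str, int]]:
    """Pair each document with its predicted topic ID (hypothetical helper)."""
    # transform() returns (topics, probabilities); only the topic IDs are used here.
    topics, _ = topic_model.transform(list(docs))
    return [(doc, int(topic)) for doc, topic in zip(docs, topics)]


def main() -> None:
    # Imported here so the helper above can be reused without BERTopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load("lbhjvh14/transformers_issues_topics")
    # Made-up issue titles, for illustration only.
    docs = ["Tokenizer fails on long inputs", "CUDA out of memory during training"]
    for doc, topic in assign_topics(topic_model, docs):
        print(topic, doc)
```

Calling `main()` downloads the model from the Hugging Face Hub, so it needs network access. A returned topic ID of -1 means the document was treated as an outlier.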

	## Topic overview

	* Number of topics: 30 (including the outlier topic, ID -1)
	* Number of training documents: 9000

	<details>
	<summary>Click here for an overview of all topics.</summary>

	| Topic ID | Topic Keywords | Topic Frequency | Label |
	|----------|----------------|-----------------|-------|
	| -1 | pretrained - tokenizer - tensorflow - tokenizers - tf | 11 | -1_pretrained_tokenizer_tensorflow_tokenizers |
	| 0 | tokenizer - tokenizers - tokenization - tokenize - token | 2407 | 0_tokenizer_tokenizers_tokenization_tokenize |
	| 1 | cuda - memory - tensorflow - pytorch - gpu | 1379 | 1_cuda_memory_tensorflow_pytorch |
	| 2 | longformer - longformers - longformertokenizerfast - longformerformultiplechoice - tf | 791 | 2_longformer_longformers_longformertokenizerfast_longformerformultiplechoice |
	| 3 | modelcard - modelcards - card - model - cards | 510 | 3_modelcard_modelcards_card_model |
	| 4 | summarization - summaries - summary - sentences - text | 431 | 4_summarization_summaries_summary_sentences |
	| 5 | s2s - seq2seq - runseq2seq - eval - examplesseq2seq | 405 | 5_s2s_seq2seq_runseq2seq_eval |
	| 6 | squaddataset - attributeerror - squadpy - valueerror - modulenotfounderror | 381 | 6_squaddataset_attributeerror_squadpy_valueerror |
	| 7 | typos - typo - doc - docstring - fix | 324 | 7_typos_typo_doc_docstring |
	| 8 | readmemd - readmetxt - readme - modelcard - file | 299 | 8_readmemd_readmetxt_readme_modelcard |
	| 9 | gpt2 - gpt2xl - gpt - gpt2tokenizer - gpt3 | 261 | 9_gpt2_gpt2xl_gpt_gpt2tokenizer |
	| 10 | rag - ragtokenforgeneration - ragsequenceforgeneration - tokenizer - gluepy | 256 | 10_rag_ragtokenforgeneration_ragsequenceforgeneration_tokenizer |
	| 11 | transformerscli - importerror - transformers - transformer - transformerxl | 232 | 11_transformerscli_importerror_transformers_transformer |
	| 12 | ner - pipeline - pipelines - pipelinespy - nerpipeline | 196 | 12_ner_pipeline_pipelines_pipelinespy |
	| 13 | testing - tests - test - installationtest - speedup | 189 | 13_testing_tests_test_installationtest |
	| 14 | checkpoint - trainertrain - checkpoints - checkpointing - trainersavecheckpoint | 162 | 14_checkpoint_trainertrain_checkpoints_checkpointing |
	| 15 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 119 | 15_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
	| 16 | generationbeamsearchpy - generatebeamsearch - beamsearch - nonbeamsearch - beam | 109 | 16_generationbeamsearchpy_generatebeamsearch_beamsearch_nonbeamsearch |
	| 17 | onnxonnxruntime - onnx - onnxexport - 04onnxexport - 04onnxexportipynb | 99 | 17_onnxonnxruntime_onnx_onnxexport_04onnxexport |
	| 18 | labelsmoothednllloss - labelsmoothingfactor - label - labels - labelsmoothing | 86 | 18_labelsmoothednllloss_labelsmoothingfactor_label_labels |
	| 19 | cachedir - cache - cachedpath - cached - caching | 77 | 19_cachedir_cache_cachedpath_cached |
	| 20 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 61 | 20_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc |
	| 21 | notebook - notebooks - community - colab - t5 | 55 | 21_notebook_notebooks_community_colab |
	| 22 | wandbproject - wandb - wandbcallback - wandbdisabled - wandbdisabledtrue | 39 | 22_wandbproject_wandb_wandbcallback_wandbdisabled |
	| 23 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 38 | 23_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice |
	| 24 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 24 | 24_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
	| 25 | isort - blackisortflake8 - dependencies - github - matplotlib | 18 | 25_isort_blackisortflake8_dependencies_github |
	| 26 | pplm - pr - deprecated - variable - ppl | 16 | 26_pplm_pr_deprecated_variable |
	| 27 | ga - fork - forks - forked - push | 14 | 27_ga_fork_forks_forked |
	| 28 | indexerror - runtimeerror - index - indices - missingindex | 11 | 28_indexerror_runtimeerror_index_indices |

	</details>
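
The frequencies in the topic table add up to the 9000 training documents, so each topic's share of the corpus can be computed directly. The following is a minimal sketch using a few rows copied from the table; `topic_share` is a hypothetical helper introduced only for illustration.

```python
# Topic ID -> frequency, copied from a few rows of the topic table.
FREQUENCIES = {-1: 11, 0: 2407, 1: 1379, 2: 791, 28: 11}
TOTAL_DOCUMENTS = 9000  # "Number of training documents" in this card


def topic_share(topic_id: int) -> float:
    """Return the fraction of training documents assigned to a topic."""
    return FREQUENCIES[topic_id] / TOTAL_DOCUMENTS


for topic_id in sorted(FREQUENCIES):
    print(f"topic {topic_id:>2}: {topic_share(topic_id):.2%}")
```

The largest topic (0, tokenizers) covers roughly 27% of the documents, while topic -1 holds only 11 documents.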

	## Training hyperparameters

	* calculate_probabilities: False
	* language: english
	* low_memory: False
	* min_topic_size: 10
	* n_gram_range: (1, 1)
	* nr_topics: 30
	* seed_topic_list: None
	* top_n_words: 10
	* verbose: True
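
The values above correspond to keyword arguments of the `BERTopic` constructor. The following sketch shows how a model with the same settings might be configured, assuming the installed BERTopic version accepts all of these arguments. `build_model` is a hypothetical helper. The embedding model and the UMAP and HDBSCAN settings used for the original model are not listed in this card, so the result would not necessarily reproduce it.

```python
# Hyperparameters as listed in this card.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_model():
    """Create an untrained BERTopic model with the settings above (hypothetical helper)."""
    from bertopic import BERTopic  # imported lazily; requires `pip install bertopic`

    return BERTopic(**HYPERPARAMETERS)
```

Training would then be a call such as `build_model().fit_transform(docs)`, where `docs` is a list of strings.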

	## Framework versions

	* Numpy: 1.23.5
	* HDBSCAN: 0.8.33
	* UMAP: 0.5.4
	* Pandas: 1.5.3
	* Scikit-Learn: 1.2.2
	* Sentence-transformers: 2.2.2
	* Transformers: 4.34.0
	* Numba: 0.56.4
	* Plotly: 5.15.0
	* Python: 3.10.12
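
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. The following is a minimal sketch using the standard library's `importlib.metadata`; `compare_versions` is a hypothetical helper, and the PyPI distribution names (for example `umap-learn` for UMAP) are assumptions not stated in this card. The Python version itself is not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names mapped to the versions listed in this card.
EXPECTED_VERSIONS = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.4",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.34.0",
    "numba": "0.56.4",
    "plotly": "5.15.0",
}


def compare_versions():
    """Return {package: (expected, installed or None)} for each listed package."""
    report = {}
    for package, expected in EXPECTED_VERSIONS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (expected, installed)
    return report


for package, (expected, installed) in compare_versions().items():
    status = "ok" if installed == expected else "differs"
    print(f"{package}: expected {expected}, installed {installed} ({status})")
```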