metadata

tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

transformers_issues_topics

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic

# Load the fitted model from the Hugging Face Hub
topic_model = BERTopic.load("mark230271/transformers_issues_topics")

# Return a DataFrame with one row per topic
topic_model.get_topic_info()
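
Once the model is loaded, you may also want to assign topics to new documents. The following is a minimal sketch, not part of the original card: `assign_topics` is a hypothetical helper name, and it assumes the loaded model can embed new text (models loaded from the Hub may need an embedding model passed to `BERTopic.load`). It relies only on `transform`, which returns the predicted topic IDs and, optionally, probabilities.

```python
def assign_topics(topic_model, docs):
    """Pair each document with the topic ID predicted by a fitted BERTopic model.

    `assign_topics` is a hypothetical helper for illustration only.
    """
    # transform() returns (topics, probabilities); only the topic IDs are used here
    topics, _ = topic_model.transform(docs)
    return [(doc, int(topic_id)) for doc, topic_id in zip(docs, topics)]


# Example usage (requires bertopic and network access, so it is left commented out):
# from bertopic import BERTopic
# topic_model = BERTopic.load("mark230271/transformers_issues_topics")
# print(assign_topics(topic_model, ["Tokenizer fails on empty input"]))
```

A topic ID of -1 means the document was treated as an outlier rather than assigned to one of the named topics.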

Topic overview

Number of topics: 30
Number of training documents: 9000

Topic ID	Topic Keywords	Topic Frequency	Label
-1	tokenizer - bert - tokenizers - pytorch - tensorflow	11	-1_tokenizer_bert_tokenizers_pytorch
0	tokenizer - tokenizers - tokenization - berttokenizer - bart	2376	0_tokenizer_tokenizers_tokenization_berttokenizer
1	cuda - gpt2 - gpt - gpus - gpu	1879	1_cuda_gpt2_gpt_gpus
2	modelcard - modelcards - card - model - models	735	2_modelcard_modelcards_card_model
3	transformerscli - transformers - transformer - transformerxl - importerror	412	3_transformerscli_transformers_transformer_transformerxl
4	typeerror - attributeerror - valueerror - error - errors	385	4_typeerror_attributeerror_valueerror_error
5	trainertrain - trainer - trainerevaluate - trainers - training	330	5_trainertrain_trainer_trainerevaluate_trainers
6	seq2seq - seq2seqtrainer - s2s - runseq2seq - seq2seqdataset	319	6_seq2seq_seq2seqtrainer_s2s_runseq2seq
7	typos - typo - fix - correction - fixed	306	7_typos_typo_fix_correction
8	ci - testing - test - tests - circleci	282	8_ci_testing_test_tests
9	readmemd - readmetxt - readme - file - camembertbasereadmemd	255	9_readmemd_readmetxt_readme_file
10	t5 - t5model - tf - t5base - t5large	255	10_t5_t5model_tf_t5base
11	generationbeamsearchpy - beamsearch - groupbeamsearch - beam - search	218	11_generationbeamsearchpy_beamsearch_groupbeamsearch_beam
12	flax - distilbertmodel - flaubert - deberta - model	185	12_flax_distilbertmodel_flaubert_deberta
13	ner - pipeline - pipelines - nerpipeline - fillmaskpipeline	177	13_ner_pipeline_pipelines_nerpipeline
14	questionansweringpipeline - tfalbertforquestionanswering - questionanswering - distilbertforquestionanswering - answering	161	14_questionansweringpipeline_tfalbertforquestionanswering_questionanswering_distilbertforquestionanswering
15	huggingfacetransformers - huggingface - hugging - gluepy - gluebenchmarkcom	133	15_huggingfacetransformers_huggingface_hugging_gluepy
16	onnx - onnxonnxruntime - onnxexport - 04onnxexport - 04onnxexportipynb	130	16_onnx_onnxonnxruntime_onnxexport_04onnxexport
17	labelsmoothednllloss - labelsmoothingfactor - label - labels - labelsmoothing	96	17_labelsmoothednllloss_labelsmoothingfactor_label_labels
18	longformer - longformers - longform - longformerlayer - longformermodel	73	18_longformer_longformers_longform_longformerlayer
19	configpath - configs - config - configuration - modelconfigs	59	19_configpath_configs_config_configuration
20	wandbproject - wandb - sagemaker - sagemakertrainer - wandbcallback	45	20_wandbproject_wandb_sagemaker_sagemakertrainer
21	cachedir - cache - cachedpath - caching - cached	33	21_cachedir_cache_cachedpath_caching
22	notebook - notebooks - community - colab - t5	33	22_notebook_notebooks_community_colab
23	electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification	30	23_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice
24	layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf	24	24_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased
25	isort - blackisortflake8 - github - repo - version	18	25_isort_blackisortflake8_github_repo
26	pplm - pr - deprecated - variable - ppl	14	26_pplm_pr_deprecated_variable
27	indexerror - index - missingindex - indices - runtimeerror	14	27_indexerror_index_missingindex_indices
28	ga - fork - forks - forked - push	12	28_ga_fork_forks_forked
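
As a quick sanity check on the table, the sketch below copies the Topic Frequency column and confirms that it accounts for every training document. The variable names are illustrative and not part of the model.

```python
# Topic frequencies copied from the table above, in row order (topic -1 first).
frequencies = [
    11, 2376, 1879, 735, 412, 385, 330, 319, 306, 282,
    255, 255, 218, 185, 177, 161, 133, 130, 96, 73,
    59, 45, 33, 33, 30, 24, 18, 14, 14, 12,
]
topic_ids = list(range(-1, 29))  # -1 is the outlier topic, 0..28 are regular topics

total_documents = sum(frequencies)

# Fraction of all documents that falls into each topic
share = {tid: freq / total_documents for tid, freq in zip(topic_ids, frequencies)}

# Combined share of the two largest topics (0 and 1)
top_two_share = share[0] + share[1]

print(f"rows: {len(frequencies)}, documents: {total_documents}")
print(f"topics 0 and 1 together: {top_two_share:.1%}")
print(f"outliers (topic -1): {share[-1]:.2%}")
```

The counts add up to the 9000 training documents, and the two largest topics (tokenizers and CUDA/GPT) together cover a little under half of them.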

Training hyperparameters

calculate_probabilities: False
language: english
low_memory: False
min_topic_size: 10
n_gram_range: (1, 1)
nr_topics: 30
seed_topic_list: None
top_n_words: 10
verbose: True
zeroshot_min_similarity: 0.7
zeroshot_topic_list: None
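
For reference, the values above correspond to keyword arguments of the `BERTopic` constructor. The sketch below shows one way they could be passed in. It is not the author's training script: the card does not record the embedding, UMAP, or HDBSCAN sub-models or any random seed, so a refit on the same data would not necessarily reproduce these topics. `build_topic_model` is a hypothetical helper name, and the sketch assumes a BERTopic release that accepts the `zeroshot_*` arguments.

```python
# Hyperparameters copied from the list above.
hyperparameters = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": 30,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_topic_model(params=hyperparameters):
    """Create an unfitted BERTopic model with the listed settings (hypothetical helper)."""
    # Imported lazily so the dictionary can be inspected without bertopic installed
    from bertopic import BERTopic

    return BERTopic(**params)


# Example usage (docs is your own list of strings):
# topic_model = build_topic_model()
# topics, probs = topic_model.fit_transform(docs)
```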

Framework versions

Numpy: 1.25.2
HDBSCAN: 0.8.33
UMAP: 0.5.6
Pandas: 2.0.3
Scikit-Learn: 1.2.2
Sentence-transformers: 2.6.1
Transformers: 4.38.2
Numba: 0.58.1
Plotly: 5.15.0
Python: 3.10.12
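
If loading the model fails or behaves unexpectedly, comparing your environment with the versions above can help. The following is a minimal sketch using only the standard library. The PyPI distribution names (for example `umap-learn` for UMAP) are assumptions added here, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def compare_versions(pinned):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


for name, (expected, installed, matches) in compare_versions(PINNED).items():
    status = "ok" if matches else "differs"
    print(f"{name}: expected {expected}, found {installed} ({status})")
```

A mismatch is not necessarily a problem; it only flags where your environment differs from the one used for training.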