topic_model_general_auto_april8

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

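from bertopic import BERTopic

# Download the saved model from the Hugging Face Hub and load it
topic_model = BERTopic.load("Thang203/topic_model_general_auto_april8")

# Return a table with one row per topic
topic_model.get_topic_info()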
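
Beyond inspecting the topics, a loaded model can assign topics to new documents. The following is a minimal sketch, not part of the original card: `assign_topics` and `load_card_model` are hypothetical helper names, and the sketch assumes that `transform` returns a pair of (topic IDs, probabilities), as in BERTopic. Running it against the real model also assumes the embedding model it depends on can be loaded in your environment.

```python
def assign_topics(topic_model, docs):
    """Map each document to its predicted topic ID.

    `topic_model` is any object with a BERTopic-style `transform` method.
    Topic -1 is BERTopic's outlier topic.
    """
    topics, _probabilities = topic_model.transform(docs)
    return {doc: int(topic) for doc, topic in zip(docs, topics)}


def load_card_model():
    """Load the model from this card (needs bertopic and network access)."""
    from bertopic import BERTopic  # imported lazily so the helper above has no dependency

    return BERTopic.load("Thang203/topic_model_general_auto_april8")
```

A typical call would be `assign_topics(load_card_model(), ["some new abstract ..."])`; the returned IDs can be looked up in the topic overview.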

Topic overview

  • Number of topics: 113
  • Number of training documents: 6795
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | models - language - llms - language models - model | 10 | -1_models_language_llms_language models
0 | visual - multimodal - image - images - video | 1955 | 0_visual_multimodal_image_images
1 | reasoning - mathematical - cot - math - problems | 429 | 1_reasoning_mathematical_cot_math
2 | students - education - chatgpt - student - ai | 315 | 2_students_education_chatgpt_student
3 | medical - clinical - biomedical - healthcare - notes | 261 | 3_medical_clinical_biomedical_healthcare
4 | translation - languages - machine translation - multilingual - machine | 215 | 4_translation_languages_machine translation_multilingual
5 | code - code generation - generation - programming - python | 156 | 5_code_code generation_generation_programming
6 | generation - story - text - text generation - gpt2 | 131 | 6_generation_story_text_text generation
7 | rlhf - reward - alignment - preference - feedback | 85 | 7_rlhf_reward_alignment_preference
8 | financial - sentiment - stock - market - investment | 78 | 8_financial_sentiment_stock_market
9 | bias - gender - biases - gender bias - fairness | 77 | 9_bias_gender_biases_gender bias
10 | summarization - summaries - abstractive - text summarization - summary | 77 | 10_summarization_summaries_abstractive_text summarization
11 | emotion - emotional - empathetic - emotions - affective | 74 | 11_emotion_emotional_empathetic_emotions
12 | radiology - medical - reports - radiology reports - image | 74 | 12_radiology_medical_reports_radiology reports
13 | fewshot - zeroshot - learning - augmentation - data | 69 | 13_fewshot_zeroshot_learning_augmentation
14 | game - games - agents - negotiation - llm agents | 69 | 14_game_games_agents_negotiation
15 | dialogue - taskoriented - dialog - dialogue systems - systems | 68 | 15_dialogue_taskoriented_dialog_dialogue systems
16 | text - detection - texts - aigenerated - detectors | 62 | 16_text_detection_texts_aigenerated
17 | news - misinformation - fake - detection - fake news | 61 | 17_news_misinformation_fake_detection
18 | quantization - quantized - weights - 4bit - memory | 61 | 18_quantization_quantized_weights_4bit
19 | adversarial - attack - attacks - backdoor - adversarial examples | 60 | 19_adversarial_attack_attacks_backdoor
20 | privacy - private - federated - privacypreserving - pii | 59 | 20_privacy_private_federated_privacypreserving
21 | retrieval - ranking - rag - reranking - retrievalaugmented | 58 | 21_retrieval_ranking_rag_reranking
22 | legal - patent - court - claim - law | 58 | 22_legal_patent_court_claim
23 | code - software - developers - commit - code generation | 57 | 23_code_software_developers_commit
24 | word - representations - negation - linguistic - sentence | 56 | 24_word_representations_negation_linguistic
25 | recommendation - recommender - recommendations - recommender systems - user | 55 | 25_recommendation_recommender_recommendations_recommender systems
26 | instruction - instruction tuning - tuning - instructions - data | 54 | 26_instruction_instruction tuning_tuning_instructions
27 | pretraining - pretrained - seq2seq - tasks - masked | 54 | 27_pretraining_pretrained_seq2seq_tasks
28 | vulnerability - vulnerabilities - security - code - smart | 54 | 28_vulnerability_vulnerabilities_security_code
29 | transformer - transformers - layers - layer - attention | 48 | 29_transformer_transformers_layers_layer
30 | jailbreak - attacks - jailbreaking - attack - safety | 44 | 30_jailbreak_attacks_jailbreaking_attack
31 | ai - regulation - ethical - risk - regulatory | 43 | 31_ai_regulation_ethical_risk
32 | materials - chemistry - chemical - molecular - materials science | 42 | 32_materials_chemistry_chemical_molecular
33 | repair - bugs - bug - program repair - apr | 42 | 33_repair_bugs_bug_program repair
34 | graph - graphs - graph reasoning - graph neural - graph data | 41 | 34_graph_graphs_graph reasoning_graph neural
35 | speech - asr - speech recognition - audio - recognition | 41 | 35_speech_asr_speech recognition_audio
36 | evaluation - nlg - metrics - human - text | 40 | 36_evaluation_nlg_metrics_human
37 | personality - traits - personality traits - psychological - personas | 38 | 37_personality_traits_personality traits_psychological
38 | agent - agents - language agents - environments - decisionmaking | 37 | 38_agent_agents_language agents_environments
39 | texttosql - sql - database - spider - query | 36 | 39_texttosql_sql_database_spider
40 | tom - cognitive - mind - theory mind - humans | 34 | 40_tom_cognitive_mind_theory mind
41 | hate - hate speech - speech - offensive - hateful | 34 | 41_hate_hate speech_speech_offensive
42 | question - qa - answering - question answering - questions | 34 | 42_question_qa_answering_question answering
43 | incontext - icl - demonstrations - incontext learning - learning | 33 | 43_incontext_icl_demonstrations_incontext learning
44 | navigation - robot - manipulation - embodied - robots | 33 | 44_navigation_robot_manipulation_embodied
45 | hallucinations - hallucination - hallucination detection - detection - llms | 31 | 45_hallucinations_hallucination_hallucination detection_detection
46 | commonsense - commonsense knowledge - knowledge - commonsense reasoning - commonsense question answering | 31 | 46_commonsense_commonsense knowledge_knowledge_commonsense reasoning
47 | tool - tools - apis - api - tooluse | 31 | 47_tool_tools_apis_api
48 | parallelism - training - distributed - distributed training - network | 30 | 48_parallelism_training_distributed_distributed training
49 | brain - neural - gpt2 - circuit - attention | 30 | 49_brain_neural_gpt2_circuit
50 | context - context window - window - length - extrapolation | 29 | 50_context_context window_window_length
51 | knowledge - knowledge graph - kgs - wikidata - graph | 29 | 51_knowledge_knowledge graph_kgs_wikidata
52 | chatbots - search - chatgpt - technology - chat | 28 | 52_chatbots_search_chatgpt_technology
53 | cultural - political - opinions - values - survey | 28 | 53_cultural_political_opinions_values
54 | sentiment - sentiment analysis - analysis - aspectbased - polarity | 28 | 54_sentiment_sentiment analysis_analysis_aspectbased
55 | research - writing - ai - scientific - chatgpt | 28 | 55_research_writing_ai_scientific
56 | music - musical - audio - lyrics - sounds | 28 | 56_music_musical_audio_lyrics
57 | scaling - training - scaling laws - laws - emergent abilities | 28 | 57_scaling_training_scaling laws_laws
58 | explanations - counterfactual - explanation - counterfactuals - natural language explanations | 27 | 58_explanations_counterfactual_explanation_counterfactuals
59 | lora - lowrank - finetuning - adaptation - peft | 27 | 59_lora_lowrank_finetuning_adaptation
60 | safety - unsafe - harmful - safety alignment - 2chat | 26 | 60_safety_unsafe_harmful_safety alignment
61 | cybersecurity - cyber - security - genai - threat | 26 | 61_cybersecurity_cyber_security_genai
62 | visualization - visualizations - data visualization - chart - natural language | 25 | 62_visualization_visualizations_data visualization_chart
63 | attention - memory - matrix - linear - kv | 23 | 63_attention_memory_matrix_linear
64 | correction - gec - grammatical - error - error correction | 23 | 64_correction_gec_grammatical_error
65 | test - unit - tests - test generation - test cases | 22 | 65_test_unit_tests_test generation
66 | entity - relation - ner - extraction - relation extraction | 22 | 66_entity_relation_ner_extraction
67 | prompt - prompts - tuning - prompt tuning - optimization | 22 | 67_prompt_prompts_tuning_prompt tuning
68 | distillation - teacher - student - kd - student model | 22 | 68_distillation_teacher_student_kd
69 | pruning - sparsity - structured pruning - structured - weights | 21 | 69_pruning_sparsity_structured pruning_structured
70 | hallucination - hallucinations - lvlms - mllms - visual | 21 | 70_hallucination_hallucinations_lvlms_mllms
71 | ideas - creative - ai - creativity - fictional | 21 | 71_ideas_creative_ai_creativity
72 | mental - mental health - health - depression - social media | 21 | 72_mental_mental health_health_depression
73 | adversarial - vlms - attacks - attack - adversarial examples | 20 | 73_adversarial_vlms_attacks_attack
74 | confidence - calibration - uncertainty - probabilities - confidence scores | 19 | 74_confidence_calibration_uncertainty_probabilities
75 | crosslingual - multilingual - languages - english - transfer | 19 | 75_crosslingual_multilingual_languages_english
76 | verilog - design - hardware - hardware design - rtl | 18 | 76_verilog_design_hardware_hardware design
77 | intent - intent detection - slot - slot filling - detection | 17 | 77_intent_intent detection_slot_slot filling
78 | arabic - hebrew - cultural - nlp - diacritization | 17 | 78_arabic_hebrew_cultural_nlp
79 | watermarking - watermark - copyright - protection - ip | 16 | 79_watermarking_watermark_copyright_protection
80 | robot - robots - dialogue - round - humanrobot | 16 | 80_robot_robots_dialogue_round
81 | poetry - poems - poetry generation - lyrics - generation | 16 | 81_poetry_poems_poetry generation_lyrics
82 | table - tabular - tables - tabular data - data | 16 | 82_table_tabular_tables_tabular data
83 | spatial - geospatial - gis - geographic - location | 15 | 83_spatial_geospatial_gis_geographic
84 | product - ecommerce - attribute - extraction - product descriptions | 15 | 84_product_ecommerce_attribute_extraction
85 | geoscience - astronomy - scientific - astronomical - galactica | 15 | 85_geoscience_astronomy_scientific_astronomical
86 | phishing - emails - phishing emails - email - phishing attacks | 15 | 86_phishing_emails_phishing emails_email
87 | ai - generative ai - workers - generative - labor | 14 | 87_ai_generative ai_workers_generative
88 | planning - robotic - robot - robogpt - task planning | 14 | 88_planning_robotic_robot_robogpt
89 | mobile - wireless - edge - devices - aigc | 14 | 89_mobile_wireless_edge_devices
90 | simplification - text simplification - sentence - text - readability | 14 | 90_simplification_text simplification_sentence_text
91 | editing - knowledge editing - model editing - knowledge - editing methods | 14 | 91_editing_knowledge editing_model editing_knowledge
92 | annotation - data annotation - metadata - annotators - data | 14 | 92_annotation_data annotation_metadata_annotators
93 | gpu - hardware - communication - memory - accelerators | 14 | 93_gpu_hardware_communication_memory
94 | argument - arguments - argumentation - fallacy - fallacies | 14 | 94_argument_arguments_argumentation_fallacy
95 | toxicity - toxic - detoxification - content - toxic content | 14 | 95_toxicity_toxic_detoxification_content
96 | causal - causal reasoning - causality - causal discovery - causal inference | 14 | 96_causal_causal reasoning_causality_causal discovery
97 | design - bid - 3d - designs - generative | 14 | 97_design_bid_3d_designs
98 | chinese - questions - subjects - school - ceval | 14 | 98_chinese_questions_subjects_school
99 | scientific - papers - review - feedback - reviews | 13 | 99_scientific_papers_review_feedback
100 | urban - traffic - transportation - foundation models - foundation | 13 | 100_urban_traffic_transportation_foundation models
101 | humor - sarcasm - jokes - sarcasm detection - funny | 13 | 101_humor_sarcasm_jokes_sarcasm detection
102 | analogical - analogies - analogy - analogical reasoning - metaphor | 12 | 102_analogical_analogies_analogy_analogical reasoning
103 | public - early - sentiments - media - topics | 12 | 103_public_early_sentiments_media
104 | optimizers - adam - deep - networks - training | 12 | 104_optimizers_adam_deep_networks
105 | log - root - cloud - anomaly detection - anomaly | 12 | 105_log_root_cloud_anomaly detection
106 | dialogue - norm - norms - conversations - persona | 12 | 106_dialogue_norm_norms_conversations
107 | speculative - decoding - draft - speculative decoding - draft model | 11 | 107_speculative_decoding_draft_speculative decoding
108 | protein - sequences - proteins - bioinformatics - protein sequence | 11 | 108_protein_sequences_proteins_bioinformatics
109 | forgetting - catastrophic forgetting - catastrophic - continual - continual learning | 11 | 109_forgetting_catastrophic forgetting_catastrophic_continual
110 | software - software engineering - software using - chatgpt - software testing | 11 | 110_software_software engineering_software using_chatgpt
111 | verification - sva - configuration - proof - verified | 10 | 111_verification_sva_configuration_proof
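
In every row of the overview, the Label is the topic ID followed by the first four keywords, joined with underscores. The sketch below reproduces that pattern so labels can be rebuilt or checked programmatically. `make_label` is a hypothetical helper name introduced here for illustration, and the rule is inferred from the rows above rather than taken from the model's code.

```python
def make_label(topic_id, keywords, n_words=4):
    """Build a label such as '1_reasoning_mathematical_cot_math'.

    `keywords` is the ordered keyword list of a topic; only the first
    `n_words` entries are used. Keywords may contain spaces (bigrams).
    """
    return "_".join([str(topic_id)] + list(keywords[:n_words]))


# Two rows taken from the overview, including the outlier topic -1
outlier = make_label(-1, ["models", "language", "llms", "language models", "model"])
reasoning = make_label(1, ["reasoning", "mathematical", "cot", "math", "problems"])
print(outlier)
print(reasoning)
```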

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
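
For reference, the values above can be passed to the BERTopic constructor as keyword arguments. This is a sketch under assumptions: `CARD_HYPERPARAMETERS` and `build_untrained_model` are names introduced here, and the card does not record the embedding, UMAP, HDBSCAN, or vectorizer sub-models, so a model built this way would not necessarily reproduce the topics listed in the overview.

```python
# Hyperparameters copied from the list above
CARD_HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model():
    """Create an unfitted BERTopic instance with the card's settings."""
    from bertopic import BERTopic  # imported lazily; requires `pip install bertopic`

    return BERTopic(**CARD_HYPERPARAMETERS)
```

The overview contains bigram keywords such as "language models" even though `n_gram_range` is (1, 1), which suggests a custom vectorizer may have been supplied during training; that component is not documented here.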

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.6.1
  • Transformers: 4.38.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
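
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment with the versions listed above. The following sketch uses only the standard library. The PyPI distribution names in `CARD_VERSIONS` (for example `umap-learn` for UMAP) are an assumption, since the card lists display names, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions from this card, keyed by assumed PyPI distribution name
CARD_VERSIONS = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected=CARD_VERSIONS, lookup=installed_version):
    """Return {name: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in version_mismatches().items():
    print(f"{name}: card lists {wanted}, installed {found}")
```

A mismatch does not necessarily mean the model will fail to load; it only marks where the environment differs from the one used for training.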