
topic_model_general_normal_april8

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

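from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("Thang203/topic_model_general_normal_april8")

# Return a DataFrame with one row per topic
topic_model.get_topic_info()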
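
Individual topics can also be inspected after loading. The following is a minimal sketch, assuming `get_topic` returns a list of `(word, score)` pairs as in recent BERTopic releases. The helper names `load_model`, `format_keywords`, and `show_topic` are hypothetical and not part of BERTopic; the import is deferred so the formatting helper works without BERTopic installed.

```python
REPO_ID = "Thang203/topic_model_general_normal_april8"


def load_model(repo_id=REPO_ID):
    """Load the topic model from the Hugging Face Hub (needs network access)."""
    from bertopic import BERTopic  # deferred import: only needed when loading
    return BERTopic.load(repo_id)


def format_keywords(words_with_scores, n=5):
    """Join the top-n keywords in the 'a - b - c' style used on this card."""
    return " - ".join(word for word, _score in words_with_scores[:n])


def show_topic(topic_id, repo_id=REPO_ID):
    """Print the top keywords of one topic (hypothetical convenience helper)."""
    topic_model = load_model(repo_id)
    # get_topic(topic_id) is assumed to yield (word, score) pairs
    print(format_keywords(topic_model.get_topic(topic_id)))
```

Calling `show_topic(2)` would download the model and print the leading keywords of topic 2.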

Topic overview

  • Number of topics: 80
  • Number of training documents: 6795
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | models - language - llms - language models - chatgpt | 11 | -1_models_language_llms_language models
0 | translation - language - models - data - generation | 2010 | 0_translation_language_models_data
1 | visual - multimodal - image - images - video | 510 | 1_visual_multimodal_image_images
2 | reasoning - math - cot - mathematical - problems | 432 | 2_reasoning_math_cot_mathematical
3 | attacks - attack - adversarial - safety - jailbreak | 340 | 3_attacks_attack_adversarial_safety
4 | medical - clinical - biomedical - health - healthcare | 318 | 4_medical_clinical_biomedical_health
5 | code - code generation - generation - programming - software | 303 | 5_code_code generation_generation_programming
6 | students - education - ai - chatgpt - student | 153 | 6_students_education_ai_chatgpt
7 | robot - planning - robots - navigation - robotic | 110 | 7_robot_planning_robots_navigation
8 | dialogue - taskoriented - dialog - dialogue systems - systems | 107 | 8_dialogue_taskoriented_dialog_dialogue systems
9 | knowledge - question - answering - question answering - kgs | 97 | 9_knowledge_question_answering_question answering
10 | financial - sentiment - stock - market - investment | 78 | 10_financial_sentiment_stock_market
11 | bias - gender - biases - gender bias - fairness | 78 | 11_bias_gender_biases_gender bias
12 | emotion - emotional - empathetic - mental health - affective | 77 | 12_emotion_emotional_empathetic_mental health
13 | privacy - private - federated - data - attack | 76 | 13_privacy_private_federated_data
14 | text - detection - texts - aigenerated - machinegenerated | 75 | 14_text_detection_texts_aigenerated
15 | radiology - medical - reports - image - radiology reports | 75 | 15_radiology_medical_reports_image
16 | training - parallelism - gpu - memory - hardware | 71 | 16_training_parallelism_gpu_memory
17 | summarization - summaries - abstractive - summary - text summarization | 70 | 17_summarization_summaries_abstractive_summary
18 | game - games - agents - social - llm agents | 69 | 18_game_games_agents_social
19 | quantization - quantized - weights - memory - compression | 66 | 19_quantization_quantized_weights_memory
20 | sql - texttosql - table - database - tabular | 62 | 20_sql_texttosql_table_database
21 | retrieval - ranking - rag - reranking - retrievalaugmented | 61 | 21_retrieval_ranking_rag_reranking
22 | lora - attention - lowrank - finetuning - memory | 59 | 22_lora_attention_lowrank_finetuning
23 | legal - patent - claim - court - law | 58 | 23_legal_patent_claim_court
24 | alignment - preference - reward - rlhf - preferences | 58 | 24_alignment_preference_reward_rlhf
25 | recommendation - recommender - recommendations - recommender systems - user | 56 | 25_recommendation_recommender_recommendations_recommender systems
26 | transformer - transformers - attention - layers - layer | 55 | 26_transformer_transformers_attention_layers
27 | tom - cognitive - analogical - analogies - human | 52 | 27_tom_cognitive_analogical_analogies
28 | vulnerability - vulnerabilities - code - security - smart | 48 | 28_vulnerability_vulnerabilities_code_security
29 | materials - chemistry - materials science - chemical - molecular | 48 | 29_materials_chemistry_materials science_chemical
30 | agent - agents - rl - environments - language agents | 47 | 30_agent_agents_rl_environments
31 | repair - bugs - bug - program repair - apr | 43 | 31_repair_bugs_bug_program repair
32 | graph - graphs - graph reasoning - graph neural - graph data | 43 | 32_graph_graphs_graph reasoning_graph neural
33 | speech - asr - audio - speech recognition - recognition | 42 | 33_speech_asr_audio_speech recognition
34 | ai - ethical - regulation - risks - risk | 41 | 34_ai_ethical_regulation_risks
35 | personality - traits - personality traits - personas - personalities | 41 | 35_personality_traits_personality traits_personas
36 | context - context window - window - length - long | 36 | 36_context_context window_window_length
37 | chatgpt - research - writing - ai - academic | 34 | 37_chatgpt_research_writing_ai
38 | incontext - demonstrations - icl - incontext learning - learning | 33 | 38_incontext_demonstrations_icl_incontext learning
39 | sentiment - sentiment analysis - analysis - aspectbased - polarity | 32 | 39_sentiment_sentiment analysis_analysis_aspectbased
40 | cultural - opinions - political - survey - values | 30 | 40_cultural_opinions_political_survey
41 | tool - tools - apis - api - llms | 29 | 41_tool_tools_apis_api
42 | hallucinations - hallucination - hallucination detection - detection - llms | 29 | 42_hallucinations_hallucination_hallucination detection_detection
43 | creative - ideas - ai - creativity - storytelling | 28 | 43_creative_ideas_ai_creativity
44 | music - musical - audio - lyrics - song | 28 | 44_music_musical_audio_lyrics
45 | scaling - scaling laws - laws - training - model | 27 | 45_scaling_scaling laws_laws_training
46 | physics - students - chatgpt - education - responses | 26 | 46_physics_students_chatgpt_education
47 | correction - grammatical - gec - error - error correction | 26 | 47_correction_grammatical_gec_error
48 | test - unit - tests - test generation - test cases | 23 | 48_test_unit_tests_test generation
49 | pruning - sparsity - structured pruning - structured - weights | 23 | 49_pruning_sparsity_structured pruning_structured
50 | commonsense - commonsense knowledge - knowledge - commonsense question answering - commonsense question | 21 | 50_commonsense_commonsense knowledge_knowledge_commonsense question answering
51 | distillation - teacher - student - kd - knowledge distillation | 20 | 51_distillation_teacher_student_kd
52 | visualization - visualizations - data visualization - natural - natural language | 20 | 52_visualization_visualizations_data visualization_natural
53 | hallucination - hallucinations - lvlms - mllms - visual | 20 | 53_hallucination_hallucinations_lvlms_mllms
54 | adversarial - vlms - attacks - attack - adversarial examples | 20 | 54_adversarial_vlms_attacks_attack
55 | verilog - design - hardware - hardware design - rtl | 18 | 55_verilog_design_hardware_hardware design
56 | spatial - geospatial - geographic - location - populations | 18 | 56_spatial_geospatial_geographic_location
57 | intent - intent detection - slot - detection - slot filling | 18 | 57_intent_intent detection_slot_detection
58 | prompts - prompt - performance - negated - pseudocode | 18 | 58_prompts_prompt_performance_negated
59 | brain - fmri - neural - activity - eeg | 17 | 59_brain_fmri_neural_activity
60 | watermarking - copyright - protection - text - model | 16 | 60_watermarking_copyright_protection_text
61 | public - social - media - early - ai | 16 | 61_public_social_media_early
62 | ai - productivity - chatbots - chatgpt - economy | 15 | 62_ai_productivity_chatbots_chatgpt
63 | poetry - poems - poetry generation - lyrics - poem | 15 | 63_poetry_poems_poetry generation_lyrics
64 | geoscience - astronomy - scientific - astronomical - galactica | 15 | 64_geoscience_astronomy_scientific_astronomical
65 | editing - knowledge editing - knowledge - model editing - editing methods | 14 | 65_editing_knowledge editing_knowledge_model editing
66 | argument - arguments - argumentation - fallacy - fallacies | 14 | 66_argument_arguments_argumentation_fallacy
67 | mobile - wireless - devices - aigc - network | 14 | 67_mobile_wireless_devices_aigc
68 | design - bid - 3d - designs - generative | 14 | 68_design_bid_3d_designs
69 | simplification - text simplification - text - sentence - readability | 14 | 69_simplification_text simplification_text_sentence
70 | urban - traffic - transportation - foundation models - foundation | 13 | 70_urban_traffic_transportation_foundation models
71 | log - anomaly - root - anomaly detection - cloud | 13 | 71_log_anomaly_root_anomaly detection
72 | forgetting - catastrophic forgetting - catastrophic - continual - finetuning | 13 | 72_forgetting_catastrophic forgetting_catastrophic_continual
73 | scientific - papers - review - gpt4 - feedback | 13 | 73_scientific_papers_review_gpt4
74 | causal - causality - causal discovery - causal inference - causal reasoning | 13 | 74_causal_causality_causal discovery_causal inference
75 | product - ecommerce - attribute - extraction - product descriptions | 13 | 75_product_ecommerce_attribute_extraction
76 | optimizers - adam - deep - training - networks | 12 | 76_optimizers_adam_deep_training
77 | chinese - questions - subjects - school - ceval | 12 | 77_chinese_questions_subjects_school
78 | speculative - decoding - draft - speculative decoding - draft model | 12 | 78_speculative_decoding_draft_speculative decoding
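
The frequencies in the table above are easier to compare as shares of the 6795 training documents. The sketch below works only on values copied from this card and does not need BERTopic; `parse_label` and `share` are hypothetical helper names. It assumes every label has the form `<topic id>_<keyword>_<keyword>_...`, which holds for the rows shown. In BERTopic, topic -1 collects documents that were not assigned to any cluster.

```python
TOTAL_DOCS = 6795  # "Number of training documents" from this card


def parse_label(label):
    """Split a label such as '2_reasoning_math_cot_mathematical' into (id, keywords)."""
    topic_id, _, rest = label.partition("_")
    return int(topic_id), rest.split("_")


def share(frequency, total=TOTAL_DOCS):
    """Return a topic's frequency as a percentage of the training documents."""
    return 100.0 * frequency / total


# A few rows copied from the table above: (label, frequency)
rows = [
    ("-1_models_language_llms_language models", 11),
    ("0_translation_language_models_data", 2010),
    ("1_visual_multimodal_image_images", 510),
]

for label, freq in rows:
    topic_id, keywords = parse_label(label)
    print(f"topic {topic_id:>2}: {share(freq):5.2f}% of documents, keywords={keywords}")
```

By this arithmetic, topic 0 accounts for roughly 29.6% of the training documents, while the outlier topic -1 holds about 0.16%.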

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: auto
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
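
The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. The following is a sketch of how an untrained model with the same settings might be created, assuming a BERTopic release that accepts the zero-shot arguments (the card does not list the BERTopic version). `build_untrained_model` is a hypothetical helper. The card does not state the embedding model, the UMAP and HDBSCAN settings, or the training corpus, so this would not reproduce the topics listed here.

```python
# Values copied from the "Training hyperparameters" list on this card
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model(params=None):
    """Create an unfitted BERTopic instance with the card's hyperparameters."""
    from bertopic import BERTopic  # deferred import: only needed when building
    return BERTopic(**(params or HYPERPARAMETERS))
```

The returned model would still need to be fitted, for example with `fit_transform` on a list of documents.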

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.6.1
  • Transformers: 4.38.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
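
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. This is a minimal sketch using the standard library's `importlib.metadata`. The keys are PyPI distribution names, which is an assumption since the card lists display names only, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list on this card;
# the keys are assumed PyPI distribution names
EXPECTED = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def find_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in find_mismatches().items():
    print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

A version difference does not necessarily mean the model will fail to load; the report is only a starting point for troubleshooting.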