
general_nlp_research_paper

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("Thang203/general_nlp_research_paper")

topic_model.get_topic_info()
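Beyond `get_topic_info()`, a fitted BERTopic model exposes each topic's keywords through `get_topic(topic_id)`, which returns a list of `(word, score)` pairs, or `False` for an unknown topic ID. The helper below is a minimal sketch; the name `top_words` is hypothetical and not part of BERTopic. It accepts any object with a `get_topic` method, so it can be tried without downloading the model.

```python
from typing import Dict, Iterable, List


def top_words(topic_model, topic_ids: Iterable[int], n: int = 5) -> Dict[int, List[str]]:
    """Return the first `n` keywords for each requested topic ID.

    `topic_model` is assumed to behave like a fitted BERTopic model:
    `get_topic(topic_id)` returns a list of (word, score) tuples, or False
    when the topic ID does not exist.
    """
    result: Dict[int, List[str]] = {}
    for topic_id in topic_ids:
        words = topic_model.get_topic(topic_id)
        if not words:  # unknown topic ID (False) or an empty representation
            continue
        # Keep only the words, dropping the c-TF-IDF scores.
        result[topic_id] = [word for word, _score in words[:n]]
    return result
```

For example, `top_words(BERTopic.load("Thang203/general_nlp_research_paper"), [0, 1, 2], n=3)` should return the leading keywords listed for those topic IDs in the overview table below.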

Topic overview

  • Number of topics: 165
  • Number of training documents: 11000
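Topic ID, Topic Keywords (top five, separated by " - "), Topic Frequency (number of documents assigned to the topic), Label (the topic ID followed by the first four keywords). Topic -1 is BERTopic's outlier topic, which collects documents not assigned to any other topic.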
-1 language - models - model - data - translation 10 -1_language_models_model_data
0 question - answer - questions - answering - question answering 3488 0_question_answer_questions_answering
1 speech - speech recognition - acoustic - recognition - asr 513 1_speech_speech recognition_acoustic_recognition
2 summarization - summaries - abstractive - summary - extractive 345 2_summarization_summaries_abstractive_summary
3 clinical - medical - biomedical - extraction - notes 337 3_clinical_medical_biomedical_extraction
4 translation - machine translation - parallel - machine - nmt 258 4_translation_machine translation_parallel_machine
5 emotion - emotions - emotional - emotion recognition - affective 211 5_emotion_emotions_emotional_emotion recognition
6 word - embeddings - word embeddings - similarity - vector 164 6_word_embeddings_word embeddings_similarity
7 bert - probing - tasks - pretraining - pretrained 145 7_bert_probing_tasks_pretraining
8 relation - relation extraction - extraction - relations - distant 138 8_relation_relation extraction_extraction_relations
9 hate - hate speech - offensive - detection - speech 134 9_hate_hate speech_offensive_detection
10 arabic - sanskrit - kurdish - transliteration - rules 118 10_arabic_sanskrit_kurdish_transliteration
11 aspect - sentiment - sentiment analysis - aspectbased sentiment - aspectbased 118 11_aspect_sentiment_sentiment analysis_aspectbased sentiment
12 morphological - inflection - languages - morphology - morphological analysis 112 12_morphological_inflection_languages_morphology
13 ner - named entity - named - entity recognition - named entity recognition 107 13_ner_named entity_named_entity recognition
14 multimodal - image - visual - captions - images 101 14_multimodal_image_visual_captions
15 discourse - discourse relation - discourse parsing - implicit discourse - discourse relations 98 15_discourse_discourse relation_discourse parsing_implicit discourse
16 chinese - segmentation - word segmentation - chinese word - chinese word segmentation 89 16_chinese_segmentation_word segmentation_chinese word
17 crosslingual - bilingual - embeddings - crosslingual word - word embeddings 84 17_crosslingual_bilingual_embeddings_crosslingual word
18 entropy - law - languages - script - frequency 79 18_entropy_law_languages_script
19 argument - argumentation - arguments - argumentative - mining 77 19_argument_argumentation_arguments_argumentative
20 nmt - neural machine - neural machine translation - translation - machine translation 77 20_nmt_neural machine_neural machine translation_translation
21 parsing - dependency - dependency parsing - parser - transitionbased 76 21_parsing_dependency_dependency parsing_parser
22 syntactic - rnns - grammatical - language models - agreement 71 22_syntactic_rnns_grammatical_language models
23 generation - datatotext - text generation - datatotext generation - text 71 23_generation_datatotext_text generation_datatotext generation
24 topic - topics - topic models - topic modeling - lda 71 24_topic_topics_topic models_topic modeling
25 knowledge - knowledge graph - entities - relation - graph 68 25_knowledge_knowledge graph_entities_relation
26 gender - bias - gender bias - biases - embeddings 66 26_gender_bias_gender bias_biases
27 story - stories - story generation - narrative - plot 65 27_story_stories_story generation_narrative
28 dialogue - dialog - user - taskoriented - agent 65 28_dialogue_dialog_user_taskoriented
29 transformer - attention - selfattention - heads - layers 65 29_transformer_attention_selfattention_heads
30 srl - semantic role - role labeling - semantic role labeling - role 64 30_srl_semantic role_role labeling_semantic role labeling
31 change - semantic change - diachronic - lexical semantic - semantic 64 31_change_semantic change_diachronic_lexical semantic
32 sense - wsd - disambiguation - word sense - sense disambiguation 64 32_sense_wsd_disambiguation_word sense
33 paraphrase - paraphrases - paraphrase generation - paraphrasing - paraphrase identification 63 33_paraphrase_paraphrases_paraphrase generation_paraphrasing
34 linking - entity linking - entity - el - entities 62 34_linking_entity linking_entity_el
35 authorship - attribution - authorship attribution - authors - stylistic 60 35_authorship_attribution_authorship attribution_authors
36 tracking - state tracking - dialogue state - state - dialogue 54 36_tracking_state tracking_dialogue state_state
37 nli - natural language inference - language inference - inference - natural language 54 37_nli_natural language inference_language inference_inference
38 act - dialogue act - dialogue - dialog act - dialog 51 38_act_dialogue act_dialogue_dialog act
39 commonsense - reasoning - commonsense reasoning - knowledge - commonsense knowledge 49 39_commonsense_reasoning_commonsense reasoning_knowledge
40 crosslingual - multilingual - transfer - crosslingual transfer - mbert 49 40_crosslingual_multilingual_transfer_crosslingual transfer
41 coreference - resolution - coreference resolution - mention - pronoun 49 41_coreference_resolution_coreference resolution_mention
42 legal - patent - court - case - legal domain 48 42_legal_patent_court_case
43 dialect - identification - language identification - dialect identification - arabic 47 43_dialect_identification_language identification_dialect identification
44 amr - amr parsing - parsing - meaning representation - meaning 46 44_amr_amr parsing_parsing_meaning representation
45 adversarial - adversarial examples - attacks - attack - examples 46 45_adversarial_adversarial examples_attacks_attack
46 health - mental - mental health - social media - media 45 46_health_mental_mental health_social media
47 offensive - offensive language - subtask - offensive language identification - hostile 45 47_offensive_offensive language_subtask_offensive language identification
48 semantic parsing - parsing - semantic - compositional generalization - logical 44 48_semantic parsing_parsing_semantic_compositional generalization
49 recurrent - language modeling - rnn - lstm - modeling 44 49_recurrent_language modeling_rnn_lstm
50 sql - texttosql - database - queries - query 44 50_sql_texttosql_database_queries
51 indian - smt - translation - machine translation - machine 43 51_indian_smt_translation_machine translation
52 style - style transfer - transfer - text style - text style transfer 43 52_style_style transfer_transfer_text style
53 poetry - poems - lyrics - music - verse 43 53_poetry_poems_lyrics_music
54 codeswitching - cs - codeswitched - codemixed - monolingual 43 54_codeswitching_cs_codeswitched_codemixed
55 sentiment - polarity - sentiment analysis - analysis - prior polarity 41 55_sentiment_polarity_sentiment analysis_analysis
56 sarcasm - sarcasm detection - sarcastic - detection - irony 41 56_sarcasm_sarcasm detection_sarcastic_detection
57 gec - grammatical error - grammatical error correction - error correction - correction 40 57_gec_grammatical error_grammatical error correction_error correction
58 intent - intent detection - slot - slot filling - filling 40 58_intent_intent detection_slot_slot filling
59 temporal - events - temporal relations - expressions - temporal relation 39 59_temporal_events_temporal relations_expressions
60 adaptation - domain - domain adaptation - indomain - translation 37 60_adaptation_domain_domain adaptation_indomain
61 stance - stance detection - detection - tweets - veracity 37 61_stance_stance detection_detection_tweets
62 codemixed - sentiment - sentiment analysis - analysis - semeval2020 36 62_codemixed_sentiment_sentiment analysis_analysis
63 keyphrase - keyphrases - keyphrase extraction - keyphrase generation - extraction 35 63_keyphrase_keyphrases_keyphrase extraction_keyphrase generation
64 nmt - subword - translation - vocabulary - neural machine translation 35 64_nmt_subword_translation_vocabulary
65 calculus - logic - semantics - proof - typelogical 35 65_calculus_logic_semantics_proof
66 simplification - text simplification - sentence simplification - sentence - ts 35 66_simplification_text simplification_sentence simplification_sentence
67 annotation - xml - formats - tei - standards 35 67_annotation_xml_formats_tei
68 correction - spelling - ocr - spelling correction - errors 33 68_correction_spelling_ocr_spelling correction
69 sentiment - sentiment classification - sentiment analysis - classification - analysis 33 69_sentiment_sentiment classification_sentiment analysis_classification
70 complexity - readability - lexical complexity - assessment - readability assessment 31 70_complexity_readability_lexical complexity_assessment
71 postediting - ape - automatic postediting - mt - translation 30 71_postediting_ape_automatic postediting_mt
72 gender - gender bias - bias - translation - pronouns 30 72_gender_gender bias_bias_translation
73 tagger - tagging - taggers - pos - partofspeech 30 73_tagger_tagging_taggers_pos
74 meeting - summarization - podcast - abstractive - summaries 30 74_meeting_summarization_podcast_abstractive
75 domain - domain adaptation - adaptation - domains - target domain 30 75_domain_domain adaptation_adaptation_domains
76 documentlevel - context - translation - nmt - neural machine 29 76_documentlevel_context_translation_nmt
77 text classification - classification - convolutional - networks - convolutional neural 29 77_text classification_classification_convolutional_networks
78 news - fake - fake news - clickbait - satirical 29 78_news_fake_fake news_clickbait
79 grammars - grammar - stochastic - contextfree - contextfree grammars 29 79_grammars_grammar_stochastic_contextfree
80 ontology - rogets - thesaurus - wordnet - concepts 29 80_ontology_rogets_thesaurus_wordnet
81 vietnamese - ner - named entity recognition - entity recognition - named entity 28 81_vietnamese_ner_named entity recognition_entity recognition
82 claim - verification - evidence - claims - fever 27 82_claim_verification_evidence_claims
83 metrics - nlg - language generation - evaluation - natural language generation 27 83_metrics_nlg_language generation_evaluation
84 responses - response - response generation - adversarial - generation 27 84_responses_response_response generation_adversarial
85 robustness - nmt - translation - neural machine - neural machine translation 27 85_robustness_nmt_translation_neural machine
86 revision - editing - seq2seq - revisions - rewriting 27 86_revision_editing_seq2seq_revisions
87 phonological - phonology - finitestate - reduplication - prosody 26 87_phonological_phonology_finitestate_reduplication
88 geolocation - location - geographic - twitter - names 26 88_geolocation_location_geographic_twitter
89 event - event extraction - extraction - event types - argument 26 89_event_event extraction_extraction_event types
90 mt - human - translation - evaluation - parity 25 90_mt_human_translation_evaluation
91 arabic - sentiment - sentiment analysis - arabic sentiment - arabic sentiment analysis 25 91_arabic_sentiment_sentiment analysis_arabic sentiment
92 emoji - emojis - emoji prediction - emoticons - sentiment 25 92_emoji_emojis_emoji prediction_emoticons
93 constituency - latent tree - parsing - constituency parsing - tree learning 25 93_constituency_latent tree_parsing_constituency parsing
94 spatial - instructions - 3d - environment - robot 24 94_spatial_instructions_3d_environment
95 persona - responses - personality - traits - consistency 23 95_persona_responses_personality_traits
96 matching - response - retrievalbased - chatbots - multiturn 23 96_matching_response_retrievalbased_chatbots
97 entity - entity typing - typing - finegrained entity - type 22 97_entity_entity typing_typing_finegrained entity
98 math - word problems - math word - word problem - problems 21 98_math_word problems_math word_word problem
99 bert - multilingual - multilingual bert - bert model - multilingual models 21 99_bert_multilingual_multilingual bert_bert model
100 financial - stock - market - news - price 21 100_financial_stock_market_news
101 video - multimodal - sceneaware - dialog - visual 21 101_video_multimodal_sceneaware_dialog
102 sense - multisense - senses - word sense - word 21 102_sense_multisense_senses_word sense
103 game - games - agents - communication - pragmatic 21 103_game_games_agents_communication
104 graph - amrtotext - amrtotext generation - amr - graphs 20 104_graph_amrtotext_amrtotext generation_amr
105 nmt - translation - neural machine translation - neural machine - machine translation 20 105_nmt_translation_neural machine translation_neural machine
106 normalization - text normalization - normalizing - text - historical 20 106_normalization_text normalization_normalizing_text
107 privacy - policies - anonymization - deidentification - vague 20 107_privacy_policies_anonymization_deidentification
108 beam - beam search - search - decoding - constraints 20 108_beam_beam search_search_decoding
109 hypernymy - distributional - pathbased - hypernymy detection - hypernyms 19 109_hypernymy_distributional_pathbased_hypernymy detection
110 political - bias - articles - news - ideology 19 110_political_bias_articles_news
111 generative adversarial - gans - gan - generative - generative adversarial networks 18 111_generative adversarial_gans_gan_generative
112 pos - tagger - tagging - pos tagging - codemixed 17 112_pos_tagger_tagging_pos tagging
113 humor - humorous - headlines - funny - puns 17 113_humor_humorous_headlines_funny
114 metaphor - metaphors - metaphoric - metaphorical - literal 17 114_metaphor_metaphors_metaphoric_metaphorical
115 codeswitching - cs - asr - speech - speech recognition 17 115_codeswitching_cs_asr_speech
116 event coreference - event - coreference - coreference resolution - resolution 17 116_event coreference_event_coreference_coreference resolution
117 reviews - review - helpfulness - opinion - online reviews 17 117_reviews_review_helpfulness_opinion
118 covid19 - tweets - wnut2020 - twitter - informative 17 118_covid19_tweets_wnut2020_twitter
119 anaphora - resolution - pronouns - pronoun - anaphora resolution 17 119_anaphora_resolution_pronouns_pronoun
120 bilingual - dictionary - comparability - termhood - comparable corpora 17 120_bilingual_dictionary_comparability_termhood
121 discourse - translation - pronouns - dp - discourse phenomena 17 121_discourse_translation_pronouns_dp
122 color - colour - naming - colors - character embeddings 16 122_color_colour_naming_colors
123 nonautoregressive - autoregressive - nat - nonautoregressive neural - decoding 16 123_nonautoregressive_autoregressive_nat_nonautoregressive neural
124 nlg - natural language generation - language generation - spoken dialogue - generation 16 124_nlg_natural language generation_language generation_spoken dialogue
125 crowdsourcing - workers - examples - protocols - data collection 16 125_crowdsourcing_workers_examples_protocols
126 african - revolution - african languages - technology - african language 16 126_african_revolution_african languages_technology
127 grading - scoring - essay - short answer - essay scoring 16 127_grading_scoring_essay_short answer
128 treebanks - treebank - parsing - crosslingual - dependency 16 128_treebanks_treebank_parsing_crosslingual
129 reviews - summarization - review - product - summaries 16 129_reviews_summarization_review_product
130 gaze - reading - eyetracking - eye - behaviour 16 130_gaze_reading_eyetracking_eye
131 nlp - natural - natural language - nlg - language 15 131_nlp_natural_natural language_nlg
132 news translation - news translation task - translation task - news - submission 14 132_news translation_news translation task_translation task_news
133 eat - meaning - semantics - formal - theory 14 133_eat_meaning_semantics_formal
134 sign - sign language - sl - asl - deaf 14 134_sign_sign language_sl_asl
135 multitask - labels - mtl - sequence - multitask learning 14 135_multitask_labels_mtl_sequence
136 phylogenetic - cognate - indoeuropean - historical linguistics - indoeuropean language 14 136_phylogenetic_cognate_indoeuropean_historical linguistics
137 syntax - translation - neural machine translation - neural machine - nmt 14 137_syntax_translation_neural machine translation_neural machine
138 explanations - explanation - explainers - nl explanations - faithful 14 138_explanations_explanation_explainers_nl explanations
139 slot - slot filling - filling - slots - nlu 13 139_slot_slot filling_filling_slots
140 personality - traits - profiling - author profiling - author 13 140_personality_traits_profiling_author profiling
141 preposition - prepositions - supersenses - prepositional - supersense 13 141_preposition_prepositions_supersenses_prepositional
142 scientific - application areas - application - areas - literature 13 142_scientific_application areas_application_areas
143 russian - similarity - semantic similarity - similarity task - semantic similarity task 13 143_russian_similarity_semantic similarity_similarity task
144 code - source code - documentation - code generation - programming 13 144_code_source code_documentation_code generation
145 semantic web - translation - machinetranslation - machine translation - technologies 12 145_semantic web_translation_machinetranslation_machine translation
146 knowledge - knowledgegrounded - response - dialogue generation - dialogue 12 146_knowledge_knowledgegrounded_response_dialogue generation
147 sentence - sentence representations - sentence embeddings - transfer - tasks 12 147_sentence_sentence representations_sentence embeddings_transfer
148 distributional - distributional semantics - semantics - functional distributional - functional distributional semantics 12 148_distributional_distributional semantics_semantics_functional distributional
149 compositionality - sc - distributional - sememe knowledge - phrase 12 149_compositionality_sc_distributional_sememe knowledge
150 ud - annotation - treebank - treebanks - universal dependencies 12 150_ud_annotation_treebank_treebanks
151 acronym - abbreviation - acronyms - abbreviations - disambiguation 12 151_acronym_abbreviation_acronyms_abbreviations
152 propaganda - task 11 - 11 - propaganda detection - semeval2020 task 12 152_propaganda_task 11_11_propaganda detection
153 open - open information extraction - open information - information extraction - tuples 12 153_open_open information extraction_open information_information extraction
154 hebrew - bible - intertextuality - restoration - homographs 11 154_hebrew_bible_intertextuality_restoration
155 typological - typology - typological features - languages - linguistic typology 11 155_typological_typology_typological features_languages
156 label - text classification - multilabel - labels - classification 11 156_label_text classification_multilabel_labels
157 variational - latent - variational autoencoders - variational autoencoder - autoencoders 11 157_variational_latent_variational autoencoders_variational autoencoder
158 crisis - messages - disasters - disaster - emergency 11 158_crisis_messages_disasters_disaster
159 adversarial - rc - rc models - robustness - comprehension 11 159_adversarial_rc_rc models_robustness
160 tree - treelstm - trees - tree structures - syntactic 11 160_tree_treelstm_trees_tree structures
161 headline - headlines - news - headline generation - synthetic news 11 161_headline_headlines_news_headline generation
162 reasoning - kg - paths - kgs - multihop 11 162_reasoning_kg_paths_kgs
163 text classification - classification - runtime - fasttext - text 10 163_text classification_classification_runtime_fasttext
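Each row of the table above packs four fields into one space-separated line, and both keywords and labels can themselves contain spaces. The sketch below recovers the fields under the assumption that every row follows the pattern `ID keywords FREQUENCY ID_label`; the names `parse_row` and `TopicRow` are hypothetical. A regular-expression backreference (`\1`) ties the label's prefix to the row's topic ID, which locates the frequency as the integer just before the label.

```python
import re
from dataclasses import dataclass
from typing import List

# Topic ID, keywords, frequency, then a label that starts with the same ID (\1).
ROW_RE = re.compile(r"^(-?\d+) (.+) (\d+) (\1_.+)$")


@dataclass
class TopicRow:
    topic_id: int
    keywords: List[str]
    frequency: int
    label: str

    def label_matches_keywords(self) -> bool:
        """True if the label is the topic ID joined with the first four keywords."""
        expected = "_".join([str(self.topic_id)] + self.keywords[:4])
        return self.label == expected


def parse_row(line: str) -> TopicRow:
    """Split one flattened table row into its four fields."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    topic_id, keywords, frequency, label = match.groups()
    return TopicRow(
        topic_id=int(topic_id),
        keywords=keywords.split(" - "),  # keywords are separated by " - "
        frequency=int(frequency),
        label=label,
    )
```

Applied to the row for topic 0, `parse_row` gives a frequency of 3488, about 31.7% of the 11000 training documents and by far the largest topic. With the model loaded, `get_topic_info()` should return the same information as a pandas DataFrame, so parsing is only needed when working from the card text.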

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
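The hyperparameters above correspond to arguments of the `BERTopic` constructor. The sketch below collects them into a dictionary; the names `HYPERPARAMS` and `build_model` are hypothetical. The card does not list the embedding model or the UMAP and HDBSCAN settings, so this sketch leaves those at BERTopic's defaults, which may differ from what was used in training. Note that `min_topic_size: 10` is consistent with the smallest topic in the table (topic 163) holding 10 documents.

```python
# Hyperparameters as listed in this model card. Embedding, UMAP and HDBSCAN
# settings are not listed, so BERTopic's own defaults would apply to them.
HYPERPARAMS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_model():
    """Create an unfitted BERTopic model with the card's hyperparameters."""
    # Imported lazily so this module loads even without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

A retrained model would be fitted with `topics, probs = build_model().fit_transform(docs)`, where `docs` is a hypothetical list of strings. Without the original 11000 documents and embedding model, and because UMAP is stochastic, it should not be expected to reproduce the topics listed above.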

Framework versions

  • Numpy: 1.25.2
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.6.1
  • Transformers: 4.38.2
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
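To compare a local environment with the versions recorded above, the installed versions can be read with the standard library's `importlib.metadata`. This is a minimal sketch: the helper names are hypothetical, and mapping the card's display names to PyPI distribution names (for example "UMAP" to `umap-learn`) is an assumption. A mismatch does not necessarily prevent the model from loading; these are simply the versions present when the model was saved.

```python
import sys
from importlib import metadata
from typing import Callable, Dict, Mapping, Optional, Tuple

# Versions listed on this card, keyed by (assumed) PyPI distribution name.
EXPECTED: Mapping[str, str] = {
    "numpy": "1.25.2",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.6.1",
    "transformers": "4.38.2",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}
EXPECTED_PYTHON = "3.10.12"


def version_mismatches(
    expected: Mapping[str, str] = EXPECTED,
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (expected, installed).

    `installed` is None when the package is not installed at all.
    """
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


def python_version() -> str:
    """Return the running interpreter's version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])
```

Calling `version_mismatches()` with no arguments checks the current environment; an empty dictionary means every listed package matches, and `python_version() == EXPECTED_PYTHON` checks the interpreter.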