
BERTopic-enron-5000

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

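from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("antulik/BERTopic-enron-5000")

# Return a DataFrame summarizing every topic in the model
topic_model.get_topic_info()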
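
Beyond inspecting the topic table, a loaded model can also assign topics to new text. The following is a minimal sketch, not a tested recipe for this repository: it assumes the Hub repository lets `transform` embed raw strings (that is, an embedding model is saved with it or referenced by it), and the helper names `assign_topics` and `drop_outliers` are hypothetical.

```python
from typing import List, Tuple


def assign_topics(docs: List[str], repo_id: str = "antulik/BERTopic-enron-5000") -> List[int]:
    """Load the model from the Hugging Face Hub and return one topic ID per document."""
    # Imported lazily so drop_outliers() can be used without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(repo_id)
    # transform() returns (topics, probabilities); only the topic IDs are kept here.
    topics, _ = topic_model.transform(docs)
    return [int(t) for t in topics]


def drop_outliers(docs: List[str], topics: List[int]) -> List[Tuple[str, int]]:
    """Pair documents with their topic IDs, skipping -1 (BERTopic's outlier topic)."""
    return [(doc, topic) for doc, topic in zip(docs, topics) if topic != -1]


# Example call (needs network access and bertopic; the sample texts are made up):
# emails = ["Can we meet on Thursday?", "Attached is the revised swap agreement."]
# print(drop_outliers(emails, assign_topics(emails)))
```

Filtering out `-1` is optional; it only hides documents the model could not place in any topic.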

Topic overview

  • Number of topics: 65
  • Number of training documents: 5000
Topic ID | Topic Keywords | Topic Frequency | Label
-1 | enron - corp - contract - company - trading | 10 | -1_enron_corp_contract_company
0 | going - meeting - meet - hope - night | 2299 | 0_going_meeting_meet_hope
1 | agreements - enron - agreement - contract - documents | 481 | 1_agreements_enron_agreement_contract
2 | enron - enrons - companies - company - market | 263 | 2_enron_enrons_companies_company
3 | enron - contact - corp - email - recipient | 253 | 3_enron_contact_corp_email
4 | telecom - ventures - financial - companies - markets | 84 | 4_telecom_ventures_financial_companies
5 | enron - email - recipient - recipients - message | 76 | 5_enron_email_recipient_recipients
6 | fares - newark - airlines - flight - miles | 58 | 6_fares_newark_airlines_flight
7 | nfl - commissionercom - td - sportslinecom - league | 54 | 7_nfl_commissionercom_td_sportslinecom
8 | enron - eov - ashleyworthingenroncom - erv - rho | 53 | 8_enron_eov_ashleyworthingenroncom_erv
9 | enron - enrons - bankruptcy - bankrupt - savings | 51 | 9_enron_enrons_bankruptcy_bankrupt
10 | outlookmigrationteamenroncom - outlook - outlookteamenroncom - emailcalendar - appointment | 46 | 10_outlookmigrationteamenroncom_outlook_outlookteamenroncom_emailcalendar
11 | enron - approver - approval - pending - econnect | 46 | 11_enron_approver_approval_pending
12 | schedules2002013118txt - schedules2002020115txt - schedules2002012506txt - schedules2001122507txt - schedules2001122815txt | 45 | 12_schedules2002013118txt_schedules2002020115txt_schedules2002012506txt_schedules2001122507txt
13 | pricing - lpg - logistics - freight - metered | 44 | 13_pricing_lpg_logistics_freight
14 | request - seeks - up - on - all | 43 | 14_request_seeks_up_on
15 | haas - semester - summers - faculty - mba | 43 | 15_haas_semester_summers_faculty
16 | federal - california - sacramento - californias - states | 42 | 16_federal_california_sacramento_californias
17 | enron - resumes - resume - interview - recruiter | 41 | 17_enron_resumes_resume_interview
18 | fontstyle - font - html - bold - sansserif | 39 | 18_fontstyle_font_html_bold
19 | enron - deals - trades - deal - tradesxls | 37 | 19_enron_deals_trades_deal
20 | pipeline - pipelines - piping - paso - pipe | 36 | 20_pipeline_pipelines_piping_paso
21 | enron - eb - contact - mailtobobshultsenroncom - emailed | 36 | 21_enron_eb_contact_mailtobobshultsenroncom
22 | outage - outagesindustrialinfocom - outages - rescheduled - scheduled | 36 | 22_outage_outagesindustrialinfocom_outages_rescheduled
23 | gifts - gift - holiday - holidays - christmas | 36 | 23_gifts_gift_holiday_holidays
24 | nymex - futures - expiration - contract - contracts | 31 | 24_nymex_futures_expiration_contract
25 | transmission - transco - translink - ferc - rtos | 30 | 25_transmission_transco_translink_ferc
26 | unsubscribe - email - newsletter - mailing - mailmanenroncom | 30 | 26_unsubscribe_email_newsletter_mailing
27 | invoices - invoice - enron - billed - reimbursement | 29 | 27_invoices_invoice_enron_billed
28 | enron - committee - lobbyist - judiciary - bill | 28 | 28_enron_committee_lobbyist_judiciary
29 | refinery - prices - pipeline - oil - price | 27 | 29_refinery_prices_pipeline_oil
30 | enron - gas - fuel - logistics - emissions | 27 | 30_enron_gas_fuel_logistics
31 | enron - dpc - topockpcb - ebizenroncom - pcb | 24 | 31_enron_dpc_topockpcb_ebizenroncom
32 | nyisotechexchange - nyisotechexchangeglobal2000net - marketrelationsnyisocom - nyiso - ownernyisotechexchangeliststhebiznet | 24 | 32_nyisotechexchange_nyisotechexchangeglobal2000net_marketrelationsnyisocom_nyiso
33 | expense - expenses - enron - enronupdateconcureworkplacecom - receipts | 24 | 33_expense_expenses_enron_enronupdateconcureworkplacecom
34 | enron - ebusiness - inquiries - advisory - contact | 23 | 34_enron_ebusiness_inquiries_advisory
35 | dbcaps97data - schedules2002011801txt - schedules2002011805txt - schedules2001102112txt - schedules2002011916txt | 21 | 35_dbcaps97data_schedules2002011801txt_schedules2002011805txt_schedules2001102112txt
36 | enrononline - trades - trading - deals - eol | 20 | 36_enrononline_trades_trading_deals
37 | enron - swaps - swap - exchange - exchanges | 20 | 37_enron_swaps_swap_exchange
38 | feedback - reviewers - review - process - reviewer | 20 | 38_feedback_reviewers_review_process
39 | powermarketerscom - electricity - energy - utilities - reuters | 20 | 39_powermarketerscom_electricity_energy_utilities
40 | tco - columbias - columbia - scheduled - cgt | 19 | 40_tco_columbias_columbia_scheduled
41 | curves - curve - data - changes - inactive | 19 | 41_curves_curve_data_changes
42 | enron - scheduled - eb3335 - rustybelflowerenroncom - brianredmondenroncom | 19 | 42_enron_scheduled_eb3335_rustybelflowerenroncom
43 | enron - executive - ceo - communicationsenron - director | 18 | 43_enron_executive_ceo_communicationsenron
44 | alert - alerts - ipo - stock - securities | 17 | 44_alert_alerts_ipo_stock
45 | invoice - ipayitenroncom - sapsecurityenroncom - ipayit - ehronline | 17 | 45_invoice_ipayitenroncom_sapsecurityenroncom_ipayit
46 | variances - variance - schedules - schedule - schedulingiso | 17 | 46_variances_variance_schedules_schedule
47 | futures - charts - carr - financial - 1500 | 17 | 47_futures_charts_carr_financial
48 | approval - approved - authorized - eisb - tariff | 16 | 48_approval_approved_authorized_eisb
49 | fee - credit - express - membership - merchant | 15 | 49_fee_credit_express_membership
50 | fee - subscription - billing - discount - monthly | 15 | 50_fee_subscription_billing_discount
51 | schedules2001102810txt - schedules2001123103txt - schedules2001030406txt - schedules2002010121txt - schedules2001043008txt | 14 | 51_schedules2001102810txt_schedules2001123103txt_schedules2001030406txt_schedules2002010121txt
52 | managementcrd - gd - ets - gasdeskenroncom - sst | 14 | 52_managementcrd_gd_ets_gasdeskenroncom
53 | shipping - shipment - order - orders - delivery | 14 | 53_shipping_shipment_order_orders
54 | dish - satellite - free - channels - dvds | 14 | 54_dish_satellite_free_channels
55 | mailbox - outlook - inbox - exchangeadministratorenroncom - folder | 13 | 55_mailbox_outlook_inbox_exchangeadministratorenroncom
56 | netware - visualwares - backoffice - newsletter - file | 13 | 56_netware_visualwares_backoffice_newsletter
57 | enronfcucom - survey - enronannouncementsenroncom - ews - service | 13 | 57_enronfcucom_survey_enronannouncementsenroncom_ews
58 | pira - forecast - piras - demand - weekly | 12 | 58_pira_forecast_piras_demand
59 | pricing - enron - cost - rate - price | 12 | 59_pricing_enron_cost_rate
60 | whitening - medication - strength - clinical - doctor | 11 | 60_whitening_medication_strength_clinical
61 | enron - industries - ebusiness - industrial - ena | 11 | 61_enron_industries_ebusiness_industrial
62 | px - credit - pe - sce - tariff | 10 | 62_px_credit_pe_sce
63 | enron - eesi - eemc - assets - nepco | 10 | 63_enron_eesi_eemc_assets
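
Each label in the table joins the topic ID and the first four keywords with underscores, and each frequency can be read as a share of the 5000 training documents. The sketch below illustrates both readings on a few rows copied from the table; the helper names `parse_label` and `share` are hypothetical and are not part of BERTopic.

```python
from typing import List, Tuple

# A few (frequency, label) pairs copied from the topic table.
ROWS = [
    (10, "-1_enron_corp_contract_company"),
    (2299, "0_going_meeting_meet_hope"),
    (481, "1_agreements_enron_agreement_contract"),
    (58, "6_fares_newark_airlines_flight"),
]
N_TRAINING_DOCS = 5000  # "Number of training documents" from the topic overview


def parse_label(label: str) -> Tuple[int, List[str]]:
    """Split '<topic id>_<kw1>_<kw2>_...' into the integer ID and the keyword list."""
    topic_id, keywords = label.split("_", 1)
    return int(topic_id), keywords.split("_")


def share(frequency: int, total: int = N_TRAINING_DOCS) -> float:
    """Fraction of the training documents assigned to a topic."""
    return frequency / total


for frequency, label in ROWS:
    topic_id, keywords = parse_label(label)
    print(f"topic {topic_id:>2}: {share(frequency):6.2%}  {', '.join(keywords)}")
```

Read this way, topic 0 accounts for 2299 of the 5000 documents (about 46%), while the outlier topic -1 holds only 10.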

Training hyperparameters

  • calculate_probabilities: False
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: [['drug', 'cancer', 'drugs', 'doctor'], ['windows', 'drive', 'dos', 'file'], ['space', 'launch', 'orbit', 'lunar']]
  • top_n_words: 10
  • verbose: False
  • zeroshot_min_similarity: 0.7
  • zeroshot_topic_list: None
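
For readers who want to train a comparable model on their own documents, the hyperparameters listed here can be passed to the `BERTopic` constructor as keyword arguments. This is a sketch under assumptions: it presumes a BERTopic release recent enough to accept the zero-shot arguments, it leaves the embedding, UMAP, and HDBSCAN components at their defaults because the card does not list them, and the function name `build_untrained_model` is hypothetical.

```python
# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": [
        ["drug", "cancer", "drugs", "doctor"],
        ["windows", "drive", "dos", "file"],
        ["space", "launch", "orbit", "lunar"],
    ],
    "top_n_words": 10,
    "verbose": False,
    "zeroshot_min_similarity": 0.7,
    "zeroshot_topic_list": None,
}


def build_untrained_model():
    """Create a fresh, unfitted BERTopic instance with the listed settings."""
    # Imported lazily so the settings dict can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)


# Example (needs bertopic and your own list of documents):
# topic_model = build_untrained_model()
# topics, probs = topic_model.fit_transform(docs)
```

Results will not match the table exactly, since the outcome also depends on the training documents and on components that are not recorded in this list.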

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.6
  • Pandas: 2.0.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.7.0
  • Transformers: 4.40.1
  • Numba: 0.58.1
  • Plotly: 5.15.0
  • Python: 3.10.12
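
If loading the model fails or behaves unexpectedly, comparing the local environment against these versions is a reasonable first check. The sketch below uses only the standard library; the mapping from the names in this list to PyPI distribution names (for example `umap-learn` for UMAP) is an assumption, and the helper names are hypothetical.

```python
from importlib import metadata

# Versions from the "Framework versions" list, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.6",
    "pandas": "2.0.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.7.0",
    "transformers": "4.40.1",
    "numba": "0.58.1",
    "plotly": "5.15.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected=EXPECTED, lookup=installed_version):
    """Return {name: (expected, installed)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in version_mismatches().items():
    print(f"{name}: card lists {wanted}, installed {found}")
```

A mismatch is not necessarily a problem; newer releases often load the model without issue.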