Edit model card

BERTopic_UKR_CH

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

pip install -U bertopic

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("DssLlmSubmission/BERTopic_UKR_CH")

topic_model.get_topic_info()

Topic overview

  • Number of topics: 267 #Please note that after training, we manually assessed all clusters and merged similar ones leading to a total of 17 distinct clusters.
  • Number of training documents: 550677
Click here for an overview of all topics. The following Python code uses a dictionary to map the 267 clusters found by algorithm to the 17 distinct clusters we identified by qualitative analysis.
topic_mapping = {-1: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 0: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Information Requests'}, 1: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 2: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Insurance'}, 3: {'cluster_id': 2, 'cluster_name': 'Pet', 'sub_cluster': 'Pet'}, 4: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Asylum'}, 5: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Ticket Inquiries'}, 6: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Carriers, Transport to and from Ukraine'}, 7: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Seeking'}, 8: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 9: {'cluster_id': 5, 'cluster_name': 'Volunteering', 'sub_cluster': 'Volunteering'}, 10: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Communication'}, 11: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Translation Services'}, 12: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Passport'}, 13: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Dentistry'}, 14: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Job'}, 15: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Currency'}, 16: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Banking'}, 17: {'cluster_id': 8, 'cluster_name': 'Social Services', 'sub_cluster': 'Protocols'}, 18: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Mail'}, 19: {'cluster_id': 9, 'cluster_name': 'Education', 'sub_cluster': 'Education'}, 20: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Clothing'}, 21: {'cluster_id': 8, 'cluster_name': 'Social Services', 'sub_cluster': 'Financial Assistance'}, 22: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 23: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 24: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Carriers, Transport to and from Ukraine'}, 25: {'cluster_id': 9, 'cluster_name': 'Education', 'sub_cluster': 'Education'}, 26: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 27: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Public Transportation'}, 28: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Leasing Regulation'}, 29: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 30: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Open Chat'}, 31: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Communication'}, 32: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 33: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Regulation'}, 34: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Food'}, 35: {'cluster_id': 2, 'cluster_name': 'Pet', 'sub_cluster': 'Pet'}, 36: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Carriers, Transport to and from Ukraine'}, 37: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Vehicle'}, 38: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 39: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 40: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Request'}, 41: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 42: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Status Acquisition'}, 43: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Consulate Services'}, 44: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 45: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Public Transportation'}, 46: {'cluster_id': 5, 'cluster_name': 'Volunteering', 'sub_cluster': 'Volunteering'}, 47: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 48: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Vehicle'}, 49: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Seeking'}, 50: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Immigration Procedure'}, 51: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'War Chat'}, 52: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 53: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Network Provider'}, 54: {'cluster_id': 9, 'cluster_name': 'Education', 'sub_cluster': 'Education'}, 55: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 56: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 57: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Asylum'}, 58: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 59: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Leisure and Fitness'}, 60: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Tax'}, 61: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Expense'}, 62: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Seeking'}, 63: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 64: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 65: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Carriers, Transport to and from Ukraine'}, 66: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 67: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 68: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Public Transportation'}, 69: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Family Reunion'}, 70: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 71: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Request'}, 72: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 73: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Public Transportation'}, 74: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 75: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Vaccinations'}, 76: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Police'}, 77: {'cluster_id': 8, 'cluster_name': 'Social Services', 'sub_cluster': 'Financial Assistance'}, 78: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Regulation'}, 79: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Carriers, Transport to and from Ukraine'}, 80: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 81: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Request'}, 82: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Public Transportation'}, 83: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Parking'}, 84: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 85: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 86: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 87: {'cluster_id': 11, 'cluster_name': 'Legal information', 'sub_cluster': 'Legal information'}, 88: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 89: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 90: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Insurance'}, 91: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Network Provider'}, 92: {'cluster_id': 9, 'cluster_name': 'Education', 'sub_cluster': 'Education'}, 93: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Seeking'}, 94: {'cluster_id': 12, 'cluster_name': 'Religious Information', 'sub_cluster': 'Religious Information'}, 95: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Network Provider'}, 96: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 97: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 98: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 99: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 100: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Banking'}, 101: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 102: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 103: {'cluster_id': 8, 'cluster_name': 'Social Services', 'sub_cluster': 'Library'}, 104: {'cluster_id': 8, 'cluster_name': 'Social Services', 'sub_cluster': 'Library'}, 105: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Tax'}, 106: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Police'}, 107: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 108: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 109: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Network Provider'}, 110: {'cluster_id': 11, 'cluster_name': 'Legal information', 'sub_cluster': 'Legal information'}, 111: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Passport'}, 112: {'cluster_id': 9, 'cluster_name': 'Education', 'sub_cluster': 'Education'}, 113: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Regulation'}, 114: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Immigration Procedure'}, 115: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 116: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 117: {'cluster_id': 9, 'cluster_name': 'Education', 'sub_cluster': 'Education'}, 118: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Job'}, 119: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 120: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 121: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Seeking'}, 122: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Translation Services'}, 123: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Insurance'}, 124: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Seeking'}, 125: {'cluster_id': 11, 'cluster_name': 'Legal information', 'sub_cluster': 'Legal information'}, 126: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 127: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 128: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 129: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Psychotherapy'}, 130: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 131: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 132: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Seeking'}, 133: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Immigration Procedure'}, 134: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 135: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Home Appliances'}, 136: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 137: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 138: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 139: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Tax'}, 140: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Vaccinations'}, 141: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 142: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Request'}, 143: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Shopping'}, 144: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 145: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 146: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 147: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Job'}, 148: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Vehicle'}, 149: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 150: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 151: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 152: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Request'}, 153: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 154: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 155: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 156: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 157: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 158: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Leisure and Fitness'}, 159: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 160: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Communication'}, 161: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Leisure and Fitness'}, 162: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 163: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 164: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 165: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Shopping'}, 166: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 167: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Public Transportation'}, 168: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Communication'}, 169: {'cluster_id': 12, 'cluster_name': 'Religious Information', 'sub_cluster': 'Religious Information'}, 170: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Shopping'}, 171: {'cluster_id': 3, 'cluster_name': 'Transportation', 'sub_cluster': 'Taxi Services'}, 172: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 173: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 174: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 175: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Open Chat'}, 176: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 177: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 178: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 179: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Immigration Procedure'}, 180: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 181: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 182: {'cluster_id': 11, 'cluster_name': 'Legal information', 'sub_cluster': 'Divorce'}, 183: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 184: {'cluster_id': 8, 'cluster_name': 'Social Services', 'sub_cluster': 'Protocols'}, 185: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Shopping'}, 186: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 187: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Immigration Procedure'}, 188: {'cluster_id': 11, 'cluster_name': 'Legal information', 'sub_cluster': 'Marriage'}, 189: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Job'}, 190: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 191: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 192: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Leisure and Fitness'}, 193: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 194: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 195: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 196: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 197: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Leisure and Fitness'}, 198: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Shopping'}, 199: {'cluster_id': 5, 'cluster_name': 'Volunteering', 'sub_cluster': 'Volunteering'}, 200: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Job'}, 201: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Logistics'}, 202: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 203: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Consulate Services'}, 204: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Seeking'}, 205: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Other Item Request'}, 206: {'cluster_id': 4, 'cluster_name': 'Accommodation', 'sub_cluster': 'Leasing Regulation'}, 207: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Other Item Request'}, 208: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Job'}, 209: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 210: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 211: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 212: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 213: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Infant & Toddler Care'}, 214: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 215: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 216: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 217: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Regulation'}, 218: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 219: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Request'}, 220: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 221: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 222: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Hospice Care'}, 223: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 224: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 225: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Request'}, 226: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 227: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Dentistry'}, 228: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 229: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 230: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Customs'}, 231: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 232: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 233: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Customs'}, 234: {'cluster_id': 6, 'cluster_name': 'Integration', 'sub_cluster': 'Customs'}, 235: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Disability'}, 236: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 237: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 238: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 239: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 240: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Regulation'}, 241: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Network Provider'}, 242: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 243: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 244: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 245: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Leisure and Fitness'}, 246: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 247: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Travel'}, 248: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 249: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Leisure and Fitness'}, 250: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Immigration Procedure'}, 251: {'cluster_id': 10, 'cluster_name': 'Social Activity', 'sub_cluster': 'Regulation'}, 252: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 253: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Open Chat'}, 254: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 255: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Medical Request'}, 256: {'cluster_id': 0, 'cluster_name': 'Immigration', 'sub_cluster': 'Immigration Procedure'}, 257: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 258: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 259: {'cluster_id': 8, 'cluster_name': 'Social Services', 'sub_cluster': 'Protocols'}, 260: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 261: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 262: {'cluster_id': -1, 'cluster_name': 'Unknown', 'sub_cluster': 'Unknown'}, 263: {'cluster_id': 1, 'cluster_name': 'Healthcare and Insurance', 'sub_cluster': 'Infant & Toddler Care'}, 264: {'cluster_id': 7, 'cluster_name': 'Living Essentials', 'sub_cluster': 'Shopping'}, 265: {'cluster_id': 5, 'cluster_name': 'Volunteering', 'sub_cluster': 'Volunteering'}}
    df['cluster_id_fit'] = df['predicted_class_old'].map(lambda x: topic_mapping[x]['cluster_id'])
    df['predicted_class'] = df['predicted_class_old'].map(lambda x: topic_mapping[x]['cluster_name'])
    df['sub_cluster'] = df['predicted_class_old'].map(lambda x: topic_mapping[x]['sub_cluster'])

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: auto
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True

Framework versions

  • Numpy: 1.24.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.4
  • Pandas: 2.0.3
  • Scikit-Learn: 1.3.1
  • Sentence-transformers: 2.2.2
  • Transformers: 4.34.0
  • Numba: 0.58.0
  • Plotly: 5.17.0
  • Python: 3.8.10
Downloads last month
3