---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# hub_issues_topocs

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

## Usage 

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/hub_issues_topocs")

topic_model.get_topic_info()
```
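Once loaded, the model can also assign topics to documents it has not seen before. The following is a minimal sketch rather than a definitive recipe: the helper names `assign_topics` and `count_topics` are hypothetical, and it assumes that the embedding model referenced by this checkpoint can be fetched when the model is loaded (if it cannot, `BERTopic.load` accepts an `embedding_model` argument).

```python
from collections import Counter


def assign_topics(docs, model_id="davanstrien/hub_issues_topocs"):
    """Return one topic ID per document (-1 marks outliers)."""
    # Imported lazily so the counting helper below works without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    # transform() returns the predicted topic IDs and, optionally, probabilities.
    topics, _probabilities = topic_model.transform(docs)
    return [int(topic) for topic in topics]


def count_topics(topic_ids):
    """Count how often each topic ID occurs, most frequent first."""
    return Counter(topic_ids).most_common()
```

For example, `count_topics(assign_topics(docs))` on a list of strings `docs` gives a quick view of which topics dominate a new batch of documents. The IDs can be matched against the table in the topic overview.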

## Topic overview

* Number of topics: 156
* Number of training documents: 6427

<details>
  <summary>Click here for an overview of all topics.</summary>
  
| Topic ID | Topic Keywords | Topic Frequency | Label | 
|----------|----------------|-----------------|-------| 
| -1 | model - version - training - add - base | 10 | Outlier Topic | 
| 0 | yes - upscaling - embeddings - dir - 18 | 1785 | Yes Upscaling VAE Embeddings | 
| 1 | images - image - img2img - generated - black | 218 | Image Distortion Investigation | 
| 2 | languages - language - chinese - support - multilingual | 169 | Multilingual Language Support | 
| 3 | request - thesis - checker - request request - work | 103 | DOI request and thesis checker | 
| 4 | bloom - 176b - bloomz - bert - 7b1 | 95 | Bloom inference on BERT | 
| 5 | api - inference api - hosted - inference - hosted inference | 80 | Configuring Inference API | 
| 6 | report report - report - reports - look - awesome | 78 | Awesome Reports | 
| 7 | use model - run model - model run - model use - tune model | 73 | Use model instructions | 
| 8 | request access - access request - access - request - request requesting | 65 | Access Request Solution | 
| 9 | colab - google - google colab - model google - collab | 64 | "Running Galactica on Colab" | 
| 10 | json - config json - config - json file - file named | 62 | JSON configuration files | 
| 11 | load model - load - model working - unable load - unable | 60 | "Model loading issues" | 
| 12 | text - text generation - words - truncated - generation | 57 | Text Generation Techniques | 
| 13 | label - labels - tags - classifier - entity | 57 | Document Labels | 
| 14 | data - model dataset - dataset - train model - used train | 55 | Model Training Data | 
| 15 | issue report - issue - report - 论文 - artists | 55 | Ethical Issues in Artists' Legal Discussion | 
| 16 | loading - loading model - error loading - model error - load model | 55 | Model Loading Errors | 
| 17 | error error - error - 500 error - connection - unknown error | 49 | Error 500 Connection | 
| 18 | train model - train - trained - model did - model trained | 46 | Training models in Arabic | 
| 19 | stable diffusion - diffusion - stable - diffusion v1 - diffusion webui | 46 | Stable Diffusion Downloads | 
| 20 | question - answers - questions - tts - double | 45 | Question about Fig.2c | 
| 21 | length - max - maximum - limit - sequence length | 45 | Length Limits and Token Length | 
| 22 | model model - model architecture - generator - architecture - type | 42 | Model Architecture | 
| 23 | commercial - license - commercial use - license license - mit | 41 | Commercial Use License | 
| 24 | transformers - transformer - sentence transformers - sentence - using transformers | 40 | Issues with sentence transformers | 
| 25 | huggingface - hugging face - hugging - face - using hugging | 40 | Hugging Face model usage | 
| 26 | legal - legal issue - issue report - issue - report | 40 | Legal Issues Reports | 
| 27 | v2 - v3 - anime - wav2vec2 - virus | 40 | Anime Virus Detection Vae | 
| 28 | tutorials - thread - tricks - 26 - tips | 39 | Stable Diffusion 26+ Tutorials | 
| 29 | difference - fp16 - dpm - opus - opus mt | 39 | Difference between phase1 and phase2 | 
| 30 | tokenizer - using from_pretrained - loading - error loading - load | 37 | Tokenizer Loading Error | 
| 31 | output - extraction - truncated - summaries - outputs | 37 | Output Extraction | 
| 32 | attribute - object - attributeerror - typeerror - string | 36 | AttributeError in object attributes | 
| 33 | ckpt file - ckpt - file ckpt - file - ckpt files | 36 | CKPT file location | 
| 34 | dataset dataset - dataset - source dataset - datasets - source | 36 | dataset source semantic search | 
| 35 | size - mismatch - discrepancy - vocab size - dimensionality | 36 | Size Mismatch Discrepancy | 
| 36 | license - license license - permission - agreement - licence | 36 | License Agreement | 
| 37 | model card - card - card model - building model - building | 35 | Model Card Typos | 
| 38 | demo - space - spaces - gradio - cause | 35 | Troubleshooting Gradio Demo | 
| 39 | commercially - does model - commercial - model used - usable | 34 | Commercial Usability of AI Model | 
| 40 | automatic1111 - webui - automatic - ui - web ui | 33 | Automatic1111 WebUI | 
| 41 | import - transformers - module - failed - export | 33 | ImportError in Transformers Module | 
| 42 | example - examples - example use - prompt example - usage example | 33 | Example Usage | 
| 43 | audio - noise - spectrogram - second - speaker | 33 | Audio Transcription and Conversion | 
| 44 | cool - love - idea - amazing - great | 32 | "cool and amazing" | 
| 45 | language model - language - kenlm - lm - multilingual | 32 | Language Model Inference with KenLM | 
| 46 | really - nice - cool - love - amazing | 32 | amazing model | 
| 47 | sagemaker - endpoint - deployment - deploy - amazon | 32 | Deploying SageMaker Endpoints | 
| 48 | training training - training - training steps - general - video | 31 | "Training Steps Video" | 
| 49 | tokenizer - problems - masked - tokenizer tokenizer - tokens | 31 | Tokenizer Problems | 
| 50 | sd - sd2 - sd sd - does support - wd | 30 | Using SD with Different Versions | 
| 51 | test - testing - sampler - discussion - split | 30 | Testing Sampler Discussion | 
| 52 | argument - unexpected - keyword - typeerror - got | 30 | Unexpected keyword argument TypeError | 
| 53 | float - runtimeerror expected - runtimeerror - expected - type | 30 | RuntimeErrors with Float and Half Types | 
| 54 | dataset used - dataset - dataset dataset - used fine - used | 28 | Dataset Usage | 
| 55 | json - json file - model architecture - inconsistency - architecture | 28 | JSON file inconsistency | 
| 56 | usage - project - app - macos - usage questions | 28 | Usage with Sherpa | 
| 57 | reproduce - results - result - civitai - reproducing results | 28 | Reproduce Result Difficulty | 
| 58 | gene - cell - question generation - generation - geneformer | 27 | Gene Embedding Generation | 
| 59 | gpu - gpus - multiple - gpu run - model multiple | 27 | Multi-GPU Model Execution | 
| 60 | tokenizer use - wlop - mean - token - webui version | 26 | Tokenizer for Cantonese | 
| 61 | model fine - tuning model - fine tuning - fine - tuning | 26 | Fine-Tuning the Model | 
| 62 | model training - training model - training - redshift - model model | 26 | Model Training | 
| 63 | bot - discord - tesla - chat - character | 26 | Tesla Discord Bot 2021 | 
| 64 | work - doesn work - doesn - dont - does appear | 26 | Non-functional potty lora | 
| 65 | use use - use - best - way use - methods | 26 | Best ways to use | 
| 66 | report card - metadata - card - report -  | 26 | Metadata Report Card | 
| 67 | guide - instructions - guidance - prompt - cost | 25 | Fine-tuning guide instructions | 
| 68 | code - finetuning code - finetuning - fine tuning - tuning | 25 | Fine-tuning Code Sample | 
| 69 | dataset - custom dataset - dataset fine - custom - fine tuning | 25 | Custom dataset fine-tuning | 
| 70 | safetensors - safetensor - version - version safetensors - safetensor version | 25 | SafeTensors Version Inquiry | 
| 71 | model based - task model - model changes - bring - v7 | 25 | Model Description and Changes | 
| 72 | weights - weight - flax - diffusers weights - load weights | 25 | Outdated Flax Weights | 
| 73 | style - modern - mode - new - dark mode | 24 | Style in Modern Technology | 
| 74 | convert - format - trying convert - safetensors - converter | 24 | Safetensors conversion error | 
| 75 | checkpoint - save - checkpoint file - checkpoints - restore | 24 | Checkpoint Safety Restore | 
| 76 | t5 - flan t5 - flan - google flan - xxl | 23 | T5 vs Flan-T5 Differences | 
| 77 | download model - model load - download - load - model download | 23 | "Model Download" | 
| 78 | access access - access - access need - need access - need | 23 | Access Request Assistance | 
| 79 | model details - details model - details - information model - model access | 23 | Model Details | 
| 80 | job - excellent - nice - great - congrats | 23 | Job Well Done | 
| 81 | onnx - conversion - onnx conversion - convert - torchscript | 22 | ONNX Conversion Implementation | 
| 82 | git - repository - repo - cloning - slow | 22 | Git repository cloning issues | 
| 83 | online - 50 - 200 - buy - annotator | 22 | Buy Medications Online | 
| 84 | access - request access - acces request - access request - request | 22 | Access Request | 
| 85 | cuda - cuda memory - memory - cuda error - memory cuda | 22 | CUDA memory out of error | 
| 86 | api model - api - inference api - model api - trying use | 22 | API Model Errors | 
| 87 | training data - data training - data - training dataset - training | 22 | Data Training Examples | 
| 88 | pipeline - valid - pipe - sentence similarity - similarity | 21 | Pipeline error analysis | 
| 89 | tensor - tensors - device - expected - size | 21 | Tensor size mismatch errors | 
| 90 | in_silico_perturber - eos_token_id - switch - 64 - encoder | 21 | Error in decoder generation | 
| 91 | pytorch_model - pytorch_model bin - bin - diffusion_pytorch_model bin - diffusion_pytorch_model | 21 | Missing pytorch_model.bin file | 
| 92 | 404 - url - https - https huggingface - resolve | 21 | 404 error Huggingface documents | 
| 93 | requirements - acess - feature request - request request - feature | 21 | System Requirements Access | 
| 94 | info - technical - details - information - detailed | 21 | Technical Details Inquiry | 
| 95 | hello - hi - good - translates - 100 | 20 | Greetings and Translations | 
| 96 | accuracy - drop - compatibility - precision - half precision | 20 | Accuracy Drop in Precision | 
| 97 | access request - request access - access - request - new | 20 | Access Request | 
| 98 | file missing - log - filenotfounderror - location - sorry | 20 | File Not Found | 
| 99 | model card - card - link model - link - example model | 20 | Broken link in model | 
| 100 | python - kernel - 10 - pytorch - talks | 20 | Python usage and errors | 
| 101 | bug - fix - racist - possible bug - thing | 19 | Bug Fix with Racist Bug | 
| 102 | training code - code training - code - share - share training | 19 | "Training Code Sharing" | 
| 103 | license - accept - license license - model accept - indication | 19 | Model License | 
| 104 | gpt - protgpt2 - 6b - jt - gpt jt | 19 | GPT-JT-6B-v1 Abilities | 
| 105 | report report - report -  -  -  | 19 | Multiple Reports on Topic | 
| 106 | tuning fine - tune fine - fine - fine tuning - tuning | 18 | Fine-tuning for domain adaptation | 
| 107 | inpaint model - inpaint - ix - size model - model pruned | 18 | Inpaint Model | 
| 108 | config file - config - tokenizer config - files config - file | 18 | Config File Troubleshooting | 
| 109 | sample code - example - sample - copied - error example | 18 | Issues with sample code | 
| 110 | nsfw - nsfw content - content - disable - safety | 18 | NSFW Content Filtering | 
| 111 | length - summary - longformer - summary length - text length | 18 | Length of Summaries | 
| 112 | access download - access - download - access access - download working | 18 | Access Download | 
| 113 | thank - thanks - just want - pretty - request thank | 18 | Thank you efforts | 
| 114 | sd v1 - v1 - ema ckpt - sd - ema | 18 | Access to sd-v1-4-full-ema.ckpt | 
| 115 | padding_side - tokens - token - cls token - token id | 18 | Padding and token discrepancy | 
| 116 | amd - vram - gb - gpu - 448 | 17 | "AMD GPU compatibility" | 
| 117 | dataset - pretraining - dataset dataset - datasets - request dataset | 17 | Dataset Pretraining | 
| 118 | version - ggml version - version ggml - ggml - pytorch version | 17 | "Version Possibility" | 
| 119 | memory - leak - a100 - cuda memory - memory google | 17 | Memory-related Issues | 
| 120 | trigger - words - word - trigger word - semantic | 17 | Trigger words and semantic search | 
| 121 | result - results - output - score - ways | 16 | Visualizing Inference Results | 
| 122 | sd - tested - sd sd - lora training - ui | 16 | Stable Diffusion LORA Training | 
| 123 | ckpt file - bin - convert - weights - dreambooth | 16 | Convert Diffusion Diffusers to CKPT | 
| 124 | need help - help - help help - need - started | 16 | Need Help Getting Started | 
| 125 | keyerror - key - exception error - key error - codegen | 16 | KeyError Troubleshooting | 
| 126 | controlnet - control - a1111 - installed - model embedding | 16 | ControlNet not working | 
| 127 | implementation - issue - solved - np - experiencing | 16 | Implementation Issue Fix | 
| 128 | runtimeerror - time series - everytime - process runtimeerror - try run | 16 | Time Series Runtime Error | 
| 129 | use use - use - use readme - use diffusers - tk | 15 | How to use Diffusers | 
| 130 | training dataset - dataset used - used dataset - nli - used training | 15 | Training Dataset Used | 
| 131 | yaml files - colab pc - install run - diffusion google - train custom | 15 | Stable Diffusion Tutorials | 
| 132 | spam - deleted - removed - delete - contact | 15 | Removal of Spam Discussion | 
| 133 | details training - details - training - details details - details info | 14 | Training Details | 
| 134 | hyper parameters - hyper - parameters - provide - provide training | 14 | Hyperparameter Optimization | 
| 135 | fine tune - tune - ner - fine - emotions | 14 | Fine-tune Sentence Embeddings | 
| 136 | model using - using model - examples - question lora - models used | 14 | Inkpunk Diffusion model | 
| 137 | error running - running - running example - usage code - code | 14 | Error running example code | 
| 138 | difference - alpaca - model difference - original model - difference model | 14 | Model Differences | 
| 139 | install - locally - know install - run local - mini | 14 | "How to install locally" | 
| 140 | training script - script - script training - sharing training - midi | 13 | Training Script | 
| 141 | model file - missing model - corrupt - file model - file missing | 13 | Model File Issues | 
| 142 | error help - help error - help - solve - try | 13 | Error Help | 
| 143 | hardware - hardware requirements - requirements - gpu inference - requirements fine | 13 | Hardware Requirements for Inference | 
| 144 | update - updated - channel - expired - new update | 13 | update query status | 
| 145 | negative - negative prompt - negative prompts - prompts - prompt | 13 | "Negative Prompt Function" | 
| 146 | unable run - unable - run unable - run - human | 13 | Unable to run on local machine | 
| 147 | injection - nmkd gui - nmkd - tutorial videos - gui | 12 | Stable Diffusion Tutorial Videos | 
| 148 | download download - download - request acces - know download - fim | 12 | "Download Instructions" | 
| 149 | transformers - sentence transformers - huggingface transformers - different results - usage | 12 | Transformer Usage Discrepancy | 
| 150 | link - broken link - broken - documentation - expired | 11 | Broken links and documentation | 
| 151 | broke - padding - dead - kenlm - dropout | 11 | "Dead KenLM Finetuning" | 
| 152 | training question - question training - training process - question regarding - question | 11 | Training Process Question | 
| 153 | dataset training - training data - training dataset - data training - custom dataset | 11 | Training Data Quality | 
| 154 | download - download download - possible download - hd 18 - hd | 11 | Troubleshooting download errors |
  
</details>

## Training hyperparameters

* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
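
The values above correspond to keyword arguments of the `BERTopic` constructor. As a rough sketch, and not necessarily the script that produced this model, a model with the same settings could be configured as shown below. `HYPERPARAMETERS` and `build_topic_model` are hypothetical names, and the sub-models (embedding, dimensionality reduction, clustering, vectorizer, representation) are left at BERTopic's defaults because this card does not list them.

```python
# Settings copied from the hyperparameter list above.
HYPERPARAMETERS = {
    "calculate_probabilities": False,
    "language": None,
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}


def build_topic_model():
    """Create an untrained BERTopic model with the settings above."""
    # Lazy import: bertopic is only needed when a model is actually built.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_topic_model().fit_transform(docs)` on a list of strings would then train the model. The resulting topics will differ from those in this card unless the same documents and sub-models are used.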

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
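
Loading a serialized model in an environment whose library versions differ from those above can occasionally cause problems. The sketch below compares the installed packages against this list using only the standard library. The distribution names in `EXPECTED_VERSIONS` are an assumed mapping from the display names above to PyPI package names (for example, UMAP is distributed as `umap-learn`), and `find_version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by assumed PyPI distribution name.
EXPECTED_VERSIONS = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def find_version_mismatches(expected=EXPECTED_VERSIONS):
    """Map each differing package to (expected, installed); installed is None if missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches
```

An empty result from `find_version_mismatches()` means every listed package matches exactly. A mismatch is not necessarily a problem, but it is a sensible first thing to check if loading fails. The Python version itself can be inspected separately with `sys.version_info`.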