topic_modelling/funcs/topic_core_funcs.py

Commit History

Minor cleaning, csv formatting changes
d80c8f5

Sean-Case committed on

Reduce outliers now more efficient and relabels with correct vectoriser. Default topic labels now tidier. Hierarchical topics outputs more useful for joining to df afterwards. Switched low resource reduction algorithm to UMAP as default is not good.
e1c1f68

Sonnyjim committed on

Should now parse custom regex correctly. Will now wipe previously created embeddings if 'low resource mode' option switched.
0a543a0

Sean-Case committed on

Allowed for uploading custom regex for cleaning. Fixed calculate all probabilities, reduce outliers. Added text tree for hierarchical modelling.
381f959

Sonnyjim committed on

LLM model save is failing in Huggingface - attempting instead to save to base folder
c2bf185

Sean-Case committed on

Some text changes. Fixed a couple of TF-IDF embeddings issues
87306c7

Sean-Case committed on

Added clean data options, improved re-representation options and visualisation. General format changes
4effac0

Sonnyjim committed on