NiKo

korkakak

AI & ML interests

None yet

Organizations

Hugging Face 1Bit LLMs

korkakak's activity

reacted to fdaudens's post with 🚀 about 1 month ago
Been reading about the pushback today against the "bigger models = better AI" narrative.

@thomwolf tackled this head-on at Web Summit and highlighted how important small models are (and why closed-source companies haven't pushed for this 😬). They're crushing it: today's 1B parameter models outperform last year's 10B models.

Fascinating to hear him talk about the secret sauce behind this approach.
reacted to asoria's post with ❤️ about 2 months ago
🚀 Exploring Topic Modeling with BERTopic 🤖

When you come across an interesting dataset, you often wonder:
Which topics frequently appear in these documents? 🤔
What is this data really about? 📊

Topic modeling helps answer these questions by identifying recurring themes within a collection of documents. This process enables quick and efficient exploratory data analysis.

I've been working on an app that leverages BERTopic, a flexible framework designed for topic modeling. Its modularity makes BERTopic powerful, allowing you to switch components with your preferred algorithms. It also supports handling large datasets efficiently by merging models using the BERTopic.merge_models approach. 🔗
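
As a rough illustration of the merging idea mentioned above, here is a minimal sketch that fits one BERTopic model per chunk of documents and then combines them with `BERTopic.merge_models`. The helper names `chunked` and `fit_and_merge` and the default chunk size are hypothetical, chosen for this example only; the app itself may organize this differently.

```python
from typing import Iterator, List, Sequence


def chunked(docs: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `docs` holding at most `size` documents each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


def fit_and_merge(docs: Sequence[str], chunk_size: int = 10_000, min_similarity: float = 0.7):
    """Fit one BERTopic model per chunk, then combine them with merge_models.

    `chunk_size` is an illustrative assumption; tune it to available memory.
    """
    # Imported lazily so the chunking helper works without bertopic installed.
    from bertopic import BERTopic

    models = [BERTopic().fit(chunk) for chunk in chunked(docs, chunk_size)]
    if len(models) == 1:
        return models[0]
    # Topics from later models are added to the first model only when they are
    # not similar enough (below min_similarity) to a topic it already has.
    return BERTopic.merge_models(models, min_similarity=min_similarity)
```

Fitting per chunk keeps only one chunk's embeddings in memory at a time, which is presumably what makes this approach practical for large datasets.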

πŸ” How do we make this work?
Here’s the stack we’re using:

πŸ“‚ Data Source ➑️ Hugging Face datasets with DuckDB for retrieval
🧠 Text Embeddings ➑️ Sentence Transformers (all-MiniLM-L6-v2)
⚑ Dimensionality Reduction ➑️ RAPIDS cuML UMAP for GPU-accelerated performance
πŸ” Clustering ➑️ RAPIDS cuML HDBSCAN for fast clustering
βœ‚οΈ Tokenization ➑️ CountVectorizer
πŸ”§ Representation Tuning ➑️ KeyBERTInspired + Hugging Face Inference Client with Meta-Llama-3-8B-Instruct
🌍 Visualization ➑️ Datamapplot library
Check out the space and see how you can quickly generate topics from your dataset: https://huggingface.co/spaces/datasets-topics/topics-generator

Powered by @MaartenGr - BERTopic
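
For readers who want to try a similar setup, the core of the stack described in the post could be wired together roughly as follows. This is a sketch under assumptions, not the Space's actual code: the function name `build_topic_model`, the `use_gpu` switch, and all hyperparameter values are illustrative, and the DuckDB retrieval, Llama-based representation, and Datamapplot steps are left out.

```python
def build_topic_model(use_gpu: bool = False):
    """Assemble a BERTopic model from the components named in the post.

    All hyperparameter values below are illustrative assumptions.
    """
    # Imports are local so that defining the function needs no heavy dependencies.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer

    if use_gpu:
        # RAPIDS cuML implementations, as named in the post (requires a CUDA GPU).
        from cuml.cluster import HDBSCAN
        from cuml.manifold import UMAP
    else:
        # CPU stand-ins, swapped in here so the sketch also runs without a GPU.
        from hdbscan import HDBSCAN
        from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
        umap_model=UMAP(n_components=5, n_neighbors=15, min_dist=0.0),
        hdbscan_model=HDBSCAN(min_cluster_size=15, prediction_data=True),
        vectorizer_model=CountVectorizer(stop_words="english"),
        representation_model=KeyBERTInspired(),
    )
```

With the dependencies installed, `build_topic_model().fit_transform(docs)` on a list of strings returns the topic assignment for each document along with the associated probabilities.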