Categorization Module 🏷️
Responsibility
This module handles automatic categorization of notes.
Functionality
- Receive summary text.
- Use Google Gemini to analyze content.
- Return a single category (e.g., Programming, Medicine, History).
Files
1. categorizer.py
- Purpose: Categorize text using AI.
- Main Class: CategorizationService
- Key Method: categorize_text(text) - Returns category name.
How It Works
- Receive Text: Take the first 2000 characters of the summary.
- Send Prompt: Ask Gemini for a one- or two-word category.
- Clean Result: Remove periods and capitalize the first letter.
- Validate: If the result is longer than 30 characters, truncate it.
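The four steps above can be sketched as follows. This is a minimal illustration, not the actual contents of categorizer.py: the names clean_category, categorize, and ask_model are hypothetical, the prompt wording is an assumption, and the Gemini call is replaced by a plain callable so the sketch runs without network access. The thresholds (2000, 10, and 30 characters) come from this document.

```python
from typing import Callable

MAX_INPUT_CHARS = 2000    # only the first 2000 characters are sent
MIN_INPUT_CHARS = 10      # shorter inputs are returned as "Uncategorized"
MAX_CATEGORY_CHARS = 30   # longer results are truncated
FALLBACK = "Uncategorized"


def clean_category(raw: str) -> str:
    """Remove periods, capitalize the first letter, and cap the length."""
    cleaned = raw.replace(".", "").strip()
    if not cleaned:
        return FALLBACK
    # Uppercase only the first character so "Personal Development" keeps its casing.
    cleaned = cleaned[0].upper() + cleaned[1:]
    return cleaned[:MAX_CATEGORY_CHARS].rstrip()


def categorize(text: str, ask_model: Callable[[str], str]) -> str:
    """Run the four steps; ask_model is a hypothetical stand-in for the Gemini call."""
    if len(text.strip()) < MIN_INPUT_CHARS:
        return FALLBACK
    prompt = (
        "Categorize the following text with a one- or two-word category. "
        "Return ONLY the category name.\n\n"
        f"Text: {text[:MAX_INPUT_CHARS]}\n\nCategory:"
    )
    try:
        return clean_category(ask_model(prompt))
    except Exception:
        # Assumption: any model failure maps to the fallback category.
        return FALLBACK
```

Note that str.capitalize() is deliberately avoided here, because it would lowercase the rest of the string and turn "Personal Development" into "Personal development".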
Category Examples
- Programming - Coding and development tutorials.
- Medicine - Health and medical content.
- Business - Business management and entrepreneurship.
- Science - Physics, chemistry, biology.
- History - Historical events and civilizations.
- Personal Development - Self-improvement content.
- Uncategorized - If categorization fails.
Proposed Enhancements
- Add predefined list of allowed categories.
- Use embeddings to improve categorization accuracy.
- Add support for sub-categories.
- Store categorization results in database for future analysis.
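As an illustration of the first enhancement, the model's reply could be checked against a predefined list before it is returned. The sketch below is an assumption about how that might look, not existing code: to_allowed_category is a hypothetical helper, and the list reuses the categories named in the prompt under Improving the Prompt.

```python
ALLOWED_CATEGORIES = [
    "Programming", "Medicine", "Business", "Science",
    "History", "Personal Development", "Education", "Technology",
]

# Case-insensitive lookup from lowercase name to canonical spelling.
_CANONICAL = {name.lower(): name for name in ALLOWED_CATEGORIES}


def to_allowed_category(raw: str, fallback: str = "Uncategorized") -> str:
    """Map a model reply onto the allowed list, or return the fallback."""
    key = raw.replace(".", "").strip().lower()
    return _CANONICAL.get(key, fallback)
```

With this check in place, a reply outside the list (for example "Cooking") would come back as "Uncategorized" instead of creating a new, unplanned category.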
Testing
import asyncio

from src.ai_modules.categorization.categorizer import CategorizationService

async def main():
    categorizer = CategorizationService()
    # Categorize text (categorize_text is awaited, so it must run inside a coroutine)
    text = "This video explains how to build a REST API using FastAPI and Python..."
    category = await categorizer.categorize_text(text)
    print(f"Category: {category}")  # e.g. Programming

asyncio.run(main())
Libraries Used
google-genai - Communicate with Google Gemini.
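For orientation, a single asynchronous request through google-genai might look like the sketch below. The helper name ask_gemini is hypothetical and the module's real call may differ. The client is passed in rather than created here, so the function can be exercised with a stub.

```python
from typing import Any


async def ask_gemini(client: Any, prompt: str, model: str = "gemini-1.5-flash") -> str:
    """Send one prompt to Gemini and return the reply text, stripped.

    `client` is expected to be a google-genai client, created for example with:
        from google import genai
        client = genai.Client(api_key="YOUR_API_KEY")
    """
    response = await client.aio.models.generate_content(model=model, contents=prompt)
    # response.text can be None when the model returns no text part.
    return (response.text or "").strip()
```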
Important Notes
- Currently using the gemini-1.5-flash model.
- If text is too short (<10 chars), returns "Uncategorized".
- Accuracy can be improved by adding examples in the prompt.
Improving the Prompt
To improve categorization accuracy, modify the prompt in categorizer.py:
prompt = (
"Analyze the following text and categorize it into ONE of these categories: "
"Programming, Medicine, Business, Science, History, Personal Development, Education, Technology. "
"Return ONLY the category name.\n\n"
f"Text: {text[:2000]}\n\n"
"Category:"
)
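The Important Notes mention that adding examples to the prompt can improve accuracy. One way to do that is a few-shot variant of the prompt above, sketched here under assumptions: build_few_shot_prompt is a hypothetical helper, and the two labeled examples are invented placeholders to be replaced with real summaries.

```python
# Hypothetical labeled examples; replace with real summaries from your own notes.
FEW_SHOT_EXAMPLES = [
    ("A walkthrough of building a REST API with FastAPI.", "Programming"),
    ("An overview of the causes of the fall of the Roman Empire.", "History"),
]

CATEGORIES = (
    "Programming, Medicine, Business, Science, History, "
    "Personal Development, Education, Technology"
)


def build_few_shot_prompt(text: str, max_chars: int = 2000) -> str:
    """Build a prompt that shows labeled examples before the text to classify."""
    examples = "\n\n".join(
        f"Text: {sample}\nCategory: {label}" for sample, label in FEW_SHOT_EXAMPLES
    )
    return (
        "Analyze the following text and categorize it into ONE of these categories: "
        f"{CATEGORIES}. Return ONLY the category name.\n\n"
        f"{examples}\n\n"
        f"Text: {text[:max_chars]}\n"
        "Category:"
    )
```

Each example uses the same Text/Category layout as the final block, so the model sees the expected answer format before it reaches the text to classify.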