anonymous12321/Council_Topics_Classifier_PT

Text Classification · multi-label-classification · gradient-boosting · active-learning · municipal-documents · meeting-minutes


anonymous12321 committed on Oct 20, 2025

Commit c3d285d (verified) · 1 parent: 76d91c9

Update README.md

Files changed (1)

README.md +5 -5

README.md CHANGED

@@ -25,7 +25,7 @@ base_model:
 **Council Topics Classifier** is an ensemble machine learning system specialized in **multi-label topic classification** for Portuguese municipal council meeting minutes subjects. The model combines Gradient Boosting with Active Learning and BERTimbau embeddings to identify multiple simultaneous topics within municipal discussion subjects, making it particularly effective for categorizing complex governmental content.
-🚀 **Try out the model:** [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/GB_CouncilTopics-PT)
+🚀 **Try out the model:** [Demo Council Topics Classifier PT](https://huggingface.co/spaces/anonymous12321/Council_Topics_Classifier_PT)
 ## Key Features
@@ -100,18 +100,18 @@ bert_model = AutoModel.from_pretrained("neuralmind/bert-base-portuguese-cased").
 # Preprocess text
 text = "A Câmara Municipal aprovou o orçamento de 2024..."
-# (apply smart_preprocess function - see app.py)
+# (apply smart_preprocess function - see demo source code)
 # Extract features
 tfidf_features = tfidf.transform([text])
-# (extract BERT embeddings - see app.py)
+# (extract BERT embeddings - see demo source code)
 # Combine features and predict
 X_combined = np.hstack([tfidf_features.toarray(), bert_embeddings])
 # Get ensemble predictions
 logistic_proba = logistic_model.predict_proba(X_combined)
-# (apply GB models and adaptive weighting - see app.py)
+# (apply GB models and adaptive weighting - see demo source code)
 # Apply optimal thresholds
 predictions = (ensemble_proba >= optimal_thresholds).astype(int)
@@ -125,7 +125,7 @@ print(f"Predicted Topics: {predicted_labels}")
 The model was trained on a curated dataset of Portuguese municipal council meeting minutes:
-- **Documents**: 2,500+ meeting minutes subjects
+- **Documents**: 2,500+ meeting minutes discussion subjects
 - **Time Period**: 2021-2024
 - **Source**: Portuguese municipalities (anonymized)
 - **Labels**: 22 topic categories
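The README snippet touched by this commit skips two steps: it points to the demo source for the gradient-boosting ensemble with "adaptive weighting", and it does not say how `optimal_thresholds` were obtained. What follows is a minimal, hypothetical NumPy sketch of those two steps, not the author's implementation. Assumptions: per-model probability matrices of shape (n_samples, n_labels) are combined with fixed, caller-supplied weights (a simple stand-in for the unspecified adaptive scheme), and each label's threshold is picked by a grid search that maximizes F1 on validation data. The helper names `combine_probas`, `binary_f1`, and `tune_thresholds`, and all toy arrays, are invented for illustration.

```python
import numpy as np


def combine_probas(proba_list, weights):
    """Weighted average of (n_samples, n_labels) probability matrices.

    Hypothetical stand-in for the README's "adaptive weighting": the weights
    here are fixed values chosen by the caller, normalized to sum to 1.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack(proba_list)           # (n_models, n_samples, n_labels)
    return np.tensordot(w, stacked, axes=1)  # (n_samples, n_labels)


def binary_f1(y_true, y_pred):
    """F1 for one label from 0/1 arrays; 0.0 if neither array has any positives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def tune_thresholds(y_true, proba, grid=None):
    """Per-label threshold search (assumed criterion: maximize validation F1)."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    thresholds = np.empty(y_true.shape[1])
    for j in range(y_true.shape[1]):
        scores = [binary_f1(y_true[:, j], (proba[:, j] >= t).astype(int)) for t in grid]
        thresholds[j] = grid[int(np.argmax(scores))]  # ties go to the lowest threshold
    return thresholds


# Toy validation set: 4 subjects x 2 labels (invented numbers, not the card's data).
y_val = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
p_logistic = np.array([[0.8, 0.2], [0.3, 0.6], [0.7, 0.4], [0.2, 0.1]])
p_gb = np.array([[0.6, 0.1], [0.2, 0.7], [0.9, 0.5], [0.4, 0.3]])

ensemble_proba = combine_probas([p_logistic, p_gb], weights=[0.4, 0.6])
optimal_thresholds = tune_thresholds(y_val, ensemble_proba)
# Same thresholding line as the README: one cutoff per label, broadcast over rows.
predictions = (ensemble_proba >= optimal_thresholds).astype(int)
```

Because each label gets its own cutoff, rare topics can use a different threshold than common ones. In practice the thresholds would be tuned once on held-out data and then reused unchanged at inference, as in the README's final line.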