anonymous12321 committed on
Commit c3d285d · verified · 1 Parent(s): 76d91c9

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -25,7 +25,7 @@ base_model:
 
 **Council Topics Classifier** is an ensemble machine learning system specialized in **multi-label topic classification** for Portuguese municipal council meeting minutes subjects. The model combines Gradient Boosting with Active Learning and BERTimbau embeddings to identify multiple simultaneous topics within municipal discussion subjects, making it particularly effective for categorizing complex governmental content.
 
-🚀 **Try out the model:** [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous12321/GB_CouncilTopics-PT)
+🚀 **Try out the model:** [Demo Council Topics Classifier PT](https://huggingface.co/spaces/anonymous12321/Council_Topics_Classifier_PT)
 
 ## Key Features
 
@@ -100,18 +100,18 @@ bert_model = AutoModel.from_pretrained("neuralmind/bert-base-portuguese-cased").
 
 # Preprocess text
 text = "A Câmara Municipal aprovou o orçamento de 2024..."
-# (apply smart_preprocess function - see app.py)
+# (apply smart_preprocess function - see demo source code)
 
 # Extract features
 tfidf_features = tfidf.transform([text])
-# (extract BERT embeddings - see app.py)
+# (extract BERT embeddings - see demo source code)
 
 # Combine features and predict
 X_combined = np.hstack([tfidf_features.toarray(), bert_embeddings])
 
 # Get ensemble predictions
 logistic_proba = logistic_model.predict_proba(X_combined)
-# (apply GB models and adaptive weighting - see app.py)
+# (apply GB models and adaptive weighting - see demo source code)
 
 # Apply optimal thresholds
 predictions = (ensemble_proba >= optimal_thresholds).astype(int)
@@ -125,7 +125,7 @@ print(f"Predicted Topics: {predicted_labels}")
 
 The model was trained on a curated dataset of Portuguese municipal council meeting minutes:
 
-- **Documents**: 2,500+ meeting minutes subjects
+- **Documents**: 2,500+ meeting minutes discussion subjects
 - **Time Period**: 2021-2024
 - **Source**: Portuguese municipalities (anonymized)
 - **Labels**: 22 topic categories
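The README snippet in the second hunk leaves out the BERT-embedding and gradient-boosting steps, but the shape logic of the two steps it does show (column-wise feature concatenation with `np.hstack` and per-label thresholding) can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: the label names, feature values, probabilities, and thresholds are hypothetical placeholders rather than values from the model, and the way `predicted_labels` is built from the 0/1 indicator row is an assumption, since the README lines that construct it fall outside the diff.

```python
import numpy as np

# Hypothetical label names (the real model has 22 topic categories,
# which are not listed in this diff).
labels = ["Finance", "Urban Planning", "Culture"]

# Hypothetical stand-ins for the two feature blocks of a single text:
# a dense TF-IDF row (what tfidf_features.toarray() returns) and a
# BERT sentence embedding.
tfidf_dense = np.array([[0.0, 0.3, 0.7, 0.0]])     # shape (1, n_tfidf)
bert_embeddings = np.array([[0.12, -0.05, 0.33]])  # shape (1, hidden_size)

# np.hstack concatenates along columns, so both blocks must have the
# same number of rows (one row per input text).
X_combined = np.hstack([tfidf_dense, bert_embeddings])

# Hypothetical ensemble probabilities (one column per label) and
# hypothetical per-label decision thresholds.
ensemble_proba = np.array([[0.81, 0.42, 0.10]])
optimal_thresholds = np.array([0.50, 0.40, 0.35])

# The (n_labels,) threshold vector broadcasts across every row, so each
# label is compared against its own cut-off rather than a global 0.5.
predictions = (ensemble_proba >= optimal_thresholds).astype(int)

# Assumed mapping from the 0/1 indicator row back to label names.
predicted_labels = [labels[j] for j in np.flatnonzero(predictions[0])]

print(X_combined.shape)
print(f"Predicted Topics: {predicted_labels}")
```

With these placeholder values the combined feature row has 4 + 3 = 7 columns, and only the first two labels clear their thresholds (0.81 ≥ 0.50 and 0.42 ≥ 0.40), so a text can receive several topics at once, which is the point of multi-label classification.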