---
language: en
license: mit
library_name: transformers
base_model: marcmaxmeister/bert-base-uncased-ntee-classifier-v7
tags:
- text-classification
- nonprofit
- ntee
- cause-area
- multilabel-classification
- bert
datasets:
- givingtuesday/ntee-training-data
metrics:
- f1
model-index:
- name: marcmaxmeister/bert-base-uncased-ntee-classifier-v7
  results:
  - task:
      type: text-classification
      name: Multi-Label Text Classification
---
# NTEE Cause Area Classifier
This model classifies nonprofit organization mission statements and activity
descriptions into one or more NTEE (National Taxonomy of Exempt Entities) major
category codes. It was developed by GivingTuesday for the Gates Foundation
nonprofit sector mapping project.
## Model Details
- **Base model:** `marcmaxmeister/bert-base-uncased-ntee-classifier-v7`
- **Version:** `7`
- **Problem type:** Multi-label classification
- **Number of labels:** 28
- **Input:** Nonprofit mission statement + activity description (concatenated)
- **Output:** Independent sigmoid probabilities for each of the 28 codes, post-processed into a primary, secondary, and optional tertiary NTEE major code
## Label Space
The model predicts across 28 NTEE major codes, including two custom splits:
| Code | Category |
|------|----------|
| A | Arts & Culture |
| B | Education (K-12, other) |
| BB | Universities & Colleges *(custom split from B)* |
| C | Environment |
| D | Animal-Related |
| E | Health Clinics & Services |
| EE | Hospitals *(custom split from E)* |
| F | Mental Health |
| G | Voluntary Health Associations |
| H | Medical Research |
| I | Crime & Legal |
| J | Employment |
| K | Food & Agriculture |
| L | Housing & Shelter |
| M | Public Safety |
| N | Recreation & Sports |
| O | Youth Development |
| P | Human Services |
| Q | International Affairs |
| R | Civil Rights & Advocacy |
| S | Community Improvement |
| T | Philanthropy & Foundations |
| U | Science & Technology |
| V | Social Science |
| W | Public Benefit (General) |
| X | Religion |
| Y | Mutual Benefit |
| Z | Unknown/Unclassified |
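Because BB and EE are custom splits of B and E, results that need to line up with standard NTEE major groups can fold them back. The sketch below assumes the model's label strings are the codes in this table. `to_standard_ntee` and `fold_codes` are hypothetical helpers, not part of the released model.

```python
# Custom codes in this model's label space, mapped to the standard NTEE
# major group each one was split from (per the label table above).
CUSTOM_SPLITS = {"BB": "B", "EE": "E"}


def to_standard_ntee(code: str) -> str:
    """Return the standard NTEE major group letter for a predicted code."""
    return CUSTOM_SPLITS.get(code, code)


def fold_codes(codes):
    """Map a ranked list of codes to standard groups. Only the first
    occurrence of each group is kept, so BB followed by B becomes one B."""
    folded = []
    for code in codes:
        standard = to_standard_ntee(code)
        if standard not in folded:
            folded.append(standard)
    return folded


print(fold_codes(["BB", "B", "P"]))  # ['B', 'P']
```

Folding keeps the original ranking order, so the first standard group returned still corresponds to the model's primary prediction.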
## Training Data
- **Training rows:** 28584
- **Validation rows:** 15840
- **Source:** Combination of real IRS Form 990/990-EZ filings and
synthetically generated examples produced using Claude (Anthropic)
- **Label encoding:** Multi-hot binary vectors of length 28, derived from
NTEE primary, secondary, and tertiary codes per organization
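The multi-hot label encoding can be sketched as follows. This sketch assumes that the label order follows the Label Space table and that missing secondary or tertiary codes are simply skipped. The checkpoint's actual order is whatever `model.config.id2label` defines, and the preprocessing script itself is not included in this card.

```python
# Assumed label order: the 28 codes from the Label Space table, in table order.
# The checkpoint's real order is defined by model.config.id2label.
LABELS = [
    "A", "B", "BB", "C", "D", "E", "EE", "F", "G", "H", "I", "J", "K", "L",
    "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
]
LABEL_INDEX = {code: i for i, code in enumerate(LABELS)}


def encode_multi_hot(primary, secondary=None, tertiary=None):
    """Return a length-28 list with a 1 at each of the organization's codes."""
    vector = [0] * len(LABELS)
    for code in (primary, secondary, tertiary):
        if code is not None:
            vector[LABEL_INDEX[code]] = 1
    return vector


row = encode_multi_hot("P", "L")
print(sum(row), len(row))  # 2 28
```

If a code repeats across primary and secondary, it still sets only a single position. Each target row therefore contains between one and three ones.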
## Training Configuration
- **Epochs:** 6
- **Learning rate:** 1e-05
- **Batch size (train):** 16
- **Weight decay:** 0.01
- **Mixed precision:** fp16
- **Framework:** Hugging Face Transformers + PyTorch
## Evaluation Results
| Epoch | Eval F1 | Eval Loss | Eval Runtime (s) |
|---|---|---|---|
| 0 | 0.4048 | 1.6717 | 51.5719 |
| 1 | 0.4992 | 1.4850 | 51.4108 |
| 2 | 0.5705 | 1.4298 | 51.4209 |
| 3 | 0.5923 | 1.3915 | 51.4547 |
| 4 | 0.6051 | 1.3683 | 51.4933 |
| 5 | 0.6083 | 1.3679 | 51.5154 |
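This card does not state how Eval F1 is averaged across labels or what threshold produced it. Micro-averaged F1 over thresholded sigmoid outputs is a common choice for multi-label evaluation. The following pure-Python sketch assumes micro averaging and a 0.5 threshold; neither is confirmed as the setting used for the table above.

```python
def micro_f1(y_true, y_prob, threshold=0.5):
    """Micro-averaged F1 over multi-hot targets and per-label probabilities.

    Every (example, label) pair counts once, so frequent labels dominate.
    The 0.5 threshold is an assumption, not this card's evaluation setting.
    """
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for target, prob in zip(true_row, prob_row):
            predicted = prob >= threshold
            if predicted and target:
                tp += 1
            elif predicted:
                fp += 1
            elif target:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [[1, 0, 1], [0, 1, 0]]
y_prob = [[0.9, 0.2, 0.3], [0.1, 0.8, 0.6]]
print(round(micro_f1(y_true, y_prob), 4))  # 0.6667
```

Given binary indicator arrays, scikit-learn's `f1_score(y_true, y_pred, average="micro")` computes the same quantity. `average="macro"` instead weights every label equally, which makes it more sensitive to the rare categories listed under Limitations.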
## Final Results
- **Train runtime:** 2181.94 s
- **Train samples per second:** 78.602
- **Train steps per second:** 1.226
- **Total FLOs:** 2.18e16
- **Train loss:** 1.4726
- **Epoch:** 5.99
- **Global step:** 2676
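The logged numbers can be cross-checked with some arithmetic. This pure-Python sketch takes its values from this card. Throughput × runtime matches 6 epochs over 28,584 rows. Dividing 2,676 optimizer steps across 6 epochs gives about 64 samples per step, roughly four times the listed batch size of 16. That would be consistent with gradient accumulation or multi-device training. The card does not say which, so treat the factor of 4 as an inference, not a documented setting.

```python
import math

# Values taken from the Training Data, Training Configuration, and Final Results sections.
TRAIN_ROWS = 28584
EPOCHS = 6
GLOBAL_STEPS = 2676
BATCH_SIZE = 16
TRAIN_RUNTIME_S = 2181.9363
SAMPLES_PER_S = 78.602

# Throughput x runtime should roughly equal rows x epochs.
samples_processed = SAMPLES_PER_S * TRAIN_RUNTIME_S  # about 171,505
expected_samples = TRAIN_ROWS * EPOCHS               # 171,504

# Optimizer steps per epoch and the samples consumed per optimizer step.
steps_per_epoch = GLOBAL_STEPS / EPOCHS              # 446.0
samples_per_step = TRAIN_ROWS / steps_per_epoch      # about 64.1

# Steps we would expect if each optimizer step used exactly 16 samples.
steps_at_batch_16 = math.ceil(TRAIN_ROWS / BATCH_SIZE) * EPOCHS  # 10722

print(round(samples_per_step / BATCH_SIZE, 2))  # 4.01
```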
## Usage
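```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "marcmaxmeister/bert-base-uncased-ntee-classifier-v7"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout


def predict_ntee(mission, activities, threshold=0.4, max_labels=3):
    # The model was trained on mission + activities concatenated into one string.
    text = f"{mission} {activities}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Multi-label head: score each label independently with a sigmoid.
    probs = torch.sigmoid(logits[0]).cpu().numpy()
    # Look up label names by logit index so each name lines up with its probability.
    labels = [model.config.id2label[i] for i in range(model.config.num_labels)]
    ranked = sorted(zip(labels, probs), key=lambda x: -x[1])
    # Keep labels at or above the threshold, capped at max_labels.
    selected = [(label, float(p)) for label, p in ranked if p >= threshold][:max_labels]
    # Always return at least the single highest-scoring label.
    if not selected:
        selected = [(ranked[0][0], float(ranked[0][1]))]
    return {
        "primary": selected[0],
        "secondary": selected[1] if len(selected) > 1 else None,
        "tertiary": selected[2] if len(selected) > 2 else None,
    }


result = predict_ntee(
    "We provide emergency shelter and case management for families experiencing homelessness.",
    "Operates a family shelter and runs a rapid rehousing program.",
)
print(result)
```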
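Pair label names with logits by index rather than by iterating `model.config.id2label.values()`. Transformers writes `config.json` with sorted keys. JSON keys are strings, so a 28-label mapping is stored as `"0"`, `"1"`, `"10"`, `"11"` and so on, and the reloaded dict keeps that text order. The stdlib-only simulation below reproduces this round trip. It uses placeholder names (`label_0` and the rest are hypothetical, not this model's labels).

```python
import json

# A toy id2label with 28 entries, matching this model's label count.
id2label = {i: f"label_{i}" for i in range(28)}

# Simulate a save/load round trip: keys become strings and are written in
# sorted (lexicographic) order, so "10" comes before "2".
serialized = json.dumps({str(k): v for k, v in id2label.items()}, sort_keys=True)
reloaded = {int(k): v for k, v in json.loads(serialized).items()}

# Iterating values follows the JSON text order, not the logit index order.
by_iteration = list(reloaded.values())
# Indexing by position always pairs logits[i] with the right name.
by_index = [reloaded[i] for i in range(len(reloaded))]

print(by_iteration[:4])  # ['label_0', 'label_1', 'label_10', 'label_11']
print(by_index[:4])      # ['label_0', 'label_1', 'label_2', 'label_3']
```

Models with fewer than 11 labels never show the problem, since single-digit keys sort numerically and lexicographically in the same order. That makes the problem easy to miss until the label count grows.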
## Intended Use
- Classifying IRS Form 990 and 990-EZ filers by mission area
- Nonprofit sector research and analysis
- Philanthropic portfolio mapping
## Limitations
- Trained primarily on English-language mission statements
- Performance is lower on categories with fewer training examples
(e.g. Mutual Benefit, Social Science, Unknown/Unclassified)
- The BB/EE custom splits (universities vs. general education; hospitals
vs. general health) are the hardest boundaries for the model to learn
- Not suitable for classifying organizations outside the U.S. nonprofit sector
## Citation
```bibtex
@misc{givingtuesday2026ntee,
  author = {GivingTuesday},
  title = {NTEE Cause Area Classifier for IRS 990 Data},
  year = {2026},
  publisher = {GivingTuesday, Hugging Face},
  url = {https://huggingface.co/marcmaxmeister/bert-base-uncased-ntee-classifier-v7}
}
```