Bangla NER — Named Entity Recognition for Bengali
A token classification model for Bengali (Bangla) named entity recognition (NER), fine-tuned from csebuetnlp/banglabert (an ELECTRA discriminator) and trained with the BIO tagging scheme.
Entity Types
| Tag | Description | Example |
|---|---|---|
| PER | Person names | একেএম শহীদুল হক |
| LOC | Locations, cities, countries | বাংলাদেশ, ঢাকা |
| ORG | Organizations, companies | টুইটার, রিয়াল মাদ্রিদ |
| POL | Political entities / parties | আওয়ামী লীগ |
| DATE | Calendar dates | সোমবার, ২০২৪ সালে |
| TIME | Times of day | সকাল ৮টায় |
| EVENT | Named events | রোহিঙ্গা সঙ্কট |
| CRIME | Crime-related entities | হত্যা মামলা |
| TITLE | Titles, designations | মহাপরিদর্শক |
| NUM | Numbers, quantities | ৯৩ শতাংশ |
| SYMBOL | Symbols, currencies | ৳, % |
| CONSTITUENCY | Electoral constituencies | ঢাকা-১৮ |
| INST | Institutions | তথ্য অধিদপ্তর |
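All tags use the BIO scheme: `B-<TYPE>` marks the first token of an entity, `I-<TYPE>` marks each following token of the same entity, and `O` marks tokens outside any entity.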
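Entity spans are recovered from word-level BIO tags by opening a span at each `B-` tag and extending it over the following `I-` tags of the same type. A minimal decoder for illustration; the `bio_to_spans` name and the lenient treatment of a stray `I-` tag (it starts a new span instead of being dropped) are choices made for this sketch, not part of this model's code:

```python
from typing import List, Optional, Tuple


def bio_to_spans(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_words: List[str] = []
    for word, tag in zip(words, tags):
        prefix, _, etype = tag.partition("-")  # "B-PER" -> ("B", "-", "PER")
        # An I- tag continues the open entity only if the type matches
        if prefix == "I" and etype == cur_type:
            cur_words.append(word)
            continue
        # Any other tag closes the open entity, if there is one
        if cur_type is not None:
            spans.append((cur_type, " ".join(cur_words)))
        if prefix in ("B", "I"):
            # B- starts a new entity; a stray I- is leniently treated the same way
            cur_type, cur_words = etype, [word]
        else:  # "O"
            cur_type, cur_words = None, []
    if cur_type is not None:
        spans.append((cur_type, " ".join(cur_words)))
    return spans


# Illustrative tags, hand-labelled for this example (not model output)
words = ["একেএম", "শহীদুল", "হক", "বাংলাদেশে", "এসেছেন"]
tags = ["B-PER", "I-PER", "I-PER", "B-LOC", "O"]
print(bio_to_spans(words, tags))
# [('PER', 'একেএম শহীদুল হক'), ('LOC', 'বাংলাদেশে')]
```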
Training Details
| Parameter | Value |
|---|---|
| Base model | csebuetnlp/banglabert |
| Architecture | ELECTRA (discriminator) |
| Task | Token Classification (NER) |
| Dataset size | 22,144 sentences |
| Train split | 85% (18,822) |
| Validation split | 7.5% (1,661) |
| Test split | 7.5% (1,661) |
| Max sequence length | 256 tokens |
| Batch size | 16 |
| Epochs | 8 (early stopping, patience=2) |
| Best epoch | 7 |
| Learning rate | 2e-5 |
| LR scheduler | Linear with warmup |
| Warmup steps | 10% of total steps |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Mixed precision | fp16 |
| Framework | PyTorch + HuggingFace Transformers |
| Hardware | NVIDIA GeForce RTX 4070 Ti SUPER (16 GB) |
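The schedule rows can be made concrete with a little arithmetic. A minimal sketch, assuming a single GPU, no gradient accumulation, the final partial batch kept in each epoch, and a scheduler sized for the full 8 epochs with warmup rounded up (the way `TrainingArguments` converts a warmup ratio into steps); none of these details are stated on this card, so the step counts are estimates:

```python
import math

# Values from the Training Details table
TRAIN_SENTENCES = 18_822
BATCH_SIZE = 16
EPOCHS = 8
BASE_LR = 2e-5
WARMUP_RATIO = 0.10

# Assumed: one optimizer step per batch, last partial batch kept
steps_per_epoch = math.ceil(TRAIN_SENTENCES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
# Assumed: warmup ratio rounded up to a whole number of steps
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_warmup_lr(step: int) -> float:
    """Learning rate at optimizer step `step` under linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to BASE_LR
        return BASE_LR * step / max(1, warmup_steps)
    # Decay linearly from BASE_LR down to 0 at total_steps
    return BASE_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 1177 9416 942
```

Under these assumptions the learning rate climbs from 0 to 2e-5 over the first 942 steps (within the first epoch) and then decays linearly to 0 at step 9,416.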
Test Set Results (Overall)
| Metric | Score |
|---|---|
| F1 | 74.93% |
| Precision | 75.82% |
| Recall | 74.06% |
| Token Accuracy | 93.41% |
Per-Entity Results (Test Set)
| Entity | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| CONSTITUENCY | 0.8333 | 0.7500 | 0.7895 | 20 |
| CRIME | 0.9489 | 0.9489 | 0.9489 | 137 |
| DATE | 0.7730 | 0.7552 | 0.7640 | 478 |
| EVENT | 0.6827 | 0.6514 | 0.6667 | 109 |
| INST | 0.7119 | 0.7636 | 0.7368 | 55 |
| LOC | 0.7451 | 0.7245 | 0.7347 | 795 |
| NUM | 0.6949 | 0.8913 | 0.7810 | 46 |
| ORG | 0.5617 | 0.5686 | 0.5652 | 408 |
| PER | 0.7654 | 0.7260 | 0.7452 | 719 |
| POL | 0.8182 | 0.8333 | 0.8257 | 54 |
| SYMBOL | 1.0000 | 0.8750 | 0.9333 | 8 |
| TIME | 0.9839 | 0.8472 | 0.9104 | 144 |
| TITLE | 0.9532 | 0.9645 | 0.9588 | 169 |
| micro avg | 0.7582 | 0.7406 | 0.7493 | 3142 |
| macro avg | 0.8056 | 0.7923 | 0.7969 | 3142 |
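The two summary rows aggregate differently. The macro average is the unweighted mean over the 13 entity types, so small, high-scoring types such as SYMBOL (support 8) count as much as LOC or ORG; the micro average pools all 3,142 entities and is dominated by the large LOC, PER, ORG and DATE classes, which is why it is lower. The macro row can be re-derived from the per-entity rows, and the micro F1 from the micro precision and recall (recomputing micro precision and recall themselves would need raw TP/FP/FN counts, which the card does not list). A quick consistency check:

```python
# (precision, recall, f1) per entity type, copied from the table above
per_entity = {
    "CONSTITUENCY": (0.8333, 0.7500, 0.7895),
    "CRIME": (0.9489, 0.9489, 0.9489),
    "DATE": (0.7730, 0.7552, 0.7640),
    "EVENT": (0.6827, 0.6514, 0.6667),
    "INST": (0.7119, 0.7636, 0.7368),
    "LOC": (0.7451, 0.7245, 0.7347),
    "NUM": (0.6949, 0.8913, 0.7810),
    "ORG": (0.5617, 0.5686, 0.5652),
    "PER": (0.7654, 0.7260, 0.7452),
    "POL": (0.8182, 0.8333, 0.8257),
    "SYMBOL": (1.0000, 0.8750, 0.9333),
    "TIME": (0.9839, 0.8472, 0.9104),
    "TITLE": (0.9532, 0.9645, 0.9588),
}


def macro_average(rows):
    """Unweighted mean of each column (precision, recall, f1) over entity types."""
    rows = list(rows)
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


macro_p, macro_r, macro_f1 = macro_average(per_entity.values())
micro_f1 = f1(0.7582, 0.7406)  # micro precision and recall from the table
print(f"macro P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f}")  # macro P=0.8056 R=0.7923 F1=0.7969
print(f"micro F1={micro_f1:.4f}")  # micro F1=0.7493
```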
Usage
With pipeline (recommended)
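```python
from transformers import pipeline

# "ner" is an alias for the "token-classification" task;
# aggregation_strategy="simple" merges B-/I- subword pieces into whole entities
ner = pipeline(
    "ner",
    model="arafatfahim/BanglaTag",
    aggregation_strategy="simple",
)

text = "একেএম শহীদুল হক বাংলাদেশে কক্সবাজার এলাকায় সোমবার সংবাদ সম্মেলন করেন"
print(ner(text))
```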
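With `aggregation_strategy="simple"`, the pipeline returns a list of dicts, one per entity, carrying `entity_group`, `score`, `word`, `start` and `end` keys. A small post-processing sketch that collects entity strings by type and drops low-confidence spans; the `group_entities` helper, the 0.5 threshold and the sample dicts are hypothetical illustrations, not output from this model:

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect entity strings by type from aggregated pipeline output.

    Hypothetical helper: assumes each dict has "entity_group", "word"
    and "score" keys, as produced with aggregation_strategy="simple".
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in results:
        if ent["score"] >= min_score:  # hypothetical confidence threshold
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# Hand-written sample in the pipeline's output shape (not real model output;
# start/end offsets omitted for brevity)
sample = [
    {"entity_group": "PER", "score": 0.97, "word": "একেএম শহীদুল হক"},
    {"entity_group": "LOC", "score": 0.91, "word": "বাংলাদেশে"},
    {"entity_group": "LOC", "score": 0.88, "word": "কক্সবাজার"},
    {"entity_group": "DATE", "score": 0.42, "word": "সোমবার"},
]
print(group_entities(sample))
```

In real output, slicing the input text with each entity's `start` and `end` offsets recovers the exact original substring if the decoded `word` differs from it.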
Manual inference
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "arafatfahim/BanglaTag"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Input is pre-split into words; is_split_into_words=True keeps word boundaries
tokens = ["একেএম", "শহীদুল", "হক", "বাংলাদেশে", "এসেছেন"]
inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predictions = logits.argmax(-1)[0]  # predicted label id for each subword token

# word_ids() (fast tokenizers only) maps each subword back to its source word;
# print the label of the first subword of each word and skip special tokens
word_ids = inputs.word_ids()
prev = None
for word_id, pred_id in zip(word_ids, predictions):
    if word_id is None or word_id == prev:
        continue
    print(f"{tokens[word_id]:20s} → {model.config.id2label[pred_id.item()]}")
    prev = word_id
```
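The loop above reads each word's label from its first subword. The common Hugging Face fine-tuning recipe mirrors this at training time: word-level labels go on the first subword of each word, and special tokens and continuation subwords get `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. This card does not say whether the model was trained exactly this way, so treat the sketch as an assumption; the `align_labels` helper and the hand-written `word_ids` list are hypothetical:

```python
from typing import List, Optional

IGNORE_INDEX = -100  # skipped by torch.nn.CrossEntropyLoss by default


def align_labels(word_ids: List[Optional[int]], word_label_ids: List[int]) -> List[int]:
    """Map word-level label ids onto subword positions (first-subword labelling)."""
    aligned = []
    prev = None
    for wid in word_ids:
        if wid is None or wid == prev:
            # Special token or continuation subword: exclude from the loss
            aligned.append(IGNORE_INDEX)
        else:
            # First subword of a word carries that word's label
            aligned.append(word_label_ids[wid])
        prev = wid
    return aligned


# Hand-written word_ids for 3 words where word 1 splits into two subwords,
# wrapped in two special tokens ([CLS] ... [SEP])
example_word_ids = [None, 0, 1, 1, 2, None]
print(align_labels(example_word_ids, [1, 2, 0]))  # [-100, 1, 2, -100, 0, -100]
```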
Citation
If you use this model, please cite:
```bibtex
@misc{bangla-ner-2026,
  title = {Bangla NER: Fine-tuned BanglaBERT for Bengali Named Entity Recognition},
  year  = {2026},
  url   = {https://huggingface.co/arafatfahim/BanglaTag}
}
```