metadata
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets: IndicGlue
metrics:
- Loss
- Accuracy
- Precision
- Recall
sahajBERT News Article Classification
Model description
sahajBERT fine-tuned for news article classification using the sna.bn split of IndicGlue.
The model classifies articles into 6 classes:
Label id | Label |
---|---|
0 | kolkata |
1 | state |
2 | national |
3 | sports |
4 | entertainment |
5 | international |
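The label table above can be carried into code as a plain dictionary. The following is a minimal sketch, not part of the released model: the names `ID2LABEL` and `label_name` are hypothetical, and it assumes the pipeline reports classes either as a bare id or in the generic `LABEL_<id>` form that transformers uses when a model config carries no custom label names. If the published config already maps ids to names, the helper simply returns the name unchanged.

```python
# Mapping copied from the label table in this model card.
ID2LABEL = {
    0: "kolkata",
    1: "state",
    2: "national",
    3: "sports",
    4: "entertainment",
    5: "international",
}


def label_name(raw_label: str) -> str:
    """Translate 'LABEL_3' or '3' into a class name (hypothetical helper).

    Assumption: the label string ends in the numeric class id. Anything
    that does not match is returned unchanged.
    """
    suffix = raw_label.rsplit("_", 1)[-1]
    if suffix.isdigit() and int(suffix) in ID2LABEL:
        return ID2LABEL[int(suffix)]
    return raw_label


# Example with a made-up label string, not a real model prediction.
print(label_name("LABEL_3"))
```

Applied to pipeline results, this would look like `label_name(output[0]["label"])`, since a text classification pipeline returns a list of dictionaries with `label` and `score` keys.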
Intended uses & limitations
How to use
You can use this model directly with a pipeline for Sequence Classification:
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast
# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")
# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")
# Initialize pipeline
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)
raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
output = pipeline(raw_text)
Limitations and bias
Work in progress.
Training data
The model was initialized with the pre-trained weights of sahajBERT at step 19519 and trained on the sna.bn split of IndicGlue.
Training procedure
Coming soon!
Eval results
accuracy: 0.9163713678242381
loss: 0.29771897196769714
macro_f1: 0.8951960933373831
macro_precision: 0.8958313840463195
macro_recall: 0.8962088356299692
micro_f1: 0.9163713678242381
micro_precision: 0.9163713678242381
micro_recall: 0.9163713678242381
weighted_f1: 0.916670480049282
weighted_precision: 0.9180146709071523
weighted_recall: 0.9163713678242381
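In the results above, micro_f1, micro_precision, and micro_recall all equal accuracy. This is expected for single-label multi-class classification, where every wrong prediction counts as exactly one false positive and one false negative. The sketch below illustrates this on toy labels; it is not the evaluation script used for this model, and the function name `prf_summary` and the toy data are hypothetical.

```python
from collections import Counter


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def _f1(precision, recall):
    return _ratio(2 * precision * recall, precision + recall)


def prf_summary(y_true, y_pred):
    """Compute accuracy plus micro and macro averaged F1 (illustrative only)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    # Macro: average the per-class F1 scores with equal weight per class.
    labels = sorted(set(y_true) | set(y_pred))
    per_class_f1 = []
    for c in labels:
        precision = _ratio(tp[c], tp[c] + fp[c])
        recall = _ratio(tp[c], tp[c] + fn[c])
        per_class_f1.append(_f1(precision, recall))
    macro_f1 = _ratio(sum(per_class_f1), len(per_class_f1))

    # Micro: pool the counts over all classes before dividing.
    total_tp = sum(tp.values())
    micro_precision = _ratio(total_tp, total_tp + sum(fp.values()))
    micro_recall = _ratio(total_tp, total_tp + sum(fn.values()))

    return {
        "accuracy": _ratio(total_tp, len(y_true)),
        "micro_f1": _f1(micro_precision, micro_recall),
        "macro_f1": macro_f1,
    }


# Toy labels for illustration, not taken from the sna.bn evaluation set.
summary = prf_summary([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(summary)
```

Macro averages weight each class equally regardless of how many articles it contains, which is why macro_f1 can sit below accuracy when the smaller classes are harder to predict.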
BibTeX entry and citation info
Coming soon!