metadata
language: bn
tags:
  - collaborative
  - bengali
  - SequenceClassification
license: apache-2.0
datasets: IndicGlue
metrics:
  - Loss
  - Accuracy
  - Precision
  - Recall

sahajBERT News Article Classification

Model description

This model is sahajBERT fine-tuned for news article classification on the sna.bn split of IndicGlue.

The model is trained to classify articles into 6 classes:

| Label id | Label |
|---|---|
| 0 | kolkata |
| 1 | state |
| 2 | national |
| 3 | sports |
| 4 | entertainment |
| 5 | international |

Intended uses & limitations

How to use

You can use this model directly with a pipeline for Sequence Classification:

from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")

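# Initialize pipeline (named `classifier` so it is not confused with transformers.pipeline)
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
# For a single string, the result is a list with one dict holding a "label" and a "score"
output = classifier(raw_text)
print(output)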
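Depending on how the model configuration is set up, the pipeline may report generic labels such as `LABEL_3` rather than class names. The sketch below maps such labels back to the names in the table above. It assumes the default `LABEL_<id>` naming. The helper `readable_predictions` and the sample prediction are hypothetical and only illustrate the mapping. If the pipeline already returns class names, the helper leaves them unchanged.

```python
# Label ids and names copied from the label table in the model description.
ID2LABEL = {
    0: "kolkata",
    1: "state",
    2: "national",
    3: "sports",
    4: "entertainment",
    5: "international",
}

PREFIX = "LABEL_"  # assumed default naming when the config defines no label names


def readable_predictions(predictions):
    """Replace generic "LABEL_<id>" strings with class names; keep other labels as-is."""
    readable = []
    for pred in predictions:
        label = pred["label"]
        suffix = label[len(PREFIX):]
        if label.startswith(PREFIX) and suffix.isdigit():
            label = ID2LABEL.get(int(suffix), label)
        readable.append({"label": label, "score": pred["score"]})
    return readable


# Hypothetical pipeline output, for illustration only (not a real model prediction).
example_output = [{"label": "LABEL_3", "score": 0.98}]
print(readable_predictions(example_output))
# prints [{'label': 'sports', 'score': 0.98}]
```

In practice you would pass the `output` of the pipeline call instead of `example_output`.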

Limitations and bias

Work in progress.

Training data

The model was initialized with pre-trained weights of sahajBERT at step 19519 and trained on the sna.bn split of IndicGlue.

Training procedure

Coming soon!

Eval results

accuracy: 0.9163713678242381

loss: 0.29771897196769714

macro_f1: 0.8951960933373831

macro_precision: 0.8958313840463195

macro_recall: 0.8962088356299692

micro_f1: 0.9163713678242381

micro_precision: 0.9163713678242381

micro_recall: 0.9163713678242381

weighted_f1: 0.916670480049282

weighted_precision: 0.9180146709071523

weighted_recall: 0.9163713678242381
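The micro-averaged precision, recall and F1 above, as well as the weighted recall, are identical to the accuracy. This is expected for single-label multi-class classification, where every misclassified article counts as one false positive and one false negative. The following sketch illustrates how the three averaging modes differ. It is not the evaluation script used for this model: the function `averaged_scores` and the toy labels are hypothetical.

```python
from collections import Counter


def averaged_scores(y_true, y_pred):
    """Return accuracy plus macro, micro and weighted precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def prf(tp_, fp_, fn_):
        precision = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        recall = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f1

    per_class = {c: prf(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)  # number of true examples per class
    n = len(y_true)

    scores = {"accuracy": sum(tp.values()) / n}
    for i, name in enumerate(("precision", "recall", "f1")):
        # Macro: unweighted mean over classes.
        scores[f"macro_{name}"] = sum(v[i] for v in per_class.values()) / len(labels)
        # Weighted: mean over classes, weighted by class support.
        scores[f"weighted_{name}"] = sum(per_class[c][i] * support[c] for c in labels) / n
    # Micro: pool the counts of all classes before computing the scores.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    scores.update(zip(("micro_precision", "micro_recall", "micro_f1"), micro))
    return scores


# Toy labels, not taken from the evaluation set.
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2, 0, 3]
scores = averaged_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

On the toy data, the micro scores and the weighted recall coincide with the accuracy, while the macro scores differ because each class contributes equally regardless of how many examples it has.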

BibTeX entry and citation info

Coming soon!