---
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets: IndicGlue
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT News Article Classification

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification using the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).

The model classifies articles into 6 classes:

| Label id | Label |
|:--------:|:----:|
|0 | kolkata|
|1 | state|
|2 | national|
|3 | sports|
|4 | entertainment|
|5 | international|

## Intended uses & limitations

#### How to use

You can use this model directly with a pipeline for sequence classification:

```python
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize pipeline
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
# The pipeline returns a list of dicts, each with a "label" and a "score"
output = pipeline(raw_text)
```

#### Limitations and bias

Work in progress.

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 18149 and trained on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).

## Training procedure

Coming soon!

## Eval results

| Metric | Value |
|:-------|:------|
| accuracy | 0.920623671155209 |
| loss | 0.2719293534755707 |
| macro_f1 | 0.8924089161713425 |
| macro_precision | 0.891858452957785 |
| macro_recall | 0.8978917764271065 |
| micro_f1 | 0.920623671155209 |
| micro_precision | 0.920623671155209 |
| micro_recall | 0.920623671155209 |
| weighted_f1 | 0.9205158122362266 |
| weighted_precision | 0.9236142214371135 |
| weighted_recall | 0.920623671155209 |

### BibTeX entry and citation info

Coming soon!
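
## Decoding predicted labels

The pipeline in "How to use" returns a label string and a score for each input. If the model configuration does not carry human-readable label names, `transformers` falls back to strings of the form `LABEL_<id>`. Whether that is the case for this checkpoint is an assumption here, not something this card states. The sketch below maps such strings back to the class names from the label table in "Model description"; `ID2LABEL` and `decode_label` are hypothetical helper names, and the example prediction is made up for illustration.

```python
# Label ids and names copied from the table in "Model description".
ID2LABEL = {
    0: "kolkata",
    1: "state",
    2: "national",
    3: "sports",
    4: "entertainment",
    5: "international",
}


def decode_label(prediction):
    """Return the class name for one pipeline prediction dict.

    Assumes the default "LABEL_<id>" naming; any other label string,
    or an id outside the table, is returned unchanged.
    """
    label = prediction["label"]
    prefix = "LABEL_"
    suffix = label[len(prefix):]
    if label.startswith(prefix) and suffix.isdecimal():
        return ID2LABEL.get(int(suffix), label)
    return label


# Hypothetical prediction, shaped like one element of the pipeline output.
example = {"label": "LABEL_3", "score": 0.98}
print(decode_label(example))  # sports
```

If the pipeline already returns names such as `sports`, `decode_label` passes them through untouched, so it is safe to apply in either case.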