File size: 2,264 Bytes
e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 16b5869 e66d572 3f4e734 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 e66d572 e4ab342 16b5869 e66d572 16b5869 e66d572 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 |
---
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets: IndicGlue
metrics:
- Loss
- Accuracy
- Precision
- Recall
---
# sahajBERT News Article Classification
## Model description
[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification using the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).
The model is trained for classifying articles into 5 different classes:
| Label id | Label |
|:--------:|:----:|
|0 | kolkata|
|1 | state|
|2 | national|
|3 | sports|
|4 | entertainment|
|5 | international|
## Intended uses & limitations
#### How to use
You can use this model directly with a pipeline for Sequence Classification:
```python
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast
# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")
# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")
# Initialize pipeline
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)
raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me
output = pipeline(raw_text)
```
#### Limitations and bias
<!-- Provide examples of latent issues and potential remediations. -->
WIP
## Training data
The model was initialized with pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 19519 and trained on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).
## Training procedure
Coming soon!
<!-- ```bibtex
@inproceedings{...,
year={2020}
}
``` -->
## Eval results
Loss: 0.2477145493030548
Accuracy: 0.926293408929837
Macro F1: 0.9079785326650756
Recall: 0.926293408929837
Weighted F1: 0.9266428029354202
Macro Precision: 0.9109938492260489
Micro Precision: 0.926293408929837
Weighted Precision: 0.9288535478995414
Macro Recall: 0.9069095007692186
Micro Recall: 0.926293408929837
Weighted Recall: 0.926293408929837
### BibTeX entry and citation info
Coming soon!
<!-- ```bibtex
@inproceedings{...,
year={2020}
}
``` -->
|