---
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets:
- indic_glue
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT News Article Classification

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).

The model classifies news articles into 6 classes:

| Label id | Label |
|:--------:|:----:|
|0 | kolkata|
|1 | state|
|2 | national|
|3 | sports|
|4 | entertainment|
|5 | international|
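
For turning numeric predictions back into names, the table above can be mirrored in a plain Python dict (a sketch based on this card; if the model's `config.json` defines an `id2label` mapping, that is the authoritative source):

```python
# Label-id mapping mirroring the table above; names are copied from this card.
ID2LABEL = {
    0: "kolkata",
    1: "state",
    2: "national",
    3: "sports",
    4: "entertainment",
    5: "international",
}

def label_name(label_id: int) -> str:
    """Translate a numeric class id into its human-readable name."""
    return ID2LABEL[label_id]

print(label_name(3))  # → sports
```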

## Intended uses & limitations

#### How to use

You can use this model directly with a text classification pipeline:

```python
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize pipeline
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Change me; means "This union has 3 mouzas and 10 villages."
output = pipeline(raw_text)
```
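
The pipeline returns a list with one dict per input, typically of the form `[{"label": "LABEL_3", "score": 0.98}]` when the model config carries no human-readable class names. A minimal helper (a sketch, assuming that `LABEL_<id>` format) recovers the numeric id so it can be looked up in the table above:

```python
def label_id(prediction: dict) -> int:
    """Parse the trailing id out of a 'LABEL_<id>' pipeline label."""
    return int(prediction["label"].rsplit("_", 1)[-1])

# Example with a hand-written prediction in the pipeline's output format:
print(label_id({"label": "LABEL_3", "score": 0.98}))  # → 3, i.e. "sports" in the table above
```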

#### Limitations and bias

<!-- Provide examples of latent issues and potential remediations. -->
WIP

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 18149 and fine-tuned on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).

## Training procedure

Coming soon!

## Eval results

| Metric | Value |
|:------------------:|:------------------:|
| accuracy | 0.920623671155209 |
| loss | 0.2719293534755707 |
| macro_f1 | 0.8924089161713425 |
| macro_precision | 0.891858452957785 |
| macro_recall | 0.8978917764271065 |
| micro_f1 | 0.920623671155209 |
| micro_precision | 0.920623671155209 |
| micro_recall | 0.920623671155209 |
| weighted_f1 | 0.9205158122362266 |
| weighted_precision | 0.9236142214371135 |
| weighted_recall | 0.920623671155209 |
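
For single-label multi-class evaluation, micro-averaged precision, recall, and F1 all pool the same counts and collapse to plain accuracy, which is why the micro_* values above are identical; the macro average instead weights every class equally. A toy sketch (not the evaluation script behind these numbers) illustrates the difference:

```python
from collections import Counter

# Toy predictions over 3 classes (not real sahajBERT-NCC outputs).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

classes = sorted(set(y_true))
tp, fp, fn = Counter(), Counter(), Counter()
for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1
    else:
        fp[p] += 1
        fn[t] += 1

# Macro F1: compute F1 per class, then average with equal class weights.
f1s = []
for c in classes:
    prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
    rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
    f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
macro_f1 = sum(f1s) / len(classes)

# Micro F1: pool all counts first; for single-label multi-class data
# this equals accuracy.
micro_f1 = sum(tp.values()) / len(y_true)

print(round(macro_f1, 3), round(micro_f1, 3))  # → 0.719 0.75
```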

### BibTeX entry and citation info

Coming soon!

<!-- ```bibtex
@inproceedings{...,
  year={2020}
}
``` -->