---
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets:
- indic_glue
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT News Article Classification

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).

The model classifies news articles into 6 classes:

| Label id | Label |
|:--------:|:----:|
|0 | kolkata|
|1 | state|
|2 | national|
|3 | sports|
|4 | entertainment|
|5 | international|
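
For turning numeric predictions back into names, the table above can be mirrored in a plain Python dict (a sketch based on this card; if the model's `config.json` defines an `id2label` mapping, that is the authoritative source):

```python
# Label-id mapping mirroring the table above; names are copied from this card.
ID2LABEL = {
    0: "kolkata",
    1: "state",
    2: "national",
    3: "sports",
    4: "entertainment",
    5: "international",
}

def label_name(label_id: int) -> str:
    """Translate a numeric class id into its human-readable name."""
    return ID2LABEL[label_id]

print(label_name(3))  # → sports
```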

## Intended uses & limitations

#### How to use

You can use this model directly with a text classification pipeline:

```python
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize pipeline
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Change me; means "This union has 3 mouzas and 10 villages."
output = pipeline(raw_text)
```
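
The pipeline returns a list with one dict per input, typically of the form `[{"label": "LABEL_3", "score": 0.98}]` when the model config carries no human-readable class names. A minimal helper (a sketch, assuming that `LABEL_<id>` format) recovers the numeric id so it can be looked up in the table above:

```python
def label_id(prediction: dict) -> int:
    """Parse the trailing id out of a 'LABEL_<id>' pipeline label."""
    return int(prediction["label"].rsplit("_", 1)[-1])

# Example with a hand-written prediction in the pipeline's output format:
print(label_id({"label": "LABEL_3", "score": 0.98}))  # → 3, i.e. "sports" in the table above
```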

#### Limitations and bias

<!-- Provide examples of latent issues and potential remediations. -->
WIP

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 18149 and fine-tuned on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).

## Training procedure

Coming soon!

## Eval results

| Metric | Value |
|:------------------:|:------------------:|
| accuracy | 0.920623671155209 |
| loss | 0.2719293534755707 |
| macro_f1 | 0.8924089161713425 |
| macro_precision | 0.891858452957785 |
| macro_recall | 0.8978917764271065 |
| micro_f1 | 0.920623671155209 |
| micro_precision | 0.920623671155209 |
| micro_recall | 0.920623671155209 |
| weighted_f1 | 0.9205158122362266 |
| weighted_precision | 0.9236142214371135 |
| weighted_recall | 0.920623671155209 |
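
For single-label multi-class evaluation, micro-averaged precision, recall, and F1 all pool the same counts and collapse to plain accuracy, which is why the micro_* values above are identical; the macro average instead weights every class equally. A toy sketch (not the evaluation script behind these numbers) illustrates the difference:

```python
from collections import Counter

# Toy predictions over 3 classes (not real sahajBERT-NCC outputs).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

classes = sorted(set(y_true))
tp, fp, fn = Counter(), Counter(), Counter()
for t, p in zip(y_true, y_pred):
    if t == p:
        tp[t] += 1
    else:
        fp[p] += 1
        fn[t] += 1

# Macro F1: compute F1 per class, then average with equal class weights.
f1s = []
for c in classes:
    prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
    rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
    f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
macro_f1 = sum(f1s) / len(classes)

# Micro F1: pool all counts first; for single-label multi-class data
# this equals accuracy.
micro_f1 = sum(tp.values()) / len(y_true)

print(round(macro_f1, 3), round(micro_f1, 3))  # → 0.719 0.75
```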

### BibTeX entry and citation info

Coming soon!

<!-- ```bibtex
@inproceedings{...,
  year={2020}
}
``` -->