 ---
 license: mit
+tags:
+- nepali-nlp
+- nepali-news-classification
+- nlp
+- transformers
+model-index:
+- name: patrakar
+  results: []
+widget:
+- text: "नेकपा (एमाले)का नेता गोकर्णराज विष्टले सहमति र सहकार्यबाटै संविधान बनाउने तथा जनताको जीवनस्तर उकास्ने काम गर्नु नै अबको मुख्य काम रहेको बताएका छन् ।"
+  example_title: "Example 1"
+- text: "राजनीतिक स्थिरता नहुँदा विकास निर्माणले गति लिन सकेन"
+  example_title: "Example 2"
+- text: "छाउगोठ भत्काइदिए फेरि बनाउने, बनाउन नपाए ओडार वा बारीका कान्लामा रात बिताउने र ज्यानकै जोखिम मोल्न तयार हुने प्रवृत्तिबाट थाहा हुन्छ– छाउपडी प्रथा हटाउनका लागि बनाइएका अहिलेसम्मका योजना, रणनीति उपयुक्त छैनन् र गरिएको लगानी खेर गइरहेको छ"
+  example_title: "Example 3"
 ---
+# patrakar/ पत्रकार (Nepali News Classifier)
+Last updated: September 2022
+DistilBERT model trained on a Nepali-language news dataset spanning 9 newsgroups, with 95.475% accuracy on the test set.
+## Model Details
+patrakar is a pre-trained DistilBERT sequence classification model that classifies Nepali-language news into 9 newsgroup categories:
+- politics
+- opinion
+- bank
+- entertainment
+- economy
+- health
+- literature
+- sports
+- tourism
+It was developed by Sahaj Raj Malla to be useful to the general public and so that others can explore it for commercial and scientific purposes. This model was trained on the [Sakonii/distilgpt2-nepali](https://huggingface.co/Sakonii/distilgpt2-nepali) model.
+It achieves the following results on the test dataset:
+| Total number of samples | Accuracy (%) |
+|:-----------------------:|:------------:|
+| 5670                    | 95.475       |
+### Model date
+September 2022
+### Model type
+Sequence classification model
+### Model version
+1.0.0
+## Model Usage
+This model can be used directly with a pipeline for text classification:
+```python
+from transformers import pipeline
+# model ID on the Hugging Face Hub
+model_name = "sahajrajmalla/patrakar"
+classifier = pipeline('text-classification', model=model_name)
+text = "नेकपा (एमाले)का नेता गोकर्णराज विष्टले सहमति र सहकार्यबाटै संविधान बनाउने तथा जनताको जीवनस्तर उकास्ने काम गर्नु नै अबको मुख्य काम रहेको बताएका छन् ।"
+classifier(text)
+```
+Here is how to use the model to predict the newsgroup of a given text in PyTorch:
+```python
+# install the dependencies first (run in a shell): pip install transformers torch
+from transformers import AutoTokenizer
+from transformers import AutoModelForSequenceClassification
+import torch
+import torch.nn.functional as F
+# initializing model and tokenizer
+model_name = "sahajrajmalla/patrakar"
+# downloading tokenizer
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# downloading model
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+# helper for tokenizing a dataset column named "data" (not used in the prediction below)
+def tokenize_function(examples):
+    return tokenizer(examples["data"], padding="max_length", truncation=True)
+# predicting with the model
+word_i_want_to_predict = "राजनीतिक स्थिरता नहुँदा विकास निर्माणले गति लिन सकेन"
+# initializing our labels
+label_list = [
+                "bank",
+                "economy",
+                "entertainment",
+                "health",
+                "literature",
+                "opinion",
+                "politics",
+                "sports",
+                "tourism"
+]
+batch = tokenizer(word_i_want_to_predict, padding=True, truncation=True, max_length=512, return_tensors='pt')
+with torch.no_grad():
+    outputs = model(**batch)
+    predictions = F.softmax(outputs.logits, dim=1)
+    labels = torch.argmax(predictions, dim=1)
+print(f"The sequence: \n\n {word_i_want_to_predict} \n\n is predicted to be of newsgroup {label_list[labels.item()]}")
+```
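The last step of the snippet above (softmax over the logits, then argmax to pick a label) can be sketched without PyTorch to show what it computes. This is a minimal illustration, not the model's own code: the `example_logits` values are hypothetical, the helper names `softmax` and `predict_label` are made up for this sketch, and the label order is copied from the snippet above on the assumption that it matches the model's output indices.

```python
import math

# Label order copied from the snippet above; assumed to match the
# model's output indices.
LABELS = ["bank", "economy", "entertainment", "health", "literature",
          "opinion", "politics", "sports", "tourism"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for a single text, one value per class.
example_logits = [0.1, 0.3, -1.2, 0.0, -0.5, 1.1, 4.2, -0.8, -2.0]
label, prob = predict_label(example_logits)
print(label, round(prob, 3))
```

Because softmax preserves the ordering of the logits, taking the argmax of the raw logits would select the same label; the softmax step is only needed when the probability itself is of interest.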
+## Training data
+This model was trained on 50,945 rows of a Nepali-language news [dataset](https://www.kaggle.com/competitions/text-it-meet-22/data?select=train.csv) grouped by category, available on Kaggle and also used in the IT Meet 2022 Text challenge.
+## Framework versions
+- Transformers 4.20.1
+- PyTorch 1.9.1
+- Datasets 2.0.0
+- Tokenizers 0.11.6