morit/hindi_xlm_xnli

morit committed on Jan 23, 2023

Commit 81c1359 • 1 Parent(s): 8150d63

Create README.md

Files changed (1)

README.md ADDED +46 -0

	@@ -0,0 +1,46 @@

+---
+license: mit
+datasets:
+- xnli
+language:
+- hi
+metrics:
+- accuracy
+pipeline_tag: zero-shot-classification
+---
+# XLM-ROBERTA-BASE-XNLI-HI
+## Model description
+This model builds on an XLM-RoBERTa-base model whose pre-training was continued on a large multilingual corpus of tweets.
+It was developed following a strategy similar to the one introduced in the [TweetEval](https://github.com/cardiffnlp/tweeteval) framework.
+The model was further fine-tuned on the Hindi portion of the XNLI training dataset.
+## Intended Usage
+This model was developed for zero-shot text classification in the domain of hate speech detection. It focuses on Hindi because it was fine-tuned on Hindi data. Since the base model was pre-trained on 100 languages, it has shown some effectiveness in other languages as well. For the list of languages, see the [XLM-RoBERTa paper](https://arxiv.org/abs/1911.02116).
+### Usage with Zero-Shot Classification pipeline
+```python
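+from transformers import pipeline
+classifier = pipeline("zero-shot-classification",
+                      model="morit/hindi_xlm_xnli")
+# Illustrative call; the input sentence and candidate labels are placeholders.
+result = classifier("यह फिल्म बहुत अच्छी थी।", candidate_labels=["सकारात्मक", "नकारात्मक"])
+print(result["labels"][0], result["scores"][0])  # top-ranked label and its score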
+```
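Note on the scoring step: an NLI-based zero-shot classifier turns each candidate label into a hypothesis and ranks labels by how strongly the input entails it. The sketch below shows only the final normalisation, a softmax over per-label entailment logits, which is how the pipeline's default single-label mode appears to work. The function name `label_scores` and the logit values are hypothetical and are not output of this model.

```python
import math


def label_scores(entailment_logits):
    """Softmax over per-label entailment logits (single-label setting)."""
    peak = max(entailment_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in entailment_logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical entailment logits, one per candidate label (not real model output).
logits = {"hate speech": 2.1, "not hate speech": 0.3}
scores = label_scores(logits)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Because the scores are normalised across labels, they sum to one and are only meaningful relative to the other candidates supplied in the same call.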
+## Training
+The base model was pre-trained on 100 languages and then further trained on 198M multilingual tweets, as described in the original [paper](https://arxiv.org/abs/2104.12250). It was then fine-tuned on the Hindi training set of XNLI, which is a machine-translated version of the MNLI dataset. Training ran for 5 epochs on the XNLI train set, with evaluation on the XNLI eval set at the end of every epoch to find the best-performing model. The checkpoint with the highest accuracy on the eval set was selected.
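Note on checkpoint selection: the "keep the epoch with the highest eval accuracy" rule described above can be sketched in a few lines. The accuracy values below are placeholders for illustration, not the numbers from this training run, and `best_epoch` is a hypothetical helper name.

```python
# Placeholder eval accuracies per epoch (NOT the values from the actual run).
eval_accuracy_by_epoch = {1: 0.50, 2: 0.55, 3: 0.61, 4: 0.60, 5: 0.58}


def best_epoch(history):
    """Return the epoch with the highest eval accuracy (earliest epoch wins ties)."""
    return max(sorted(history), key=lambda epoch: history[epoch])


chosen = best_epoch(eval_accuracy_by_epoch)
print(f"keep checkpoint from epoch {chosen}")
```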
+![Training Charts from wandb](screen_wandb.png)
+- learning rate: 2e-5
+- batch size: 32
+- max sequence length: 128
+
+Training used a GPU (NVIDIA GeForce RTX 3090) and took 1h 47 min.
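Note on the hyperparameters: the settings listed above can be collected in one small config object, which also makes it easy to estimate the number of optimizer steps. This is a minimal sketch, not the author's training script. `FinetuneConfig` and `total_steps` are hypothetical names, and the training-set size of 392,702 pairs is an assumption based on the commonly cited size of the MNLI/XNLI train split.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    learning_rate: float = 2e-5  # from the README
    batch_size: int = 32         # from the README
    max_seq_length: int = 128    # from the README
    num_epochs: int = 5          # from the Training section


def total_steps(num_examples, config):
    """Optimizer steps over all epochs, assuming no gradient accumulation."""
    return math.ceil(num_examples / config.batch_size) * config.num_epochs


config = FinetuneConfig()
# Assumption: 392,702 training pairs in the Hindi XNLI train split.
print(total_steps(392_702, config))
```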
+## Evaluation
+The best-performing model was evaluated on the XNLI test set to obtain a comparable result:
+```
+predict_accuracy = 71.22 %
+```
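Note on the metric: accuracy here is the fraction of test pairs whose predicted NLI label matches the gold label. The sketch below uses toy labels only, and the label ids (0 = entailment, 1 = neutral, 2 = contradiction) are an assumed encoding. It does not reproduce the reported figure.

```python
def accuracy(predictions, references):
    """Fraction of positions where the prediction equals the reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy labels for illustration only (assumed ids: 0=entailment, 1=neutral, 2=contradiction).
preds = [0, 2, 1, 1, 0]
golds = [0, 2, 1, 0, 0]
print(f"predict_accuracy = {accuracy(preds, golds) * 100:.2f} %")
```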