IndicNER
language:
  - as
  - bn
  - gu
  - hi
  - kn
  - ml
  - mr
  - or
  - pa
  - ta
  - te
license: mit
datasets:
  - Samanantar
tags:
  - ner
  - Pytorch
  - transformer
  - multilingual
  - nlp
  - indicnlp


IndicNER is a model trained to identify named entities in sentences written in Indian languages. It is fine-tuned on millions of sentences in the 11 Indian languages listed above, and is benchmarked on a human-annotated test set and on several other publicly available Indian NER datasets. The 11 languages covered by IndicNER are Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.

Training Corpus

Our model was trained on a dataset that we mined from the existing Samanantar Corpus. We used bert-base-multilingual-uncased as the starting point and fine-tuned it on this NER dataset.
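Fine-tuning a BERT-style model for NER usually requires expanding word-level labels to the subword tokens the tokenizer produces. The sketch below shows one common convention from Hugging Face token-classification examples: label only the first subword of each word and mask the rest. This card does not describe the preprocessing actually used for IndicNER, so treat `align_labels`, the label ids, and the sample `word_ids` as illustrative assumptions.

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's cross-entropy loss,
# so positions carrying it do not contribute to the training loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Expand word-level label ids to subword-token level.

    `word_ids` mirrors what a fast tokenizer's `word_ids()` returns:
    None for special tokens, otherwise the index of the source word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] are masked out.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Continuation subwords are masked out.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical input: three words, the second split into two subword pieces.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 2, 0]  # illustrative label ids, not the model's real label map
print(align_labels(word_ids, word_labels))  # [-100, 1, 2, -100, 0, -100]
```

Masking continuation subwords keeps each word counted once in the loss; labelling every subword is an equally valid alternative.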


The model can be downloaded from this same Hugging Face repo.

Update 20 Dec 2022: We released a new paper documenting IndicNER and Naamapadam. The model reported in the paper differs from the one hosted here; we will update this repo with that model soon.


You can use this Colab notebook for examples of using IndicNER, or for fine-tuning a pre-trained model on the Naamapadam dataset to build your own NER models.
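For a quick local check outside the notebook, a minimal inference sketch with the `transformers` token-classification pipeline might look like the following. The repo id `ai4bharat/IndicNER` is an assumption (it is not stated in this card's text), and `load_tagger`, `filter_entities`, and the 0.5 score threshold are hypothetical helpers and values introduced only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: the Hub repo id; this card's text does not state it.
MODEL_ID = "ai4bharat/IndicNER"


def load_tagger(model_id: str = MODEL_ID):
    """Create a token-classification pipeline; downloads weights on first call."""
    # Imported lazily so filter_entities can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges subword pieces into whole entity spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def filter_entities(results: List[Dict], min_score: float = 0.5) -> List[Tuple[str, str]]:
    """Keep (text, label) pairs whose confidence is at least min_score."""
    return [(r["word"], r["entity_group"]) for r in results if r["score"] >= min_score]


# Usage (requires transformers, PyTorch, and network access):
#   tagger = load_tagger()
#   print(filter_entities(tagger("सचिन तेंदुलकर मुंबई में रहते हैं।")))
```

The label names returned depend on the model's own configuration, so inspect the raw pipeline output before relying on specific tags.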


If you are using IndicNER, please cite the following article:

  doi = {10.48550/ARXIV.2212.10168},
  url = {},
  author = {Mhaske, Arnav and Kedia, Harshit and Doddapaneni, Sumanth and Khapra, Mitesh M. and Kumar, Pratyush and Murthy, Rudra and Kunchukuttan, Anoop},
  title = {Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}

We would like to hear from you if:

  • You are using our resources. Please let us know how you are putting these resources to use.
  • You have any feedback on these resources.


The IndicNER code (and models) are released under the MIT License.


This work is the outcome of a volunteer effort as part of the AI4Bharat initiative.