anoopk committed on
Commit
6434e40
1 Parent(s): f8554ee

Update README.md

  1. README.md +3 -4
README.md CHANGED
@@ -25,8 +25,7 @@ tags:
25
 
26
  # IndicNER
27
  IndicNER is a model trained to complete the task of identifying named entities from sentences in Indian languages. Our model is specifically fine-tuned to the 11 Indian languages mentioned above over millions of sentences. The model is then benchmarked over a human annotated testset and multiple other publicly available Indian NER datasets.
28
- The 11 languages covered by IndicBERT are: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.
29
- The link to our GitHub repository containing all our code can be found [here](https://github.com/AI4Bharat/indicner). The link to our paper can be found here.
30
 
31
  ## Training Corpus
32
  Our model was trained on a [dataset](https://huggingface.co/datasets/ai4bharat/naamapadam) which we mined from the existing [Samanantar Corpus](https://huggingface.co/datasets/ai4bharat/samanantar). We used a bert-base-multilingual-uncased model as the starting point and then fine-tuned it to the NER dataset mentioned previously.
@@ -44,9 +43,9 @@ The first 5 languages (bn, hi, kn, ml, mr ) have large human annotated testsets
44
  ## Downloads
45
  Download from this same Huggingface repo.
46
 
 
47
 
48
-
49
-
50
 
51
  <!-- citing information -->
52
  ## Citing
 
25
 
26
  # IndicNER
27
  IndicNER is a model trained to complete the task of identifying named entities from sentences in Indian languages. Our model is specifically fine-tuned to the 11 Indian languages mentioned above over millions of sentences. The model is then benchmarked over a human annotated testset and multiple other publicly available Indian NER datasets.
28
+ The 11 languages covered by IndicNER are: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.
 
29
 
30
  ## Training Corpus
31
  Our model was trained on a [dataset](https://huggingface.co/datasets/ai4bharat/naamapadam) which we mined from the existing [Samanantar Corpus](https://huggingface.co/datasets/ai4bharat/samanantar). We used a bert-base-multilingual-uncased model as the starting point and then fine-tuned it to the NER dataset mentioned previously.
 
43
  ## Downloads
44
  Download from this same Huggingface repo.
45
 
46
+ ## Usage
47
 
48
+ You can use [this Colab notebook](https://colab.research.google.com/drive/1sYa-PDdZQ_c9SzUgnhyb3Fl7j96QBCS8?usp=sharing) for samples of using IndicNER, or for fine-tuning a pre-trained model on the Naamapadam dataset to build your own NER models.
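
For readers who want a quick look without opening the notebook, here is a minimal sketch of what usage might look like with the `transformers` token-classification pipeline. The model ID `ai4bharat/IndicNER` is an assumption (the README only says to download from this same repo), the `PER`/`LOC` tag names in the example are illustrative, and `tag_sentence` and `merge_bio_tags` are hypothetical helper names, not part of the project's code.

```python
def tag_sentence(sentence, model_id="ai4bharat/IndicNER"):
    """Run NER on one sentence. The model ID is an assumption, not confirmed by the README."""
    # Imported lazily so the pure-Python helper below works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" asks the pipeline to merge word pieces into entity groups.
    ner = pipeline("token-classification", model=model_id, aggregation_strategy="simple")
    return ner(sentence)


def merge_bio_tags(tokens, tags):
    """Group word-level BIO tags into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Close the entity being built, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" or any unrecognised tag ends the current entity
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities
```

`merge_bio_tags` is useful when you work with raw per-word predictions (for example, while evaluating a fine-tuned model), whereas `tag_sentence` leaves the grouping to the pipeline itself. Calling `tag_sentence` downloads the model weights, so it needs network access and possibly a Hugging Face login.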
 
49
 
50
  <!-- citing information -->
51
  ## Citing