m3hrdadfi committed on
Commit
baa6dd4
1 Parent(s): a6794b7

Create README.md

  1. README.md +74 -0
README.md ADDED
@@ -0,0 +1,74 @@
1
+ ---
2
+ language: fa
3
+ license: apache-2.0
4
+ ---
5
+
6
+ # ParsBERT (v2.0)
7
+ A Transformer-based Model for Persian Language Understanding
8
+
9
+ We reconstructed the vocabulary and fine-tuned ParsBERT v1.1 on new Persian corpora so that ParsBERT can be used in a wider range of applications.
10
+ Please follow the [ParsBERT](https://github.com/hooshvare/parsbert) repo for the latest information about previous and current models.
11
+
12
+
13
+ ## Persian NER [ARMAN, PEYMA, ARMAN+PEYMA]
14
+
15
+ This task aims to extract named entities from text, such as names, and to label them with the appropriate `NER` classes, such as locations and organizations. The datasets used for this task contain sentences annotated in the `IOB` format. In this format, tokens that are not part of an entity are tagged `"O"`, the `"B"` tag marks the first word of an entity, and the `"I"` tag marks the remaining words of the same entity. Both `"B"` and `"I"` tags are followed by a hyphen (or underscore) and then the entity category. The NER task is therefore a multi-class token classification problem: given raw text, the model assigns a label to every token. Two primary datasets are used for Persian NER, `ARMAN` and `PEYMA`. In ParsBERT, we prepared NER models for each dataset as well as for the combination of both.
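
To make the `IOB` scheme concrete, here is a minimal sketch of how tagged tokens could be grouped back into entities. The function name `iob_to_spans`, the English example tokens, and the category names `PER` and `ORG` are illustrative assumptions; the actual tag names in the datasets may differ.

```python
from typing import List, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_text, entity_category) pairs."""
    spans = []
    current_tokens = []
    current_label = None

    def flush():
        # Close the entity being built, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        # Tags look like "B-PER" or "I_PER": prefix, separator, category.
        prefix, label = tag[0], tag[2:]
        # A "B" tag, or an "I" tag whose category does not match the
        # entity being built, starts a new entity.
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return spans


# Hypothetical example sentence and tags, for illustration only.
tokens = ["Ali", "Daei", "visited", "Tehran", "University", "yesterday"]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O"]
print(iob_to_spans(tokens, tags))
# [('Ali Daei', 'PER'), ('Tehran University', 'ORG')]
```

The sketch accepts either separator because it only skips the single character after the `B` or `I` prefix.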
16
+
17
+
18
+ ### ARMAN
19
+
20
+ The ARMAN dataset holds 7,682 sentences with 250,015 tokens tagged over six different classes.
21
+
22
+ 1. Organization
23
+ 2. Location
24
+ 3. Facility
25
+ 4. Event
26
+ 5. Product
27
+ 6. Person
28
+
29
+
30
+ | Label | # |
31
+ |:------------:|:-----:|
32
+ | Organization | 30108 |
33
+ | Location | 12924 |
34
+ | Facility | 4458 |
35
+ | Event | 7557 |
36
+ | Product | 4389 |
37
+ | Person | 15645 |
38
+
39
+ **Download**
40
+ You can download the dataset from [here](https://github.com/HaniehP/PersianNER)
41
+
42
+
43
+ ## Results
44
+
45
+ The following table summarizes the F1 score obtained by ParsBERT as compared to other models and architectures.
46
+
47
+ | Dataset | ParsBERT v2 | ParsBERT v1 | mBERT | MorphoBERT | Beheshti-NER | LSTM-CRF | Rule-Based CRF | BiLSTM-CRF |
48
+ |---------|-------------|-------------|-------|------------|--------------|----------|----------------|------------|
49
+ | ARMAN | 99.84* | 98.79 | 95.89 | 89.9 | 84.03 | 86.55 | - | 77.45 |
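
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The README does not state which variant was used, so the sketch below assumes an entity-level, exact-match score over `(start, end, category)` triples; the function name `f1_score` and the example spans are hypothetical and do not reproduce the numbers in the table.

```python
def f1_score(true_entities: set, predicted_entities: set) -> float:
    """Entity-level F1: an entity counts only if span and category both match."""
    true_positives = len(true_entities & predicted_entities)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_entities)
    recall = true_positives / len(true_entities)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted entities as (start, end, category) triples.
gold = {(0, 2, "PER"), (3, 5, "ORG"), (7, 8, "LOC")}
pred = {(0, 2, "PER"), (3, 5, "LOC"), (7, 8, "LOC")}
print(round(100 * f1_score(gold, pred), 2))  # 66.67
```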
50
+
51
+
52
+ ## How to use :hugs:
53
+
54
+ | Notebook | Description | |
55
+ |:----------|:-------------|------:|
56
+ | [How to use Pipelines](https://github.com/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb) | Simple and efficient way to use State-of-the-Art models on downstream tasks through transformers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hooshvare/parsbert-ner/blob/master/persian-ner-pipeline.ipynb) |
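
As a rough sketch of what pipeline-based usage might look like, the snippet below builds a `transformers` NER pipeline from a model identifier. The identifier is deliberately left as a parameter because this README does not name one; `build_ner_pipeline` and `format_predictions` are hypothetical helper names, and the linked notebook remains the authoritative reference.

```python
def build_ner_pipeline(model_id: str):
    """Create a token-classification pipeline for a fine-tuned NER checkpoint."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    return pipeline("ner", model=model, tokenizer=tokenizer)


def format_predictions(predictions):
    """Reduce raw pipeline output to (word, label, rounded score) tuples."""
    return [(p["word"], p["entity"], round(float(p["score"]), 3)) for p in predictions]


# Usage (requires network access and the `transformers` package):
# ner = build_ner_pipeline("<model-id-from-the-hub>")
# print(format_predictions(ner("<a Persian sentence>")))
```

Without an aggregation strategy, the pipeline returns one prediction per sub-word token, so the labels still carry their `B`/`I` prefixes.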
57
+
58
+
59
+ ### BibTeX entry and citation info
60
+
61
+ Please cite this work in publications as follows:
62
+
63
+ ```bibtex
64
+ @article{ParsBERT,
65
+ title={ParsBERT: Transformer-based Model for Persian Language Understanding},
66
+ author={Mehrdad Farahani and Mohammad Gharachorloo and Marzieh Farahani and Mohammad Manthouri},
67
+ journal={ArXiv},
68
+ year={2020},
69
+ volume={abs/2005.12515}
70
+ }
71
+ ```
72
+
73
+ ## Questions?
74
+ Post a GitHub issue on the [ParsBERT Issues](https://github.com/hooshvare/parsbert/issues) page.