Commit
•
6988019
1
Parent(s):
c4b580f
Update README.md
README.md
CHANGED
@@ -6,8 +6,8 @@ language:
|
|
6 |
- he
|
7 |
widget:
|
8 |
- text: "משה קבל תורה מסיני (אבות פרק א משנה א)"
|
9 |
-
- text:
|
10 |
-
- text: "
|
11 |
model-index:
|
12 |
- name: he_ref_ner
|
13 |
results:
|
@@ -38,6 +38,11 @@ It detects the following types of entities:
|
|
38 |
|---|---|
|
39 |
| מקור | Citations to Torah texts. See notes below. |
|
40 |
|
41 |
## Notes on citation matches
|
42 |
|
43 |
- The final closing parenthesis is not included in the match. E.g., if the citation is `בראשית (א:א)`, the final parenthesis will be left out of the entity. We found that the model got confused when the final parenthesis was part of the entity. It is fairly simple to add it back with a deterministic check.
|
|
|
6 |
- he
|
7 |
widget:
|
8 |
- text: "משה קבל תורה מסיני (אבות פרק א משנה א)"
|
9 |
+
- text: 'ירושלמי פאה כג ע"ד'
|
10 |
+
- text: 'ראה רש"י ברכות דף יב ד"ה ואמר שהוא חולק'
|
11 |
model-index:
|
12 |
- name: he_ref_ner
|
13 |
results:
|
|
|
38 |
|---|---|
|
39 |
| מקור | Citations to Torah texts. See notes below. |
|
40 |
|
41 |
+
# Notes on normalization
|
42 |
+
|
43 |
+
All text the model was trained on was initially put through the following normalizer: [link](https://github.com/Sefaria/Machine-Learning/blob/main/util/helper.py#L43).
|
44 |
+
Results will be significantly worse if this normalizer is not used.
|
45 |
+
|
46 |
## Notes on citation matches
|
47 |
|
48 |
- The final closing parenthesis is not included in the match. E.g., if the citation is `בראשית (א:א)`, the final parenthesis will be left out of the entity. We found that the model got confused when the final parenthesis was part of the entity. It is fairly simple to add it back with a deterministic check.
|
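The deterministic check mentioned in the citation note could look roughly like the sketch below. It is not taken from the model's repository: the function name `extend_to_closing_paren` and the character-offset span format are assumptions made for illustration. The idea is to extend a predicted span by one character when the span contains an unmatched `(` and the very next character in the text is `)`.

```python
from typing import Tuple


def extend_to_closing_paren(text: str, start: int, end: int) -> Tuple[int, int]:
    """Return the span [start, end), extended by one character if needed.

    Hypothetical helper (not part of the model): the span is extended only
    when it holds more '(' than ')' and the character right after the span
    is a ')'. Otherwise the span is returned unchanged.
    """
    span = text[start:end]
    has_unmatched_open = span.count("(") > span.count(")")
    if has_unmatched_open and end < len(text) and text[end] == ")":
        return start, end + 1
    return start, end


# Usage: suppose the model predicted the citation without its final ")".
text = "בראשית (א:א) וגו"
predicted_end = text.index(")")  # the predicted span stops just before ")"
new_start, new_end = extend_to_closing_paren(text, 0, predicted_end)
citation = text[new_start:new_end]
```

Spans without parentheses, or with already balanced ones, pass through untouched, so the check can be applied to every predicted entity.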