noahsantacruz committed on
Commit
6988019
1 Parent(s): c4b580f

Update README.md

Files changed (1)
  1. README.md +7 -2
README.md CHANGED
@@ -6,8 +6,8 @@ language:
  - he
  widget:
  - text: "משה קבל תורה מסיני (אבות פרק א משנה א)"
- - text: "ירושלמי פאה כג ע״ד"
- - text: "כתוב ברש״י ד״ה אמר שהוא לא חייב"
+ - text: 'ירושלמי פאה כג ע"ד'
+ - text: 'ראה רש"י ברכות דף יב ד"ה ואמר שהוא חולק'
  model-index:
  - name: he_ref_ner
  results:
@@ -38,6 +38,11 @@ It detects the following types of entities:
  |---|---|
  | מקור | Citations to Torah texts. See notes below. |
 
+ # Notes on normalization
+
+ All text the model was trained on was first passed through this [normalizer](https://github.com/Sefaria/Machine-Learning/blob/main/util/helper.py#L43).
+ Results will be significantly worse if input text is not passed through the same normalizer.
+
  ## Notes on citation matches
 
  - The final parenthesis is not included in the match. E.g., if the citation is `בראשית (א:א)`, the closing parenthesis is left out of the entity. We found that the model got confused when the final parenthesis was part of the entity. It is fairly simple to add it back with a deterministic check.
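
The README note on citation matches says the closing parenthesis can be restored with a deterministic check but does not show one. The following is a minimal sketch of what such a check might look like, not the author's implementation. The function name `restore_final_paren` and the assumption that the model returns character offsets `(start, end)` into the input text are hypothetical.

```python
from typing import Tuple


def restore_final_paren(text: str, start: int, end: int) -> Tuple[int, int]:
    """Hypothetical helper: extend a citation span [start, end) by one
    character when the closing parenthesis was left out of the entity."""
    span = text[start:end]
    # Extend only when the span holds an unmatched "(" and the very next
    # character in the text is the ")" that would close it.
    has_unmatched_open = span.count("(") > span.count(")")
    if has_unmatched_open and end < len(text) and text[end] == ")":
        return start, end + 1
    return start, end


# Example: the entity stops just before the closing parenthesis.
text = "ראה בראשית (א:א) שם"
start = text.index("בראשית")
end = text.index(")")
new_start, new_end = restore_final_paren(text, start, end)
print(text[new_start:new_end])
```

A span that contains no opening parenthesis of its own, such as a citation that sits entirely inside a parenthetical remark, is returned unchanged, so the check does not swallow parentheses that belong to the surrounding sentence.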