noahsantacruz committed on
Commit
3e946ec
1 Parent(s): 8a44bf8

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -41,6 +41,11 @@ It detects the following types of entities:
  | Group | Name of a group of people. E.g. nations (Egypt), schools (Bet Hillel, Tosafot) |
  | Citation | Citations to Torah texts. See notes below. |
 
+ ## Notes on normalization
+
+ All text the model was trained on was first passed through the following normalizer: [link](https://github.com/Sefaria/Machine-Learning/blob/main/util/helper.py#L43).
+ Results will be significantly worse if this normalizer is not used.
+
  ## Notes on citation matches
 
  - The final parenthesis is not included in the match. E.g. if the citation is `Genesis (1:1)` then the final parenthesis will not be included. We found that the model would get confused if the final parenthesis was part of the entity. It is fairly simple to add it back in via a deterministic check.
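
The deterministic check mentioned in the last note could look roughly like the sketch below. It is not the authors' implementation. It assumes the model returns each citation as character offsets (`start`, `end`) into the input text, and the helper name `extend_to_closing_paren` is hypothetical.

```python
def extend_to_closing_paren(text: str, start: int, end: int) -> int:
    """Return an end offset that also covers a trailing ')'.

    The span text[start:end] is extended by one character only when it
    contains an unmatched '(' and the very next character is ')'.
    Offsets are assumed to be character offsets into `text`.
    """
    span = text[start:end]
    has_unmatched_open = span.count("(") > span.count(")")
    if has_unmatched_open and end < len(text) and text[end] == ")":
        return end + 1
    return end


text = "As it says in Genesis (1:1), in the beginning."
start = text.index("Genesis")
end = text.index(")")  # span as described above: stops before the final ')'
new_end = extend_to_closing_paren(text, start, end)
print(text[start:new_end])  # Genesis (1:1)
```

Spans whose parentheses are already balanced, or that are not followed by `)`, are returned unchanged, so the check is safe to run on every predicted citation.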