Sefaria/he_ref_ner

Token Classification

noahsantacruz committed on Nov 28, 2023

Commit 6988019 • 1 Parent(s): c4b580f

Update README.md

Files changed (1)

README.md +7 -2

@@ -6,8 +6,8 @@ language:
 - he
 widget:
 - text: "משה קבל תורה מסיני (אבות פרק א משנה א)"
-- text: "ירושלמי פאה כג ע״ד"
-- text: "כתוב ברש״י ד״ה אמר שהוא לא חייב"
+- text: 'ירושלמי פאה כג ע"ד'
+- text: 'ראה רש"י ברכות דף יב ד"ה ואמר שהוא חולק'
 model-index:
 - name: he_ref_ner
   results:
@@ -38,6 +38,11 @@ It detects the following types of entities:
 |---|---|
 | מקור | Citations to Torah texts. See notes below. |
+# Notes on normalization
+All text the model was trained on was initially put through the following normalizer: [link](https://github.com/Sefaria/Machine-Learning/blob/main/util/helper.py#L43).
+Results will be significantly worse if this normalizer is not used.
 ## Notes on citation matches
 - The final parenthesis is not included in the match. E.g. if the citation is `בראשית (א:א)` then the final parenthesis will not be included. We found that the model would get confused if the final parenthesis was part of the entity. It is fairly simple to add it back in via a deterministic check.
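
The deterministic check mentioned in the last note could look roughly like the sketch below. It is not taken from the Sefaria code base: the function name `extend_to_closing_paren` is hypothetical, and it assumes the model's output is available as character offsets (`start`, `end`) into the original string. It only extends the span when the span contains an unmatched `(` and the next non-space character is `)`.

```python
def extend_to_closing_paren(text: str, start: int, end: int) -> int:
    """Return an end offset that includes a trailing ')' when appropriate.

    Hypothetical helper: assumes (start, end) are character offsets of a
    predicted citation span in `text`, with `end` exclusive.
    """
    span = text[start:end]
    # Only act when the span opens more parentheses than it closes.
    if span.count("(") <= span.count(")"):
        return end
    # Skip any whitespace between the span and the next character.
    i = end
    while i < len(text) and text[i].isspace():
        i += 1
    # Extend the span to cover the closing parenthesis if it is there.
    if i < len(text) and text[i] == ")":
        return i + 1
    return end


text = "בראשית (א:א) ועוד"
start, end = 0, text.index(")")  # span as the model would return it, without ')'
new_end = extend_to_closing_paren(text, start, end)
citation = text[start:new_end]
```

Spans with balanced parentheses, or with no parentheses at all, are returned unchanged, so the helper can be applied to every predicted entity.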