noahsantacruz committed on
Commit
e65a82d
1 Parent(s): 7bd5a1f

Update README.md

Files changed (1)
  1. README.md +67 -2
README.md CHANGED
@@ -21,6 +21,71 @@ model-index:
21
  type: f_score
22
  value: 0.8579465541
23
  ---
24
  | Feature | Description |
25
  | --- | --- |
26
  | **Name** | `en_torah_ner` |
@@ -30,8 +95,8 @@ model-index:
30
  | **Components** | `tok2vec`, `ner` |
31
  | **Vectors** | 218765 keys, 218765 unique vectors (50 dimensions) |
32
  | **Sources** | n/a |
33
- | **License** | n/a |
34
- | **Author** | [n/a]() |
35
 
36
  ### Label Scheme
37
 
 
21
  type: f_score
22
  value: 0.8579465541
23
  ---
24
+
25
+ See below for technical details about the model.
26
+
27
+ # Description
28
+
29
+ This is a named entity recognition model trained to run on text that discusses Torah topics (e.g. dvar torahs, Torah blogs, and translations of classic Torah texts).
30
+
31
+ It detects the following types of entities:
32
+
33
+ | Label | Description |
34
+ |---|---|
35
+ | Person | Name of a person |
36
+ | Group | Name of a group of people. E.g. nations (Egypt), schools (Bet Hillel, Tosafot) |
37
+ | Citation | Citations to Torah texts. See notes below. |
38
+
39
+ ## Notes on citation matches
40
+
41
+ - The final parenthesis is not included in the match. E.g. if the citation is `Genesis (1:1)`, the closing parenthesis is left out of the match. We found that the model got confused when the final parenthesis was part of the entity. It is fairly simple to add it back in via a deterministic check.
42
+ - Only the first word of a dibur hamatchil is included in the match. E.g. in `Tosafot s.v. Amar Rabbi Akiva`, only the text up to and including the word `Amar` is tagged. We found the model had trouble determining the end of the dibur hamatchil.
43
+ - See the Ref part model for a model that breaks citations down into chunks so they are simpler to parse.
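
The first note says the final parenthesis can be added back with a deterministic check. A minimal sketch of one such check follows. It assumes you have the character offsets of the match within the original text; the function name `extend_to_closing_paren` is hypothetical and is not part of this model or of Sefaria-Project.

```python
from typing import Tuple


def extend_to_closing_paren(text: str, start: int, end: int) -> Tuple[int, int]:
    """Widen the match text[start:end] by one character when it contains an
    unmatched "(" and the character right after the match is ")".

    Hypothetical helper for illustration; offsets are character offsets.
    """
    match = text[start:end]
    has_unclosed_paren = match.count("(") > match.count(")")
    if has_unclosed_paren and text[end:end + 1] == ")":
        return start, end + 1
    return start, end


text = "As it says in Genesis (1:1), in the beginning."
start = text.index("Genesis")
end = text.index(")")  # simulate a match that stops before the final ")"
new_start, new_end = extend_to_closing_paren(text, start, end)
print(text[new_start:new_end])  # Genesis (1:1)
```

Counting parentheses inside the match keeps the check from swallowing a closing parenthesis that belongs to the surrounding sentence rather than to the citation.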
44
+
45
+ ## Using with Sefaria-Project
46
+
47
+ The [Sefaria-Project](https://github.com/Sefaria/Sefaria-Project) repo can use this model to link detected entities to objects in the Sefaria database. Non-citation entities are linked to `Topic` objects and citation entities are linked to `Ref` objects.
48
+
49
+ ### Configuring Sefaria-Project to use this model
50
+
51
+ This assumes that Sefaria-Project is set up in your environment following the instructions in our [README](https://github.com/Sefaria/Sefaria-Project/blob/master/README.mkd).
52
+
53
+ In `local_settings.py`, modify the following lines:
54
+
55
+ ```python
56
+ RAW_REF_MODEL_BY_LANG_FILEPATH = {
57
+     "en": "/path/to/torah-ner-english model"
58
+ }
59
+ ```
60
+
61
+ ### Running the model with Sefaria-Project
62
+
63
+ The following code shows an example of instantiating the `Linker` object, which uses the ML models, and running it on input text.
64
+
65
+ ```python
66
+ import django
67
+ django.setup()
68
+ from sefaria.model.text import library
69
+
70
+ text = "Moses received the Torah from Har Sinai (Avot Chapter 1 Mishnah 1)"
71
+ linker = library.get_linker("en")
72
+ doc = linker.link(text)
73
+
74
+ print("Named entities")
75
+ for resolved_named_entity in doc.resolved_named_entities:
76
+     print("---")
77
+     print("Text:", resolved_named_entity.raw_entity.text)
78
+     print("Topic Slug:", resolved_named_entity.topic.slug)
79
+
80
+ print("Citations")
81
+ for resolved_ref in doc.resolved_refs:
82
+     print("---")
83
+     print("Text:", resolved_ref.raw_entity.text)
84
+     print("Ref:", resolved_ref.ref.normal())
85
+ ```
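
If you would rather collect the results than print them, the same attributes can be gathered into plain dictionaries. The sketch below uses only the attribute names from the example above (`resolved_named_entities`, `resolved_refs`, `raw_entity.text`, `topic.slug`, `ref.normal()`). The function name `summarize_linked_doc` is hypothetical, and the stand-in objects and their values are made up so that the snippet runs without a Sefaria-Project installation.

```python
from types import SimpleNamespace


def summarize_linked_doc(doc):
    """Gather linker output into plain dicts (hypothetical helper)."""
    entities = [
        {"text": e.raw_entity.text, "topic_slug": e.topic.slug}
        for e in doc.resolved_named_entities
    ]
    citations = [
        {"text": r.raw_entity.text, "ref": r.ref.normal()}
        for r in doc.resolved_refs
    ]
    return {"entities": entities, "citations": citations}


# Stand-in for the object returned by linker.link(text); values are illustrative only.
fake_doc = SimpleNamespace(
    resolved_named_entities=[
        SimpleNamespace(
            raw_entity=SimpleNamespace(text="Moses"),
            topic=SimpleNamespace(slug="moses"),
        )
    ],
    resolved_refs=[
        SimpleNamespace(
            raw_entity=SimpleNamespace(text="Avot Chapter 1 Mishnah 1"),
            ref=SimpleNamespace(normal=lambda: "Pirkei Avot 1:1"),
        )
    ],
)

summary = summarize_linked_doc(fake_doc)
print(summary)
```

With a real setup, you would pass the `doc` returned by `linker.link(text)` in place of `fake_doc`.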
86
+
87
+ # Technical Details
88
+
89
  | Feature | Description |
90
  | --- | --- |
91
  | **Name** | `en_torah_ner` |
 
95
  | **Components** | `tok2vec`, `ner` |
96
  | **Vectors** | 218765 keys, 218765 unique vectors (50 dimensions) |
97
  | **Sources** | n/a |
98
+ | **License** | GPLv3.0 |
99
+ | **Author** | Sefaria |
100
 
101
  ### Label Scheme
102