Added description
semf1.py
CHANGED
```diff
@@ -11,7 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-# TODO: Add test cases,
+# TODO: Add test cases, Remove tokenize_sentences flag since it can be determined from the input itself.
 """SEM-F1 metric"""
 
 import abc
@@ -58,19 +58,51 @@ _KWARGS_DESCRIPTION = """
 SEM-F1 compares the system generated overlap summary with ground truth reference overlap.
 
 Args:
-predictions:
-references:
+predictions: list - List of predictions (Details below)
+references: list - List of references (Details below)
 reference should be a string with tokens separated by spaces.
 model_type: str - Model to use. [pv1, stsb, use]
 Options:
-pv1 - paraphrase-distilroberta-base-v1
+pv1 - paraphrase-distilroberta-base-v1 (Default)
 stsb - stsb-roberta-large
 use - Universal Sentence Encoder
+tokenize_sentences: bool - Sentence tokenize the input document (prediction/reference). Default: True.
+gpu: Union[bool, int] - Whether to use GPU or CPU.
+Options:
+False - CPU (Default)
+True - GPU, device 0
+n: int - GPU, device n
+batch_size: int - Batch Size, Default = 32.
 Returns:
 precision: Precision.
 recall: Recall.
 f1: F1 score.
 
+There are 4 possible cases for inputs corresponding to predictions and references arguments
+Case 1: Multi-Ref = False, tokenize_sentences = False
+predictions: List[List[str]] - List of predictions where each prediction is a list of sentences.
+references: List[List[str]] - List of references where each reference is a list of sentences.
+Case 2: Multi-Ref = False, tokenize_sentences = True
+predictions: List[str] - List of predictions where each prediction is a document
+references: List[str] - List of references where each reference is a document
+Case 3: Multi-Ref = True, tokenize_sentences = False
+predictions: List[List[str]] - List of predictions where each prediction is a list of sentences.
+references: List[List[List[str]]] - List of multi-references i.e. [[r11, r12, ...], [r21, r22, ...], ...]
+where each rij is further a list of sentences
+Case 4: Multi-Ref = True, tokenize_sentences = True
+predictions: List[str] - List of predictions where each prediction is a document
+references: List[List[str]] - List of multi-references i.e. [[r11, r12, ...], [r21, r22, ...], ...]
+where each rij is a document
+
+This can be seen in the form of truth table as follows:
+Case | Multi-Ref | tokenize_sentences | predictions | references
+1 | 0 | 0 | List[List[str]] | List[List[str]]
+2 | 0 | 1 | List[str] | List[str]
+3 | 1 | 0 | List[List[str]] | List[List[List[str]]]
+4 | 1 | 1 | List[str] | List[List[str]]
+
+It is automatically determined whether it is Multi-Ref case Single-Ref case.
+
 Examples:
 
 >>> import evaluate
```
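The new docstring states that Multi-Ref is detected automatically, and the TODO proposes dropping `tokenize_sentences` because it, too, can be read off the input. Going by the truth table, both flags follow from nesting depth alone. Predictions that are plain strings still need sentence tokenization, and references nested one level deeper than the predictions indicate multiple references per prediction. Below is a minimal sketch of that inference. `infer_input_case` and `_nesting_depth` are hypothetical names, not functions from semf1.py. The sketch also assumes non-empty, homogeneously nested lists, because it inspects only the first element at each level.

```python
from typing import Any, Tuple


def _nesting_depth(x: Any) -> int:
    """Count list levels above the innermost str (hypothetical helper)."""
    depth = 0
    while isinstance(x, list):
        if not x:
            raise ValueError("cannot infer nesting depth from an empty list")
        depth += 1
        x = x[0]  # assumes homogeneous nesting: only the first element is inspected
    if not isinstance(x, str):
        raise TypeError(f"expected str at the innermost level, got {type(x).__name__}")
    return depth


# (multi_ref, tokenize_sentences) -> case number, following the truth table.
_CASES = {(False, False): 1, (False, True): 2, (True, False): 3, (True, True): 4}


def infer_input_case(predictions: list, references: list) -> Tuple[int, bool, bool]:
    """Return (case, multi_ref, tokenize_sentences) inferred from input nesting."""
    pred_depth = _nesting_depth(predictions)
    ref_depth = _nesting_depth(references)
    if pred_depth not in (1, 2):
        raise ValueError("predictions must be List[str] or List[List[str]]")
    extra = ref_depth - pred_depth
    if extra not in (0, 1):
        raise ValueError("references must be nested 0 or 1 level deeper than predictions")
    tokenize_sentences = pred_depth == 1  # documents still have to be split into sentences
    multi_ref = extra == 1                # one extra level = several references per prediction
    return _CASES[(multi_ref, tokenize_sentences)], multi_ref, tokenize_sentences


sents = [["A cat sat.", "It purred."]]
docs = ["A cat sat. It purred."]
print(infer_input_case(sents, sents))                                  # (1, False, False)
print(infer_input_case(docs, docs))                                    # (2, False, True)
print(infer_input_case(sents, [[["A cat sat."], ["A cat purred."]]]))  # (3, True, False)
print(infer_input_case(docs, [["A cat sat.", "A cat purred."]]))       # (4, True, True)
```

Cases 1 and 4 both take `List[List[str]]` references, and only the depth of the predictions distinguishes them. For that reason, the sketch compares the two depths instead of classifying the references on their own.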
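When `tokenize_sentences` is True (Cases 2 and 4), each document is split into sentences, which converts the inputs into the Case 1 and Case 3 shapes respectively. The docstring does not name the sentence splitter semf1.py uses. The sketch below therefore substitutes a naive regex splitter, purely for illustration. `naive_sent_split` and `sentence_tokenize_inputs` are hypothetical helpers, and a real sentence tokenizer will handle abbreviations and quotations that this regex mishandles.

```python
import re
from typing import List, Union


def naive_sent_split(doc: str) -> List[str]:
    """Stand-in splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]


def sentence_tokenize_inputs(
    predictions: List[str],
    references: Union[List[str], List[List[str]]],
    multi_ref: bool,
):
    """Turn Case 2 inputs into Case 1 shapes, or Case 4 inputs into Case 3 shapes."""
    preds = [naive_sent_split(p) for p in predictions]
    if multi_ref:
        # Case 4 -> Case 3: every reference in every group becomes a list of sentences.
        refs = [[naive_sent_split(r) for r in group] for group in references]
    else:
        # Case 2 -> Case 1: every reference document becomes a list of sentences.
        refs = [naive_sent_split(r) for r in references]
    return preds, refs


preds, refs = sentence_tokenize_inputs(["A cat sat. It purred."], [["A cat sat.", "It purred!"]], multi_ref=True)
print(preds)
print(refs)
```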
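The `gpu` argument accepts either a bool or an int. The sketch below shows one way to map it to a device, assuming PyTorch-style device strings such as `"cpu"` and `"cuda:0"`. The docstring does not show how semf1.py actually selects the device, and `resolve_device` is a hypothetical name.

```python
from typing import Union


def resolve_device(gpu: Union[bool, int] = False) -> str:
    """Map the docstring's gpu option to a PyTorch-style device string (hypothetical)."""
    # bool is a subclass of int, so it must be handled first; otherwise False
    # would fall through to the int branch instead of selecting the CPU.
    if isinstance(gpu, bool):
        return "cuda:0" if gpu else "cpu"  # True -> GPU device 0, False -> CPU (default)
    if isinstance(gpu, int):
        if gpu < 0:
            raise ValueError(f"GPU index must be non-negative, got {gpu}")
        return f"cuda:{gpu}"  # n -> GPU device n
    raise TypeError(f"gpu must be bool or int, got {type(gpu).__name__}")


for option in (False, True, 0, 2):
    print(repr(option), "->", resolve_device(option))
```

The order of the checks matters because `bool` subclasses `int` in Python. With the bool check first, `gpu=0` and `gpu=False` mean different things (device 0 versus the CPU), which matches the Options list.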
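The docstring lists `precision`, `recall` and `f1` as outputs but does not say how they are computed. The sketch below assumes BERTScore-style greedy matching at the sentence level:

- Each sentence embedding is matched to its most similar counterpart by cosine similarity.
- Precision averages these best matches over predicted sentences, and recall averages them over reference sentences.
- F1 is the harmonic mean of precision and recall.

The embeddings are passed in directly; in the metric they would come from the model selected by `model_type`. The function name is hypothetical. Only the single-reference case is covered, because the docstring does not say how scores are combined across multiple references.

```python
import numpy as np


def sem_f1_from_embeddings(pred_emb: np.ndarray, ref_emb: np.ndarray) -> dict:
    """Sentence-level precision/recall/F1 from embeddings (hypothetical sketch).

    pred_emb: (m, d) array, one row per predicted sentence.
    ref_emb:  (n, d) array, one row per reference sentence.
    """
    # L2-normalise each row so that a dot product is a cosine similarity.
    p = pred_emb / np.linalg.norm(pred_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = p @ r.T  # (m, n) pairwise cosine similarities
    precision = float(sim.max(axis=1).mean())  # best reference match per predicted sentence
    recall = float(sim.max(axis=0).mean())     # best predicted match per reference sentence
    # Guard in this sketch: cosine similarities can be negative, so skip a meaningless ratio.
    f1 = 0.0 if precision + recall <= 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


pred = np.array([[1.0, 0.0], [0.0, 1.0]])  # two orthogonal predicted sentences
ref = np.array([[1.0, 0.0]])               # one reference sentence equal to the first
print(sem_f1_from_embeddings(pred, ref))
```

In this toy input, only one of the two predicted sentences matches the reference. Precision is therefore 0.5, recall is 1.0, and F1 is about 0.667.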