Committed by lvwerra (HF staff)
Commit af9b3fd
1 parent: 6d9b3f1

Update Space (evaluate main: 05209ece)

Files changed (1)
  1. README.md +33 -0
README.md CHANGED
@@ -10,6 +10,39 @@ pinned: false
 tags:
 - evaluate
 - metric
+description: >-
+  CoVal is a coreference evaluation tool for the CoNLL and ARRAU datasets which
+  implements the common evaluation metrics, including MUC [Vilain et al., 1995],
+  B-cubed [Bagga and Baldwin, 1998], CEAFe [Luo et al., 2005],
+  LEA [Moosavi and Strube, 2016] and the averaged CoNLL score
+  (the average of the F1 values of MUC, B-cubed and CEAFe)
+  [Denis and Baldridge, 2009a; Pradhan et al., 2011].
+
+  This wrapper of CoVal currently only works with the CoNLL line format:
+  The CoNLL format has one word per line, with all the annotations for that word in columns separated by spaces:
+  Column Type Description
+  1 Document ID: This is a variation on the document filename.
+  2 Part number: Some files are divided into multiple parts numbered as 000, 001, 002, ... etc.
+  3 Word number
+  4 Word itself: This is the token as segmented/tokenized in the Treebank. Initially, the *_skel files contain the placeholder [WORD], which gets replaced by the actual token from the Treebank that is part of the OntoNotes release.
+  5 Part-of-Speech
+  6 Parse bit: This is the bracketed structure broken before the first open parenthesis in the parse, with the word/part-of-speech leaf replaced by a *. The full parse can be created by substituting the asterisk with the "([pos] [word])" string (or leaf) and concatenating the items in the rows of that column.
+  7 Predicate lemma: The predicate lemma is given for the rows for which we have semantic role information. All other rows are marked with a "-".
+  8 Predicate Frameset ID: This is the PropBank frameset ID of the predicate in Column 7.
+  9 Word sense: This is the word sense of the word in Column 4.
+  10 Speaker/Author: This is the speaker or author name where available, mostly in Broadcast Conversation and Web Log data.
+  11 Named Entities: This column identifies the spans representing various named entities.
+  12:N Predicate Arguments: There is one column of predicate argument structure information for each predicate mentioned in Column 7.
+  N Coreference: Coreference chain information encoded in a parenthesis structure.
+  More information on the format can be found here (section "*_conll File Format"): http://www.conll.cemantix.org/2012/data.html
+
+  Details on evaluation with CoNLL data can be found here: https://github.com/ns-moosavi/coval/blob/master/conll/README.md
+
+  CoVal code was written by @ns-moosavi.
+  Some parts are borrowed from https://github.com/clarkkev/deep-coref/blob/master/evaluation.py
+  The test suite is taken from https://github.com/conll/reference-coreference-scorers/
+  Mention evaluation and the test suite were added by @andreasvc.
+  The CoNLL file parsing was developed by Leo Born.
 ---
 
 ## Metric description
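The added description defines the averaged CoNLL score as the average of the MUC, B-cubed and CEAFe F1 values. The Python sketch below spells out that arithmetic. The helper names `f1` and `conll_score` are hypothetical and are not part of CoVal or the `evaluate` wrapper, and the precision/recall numbers are made up for illustration.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def conll_score(muc_f1: float, bcub_f1: float, ceafe_f1: float) -> float:
    """Unweighted mean of the MUC, B-cubed and CEAFe F1 values (LEA is not included)."""
    return (muc_f1 + bcub_f1 + ceafe_f1) / 3


if __name__ == "__main__":
    # Illustrative numbers only, not real CoVal output.
    muc = f1(precision=0.80, recall=0.80)
    bcub = f1(precision=0.75, recall=0.65)
    ceafe = f1(precision=0.60, recall=0.60)
    print(round(conll_score(muc, bcub, ceafe), 4))
```

Per the definition in the description, LEA is reported alongside the other metrics but does not enter the averaged CoNLL score, so the sketch takes only three inputs.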
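The last column (N) of the CoNLL format carries coreference chains "encoded in a parenthesis structure". A minimal sketch of how such a column could be decoded into clusters of token spans, assuming the usual CoNLL-2012 convention: `-` means no mention, `(7` opens a mention of entity 7, `7)` closes it, `(7)` marks a one-token mention, and `|` joins several markers on one token. This is not CoVal's own parser; `coref_tags`, `parse_coref_column` and the sample column are hypothetical.

```python
from collections import defaultdict


def coref_tags(lines):
    """Collect the last column of each token line, skipping '#' comment lines and blank sentence breaks."""
    return [line.split()[-1] for line in lines if line.strip() and not line.startswith("#")]


def parse_coref_column(tags):
    """Map each entity id to its list of (start, end) token spans (0-based, inclusive)."""
    open_starts = defaultdict(list)  # entity id -> stack of start indices still open
    clusters = defaultdict(list)     # entity id -> completed (start, end) spans
    for i, tag in enumerate(tags):
        if tag == "-":
            continue  # this token is not part of any mention boundary
        for part in tag.split("|"):
            if part.startswith("(") and part.endswith(")"):
                clusters[int(part[1:-1])].append((i, i))  # single-token mention
            elif part.startswith("("):
                open_starts[int(part[1:])].append(i)      # mention opens here
            elif part.endswith(")"):
                eid = int(part[:-1])
                # Pops the most recent open mention of the same entity;
                # malformed input with an unmatched close raises IndexError.
                start = open_starts[eid].pop()
                clusters[eid].append((start, i))
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical coreference column for a 7-token document.
    tags = ["(0", "-", "0)", "(1)", "-", "(0|(2", "2)|0)"]
    print(parse_coref_column(tags))
    # {0: [(0, 2), (5, 6)], 1: [(3, 3)], 2: [(5, 6)]}
```

Using a per-entity stack keeps nested mentions of the same entity correctly paired, since each closing marker matches the most recently opened mention with that id.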
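The Column 6 entry says the full parse can be rebuilt by substituting the asterisk in each parse bit with a "([pos] [word])" leaf and concatenating the rows of a sentence. A short sketch of that step, assuming whitespace-separated columns numbered as in the table (word, part of speech and parse bit in columns 4, 5 and 6, i.e. indices 3, 4 and 5 after `str.split()`). The function name `rebuild_parse` and the two sample token lines are hypothetical.

```python
def rebuild_parse(token_lines):
    """Rebuild one sentence's bracketed parse tree from its CoNLL token lines."""
    pieces = []
    for line in token_lines:
        cols = line.split()
        word, pos, parse_bit = cols[3], cols[4], cols[5]
        # The single "*" in the parse bit marks where the (POS word) leaf belongs.
        pieces.append(parse_bit.replace("*", f"({pos} {word})", 1))
    return "".join(pieces)


if __name__ == "__main__":
    # Hypothetical two-token sentence in the column layout described above.
    sentence = [
        "doc1 0 0 Thank VBP (TOP(S(VP* thank 01 1 - * (V*) -",
        "doc1 0 1 you PRP (NP*)))) - - - - * (ARG1*) (0)",
    ]
    print(rebuild_parse(sentence))
    # (TOP(S(VP(VBP Thank)(NP(PRP you)))))
```

Replacing only the first asterisk (`count=1`) keeps the substitution to the one leaf position each parse bit contains.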