eval results reproducibility

#5
by diana-onutu - opened

How did you deal with the misalignment between tokens and NER tags that appears after tokenization? If the word "Japan" has the NER tag "B-LOC", what does it look like after it is tokenized as "JA", "##PA", "##N"? For example, do you re-align the NER tags as "B-LOC", "I-LOC", "I-LOC"?

I'm trying to reproduce your evaluation results, but most of my scores are between 0.5 and 0.7 (except accuracy). When calculating these metrics, is performance on the "O" label also included?
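
The thread doesn't say which scheme the model authors used, so the following is only a minimal sketch of two common conventions, not their actual pipeline. It assumes a word-to-token mapping in the format returned by a Hugging Face fast tokenizer's `word_ids()` (a list with the source word index per token and `None` for special tokens). The helper names `align_labels`, `entity_spans`, `token_accuracy` and `entity_f1` are hypothetical. By default only the first subword keeps the word's tag and the rest are ignored; with `label_all_subwords=True` the tag is propagated as "B-LOC", "I-LOC", "I-LOC". The second half shows why accuracy can sit far above entity-level scores: accuracy counts every matching "O" token, while entity-level F1 (the way seqeval scores NER) only counts complete entity spans.

```python
from typing import List, Optional, Tuple


def align_labels(word_labels: List[str], word_ids: List[Optional[int]],
                 label_all_subwords: bool = False) -> List[Optional[str]]:
    """Project word-level BIO tags onto subword tokens (hypothetical helper).

    word_ids[i] is the index of the word token i came from, or None for special
    tokens such as [CLS]/[SEP]. A returned None means "no label"; when training
    with PyTorch it is usually mapped to -100, CrossEntropyLoss's default
    ignore_index.
    """
    aligned: List[Optional[str]] = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(None)               # special token: never scored
        elif wid != previous:
            aligned.append(word_labels[wid])   # first subword keeps the word's tag
        elif label_all_subwords:
            tag = word_labels[wid]
            # continuation subwords of a B- word become I- of the same type
            aligned.append("I-" + tag[2:] if tag.startswith("B-") else tag)
        else:
            aligned.append(None)               # continuation subword: ignored
        previous = wid
    return aligned


def entity_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Extract (type, start, end) entity spans from BIO tags; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
        # an I- tag of the current type just extends the open span
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose tag matches, "O" tokens included."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity F1; "O" is not an entity type, so it earns no credit."""
    g, p = set(entity_spans(gold)), set(entity_spans(pred))
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# "Japan" split into three subwords as in the question, followed by "won".
print(align_labels(["B-LOC", "O"], [None, 0, 0, 0, 1, None]))
# [None, 'B-LOC', None, None, 'O', None]
print(align_labels(["B-LOC", "O"], [None, 0, 0, 0, 1, None], label_all_subwords=True))
# [None, 'B-LOC', 'I-LOC', 'I-LOC', 'O', None]

# Word-level gold vs. prediction: one of the two entities is cut short.
GOLD = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O", "O", "O", "O"]
PRED = ["B-PER", "O", "O", "O", "O", "B-LOC", "O", "O", "O", "O"]
print(token_accuracy(GOLD, PRED))  # 0.9 (seven of the nine matches are "O" tokens)
print(entity_f1(GOLD, PRED))       # 0.5 (only the LOC span matches exactly)
```

If the tags are propagated to every subword for training, it is probably safest to score on word-level (first-subword) predictions only, since otherwise the gold and predicted sequences no longer line up with the original word-level annotation.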
