Results and training data

#1
by msperka - opened

Thanks for publishing this!

In the paper (Table 2) you reported a nice F1 score of 0.8701, and it was also mentioned that training was done on the NEMO corpus.
Were there any changes to this since the paper was published? (I'm asking because I was on WhatsApp giving credit to NEMO for providing over 90% of the training data; is there more you can share?)

DICTA: The Israel Center for Text Analysis org

The reported scores in the paper were for a model trained and tested solely on the NEMO corpus.
This model, however, was trained on a much larger corpus, in which NEMO was actually a very small percentage; most of the training data was provided by the IAHLT project.

Shaltiel changed discussion status to closed
Shaltiel changed discussion status to open

May I ask about the F1 results now?

DICTA: The Israel Center for Text Analysis org

Results are similar but harder to estimate, since the IAHLT corpus includes additional tags which aren't included in the NEMO corpus.
We are going to release a detailed document with experiments in the coming weeks. On a much larger test corpus (a subset of the IAHLT corpus) with more domains the overall F1 reaches 0.84.
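Since scores on corpora with different tag inventories aren't directly comparable, one common workaround is to evaluate only over the labels both corpora share. A minimal sketch of entity-level F1 restricted to a shared tag set (the span data and tag names here are hypothetical illustrations, not taken from the actual evaluation):

```python
# Entity-level F1 over exact span matches, restricted to a shared tag set.
# Spans are (start, end, label) tuples; all values below are made up.

def f1_on_shared_tags(gold, pred, shared_tags):
    """Return (precision, recall, F1), counting only entities whose
    label appears in shared_tags and matches the gold span exactly."""
    gold = {s for s in gold if s[2] in shared_tags}
    pred = {s for s in pred if s[2] in shared_tags}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy example: "TTL" stands in for a tag present in only one corpus,
# so it is excluded from the comparison.
gold = [(0, 2, "PER"), (5, 7, "ORG"), (9, 11, "TTL")]
pred = [(0, 2, "PER"), (5, 7, "LOC")]
p, r, f1 = f1_on_shared_tags(gold, pred, {"PER", "ORG", "LOC"})
# p = 0.5, r = 0.5, f1 = 0.5
```

Filtering both sides to the shared labels keeps precision and recall on the same footing; otherwise extra tags in one corpus would inflate false negatives or false positives spuriously.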

DICTA: The Israel Center for Text Analysis org

(By contrast, the model trained on NEMO alone does significantly worse on this test corpus.)

Thank you!

msperka changed discussion status to closed
