Results and training data

#1
by msperka - opened

Thanks for publishing this!

In the paper (Table 2) you reported a nice F1 score of 0.8701, and it was also mentioned that training was done on the NEMO corpus.
Were there any changes to this since the paper was published? (I'm asking because I was on WhatsApp giving credit to NEMO for providing over 90% of the training data; is there more you can share?)

DICTA: The Israel Center for Text Analysis org

The reported scores in the paper were for a model trained and tested solely on the NEMO corpus.
This model, however, was trained on a much larger corpus, in which NEMO was actually a very small percentage; most of the training data was provided by the IAHLT project.

Shaltiel changed discussion status to closed
Shaltiel changed discussion status to open

May I ask about the F1 results now?

DICTA: The Israel Center for Text Analysis org

Results are similar but harder to estimate, since the IAHLT corpus includes additional tags which aren't included in the NEMO corpus.
We are going to release a detailed document with experiments in the coming weeks. On a much larger test corpus (a subset of the IAHLT corpus) with more domains the overall F1 reaches 0.84.
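Since scores on corpora with different tag inventories aren't directly comparable, one common workaround is to evaluate only over the labels both corpora share. A minimal sketch of entity-level F1 restricted to a shared tag set (the span data and tag names here are hypothetical illustrations, not taken from the actual evaluation):

```python
# Entity-level F1 over exact span matches, restricted to a shared tag set.
# Spans are (start, end, label) tuples; all values below are made up.

def f1_on_shared_tags(gold, pred, shared_tags):
    """Return (precision, recall, F1), counting only entities whose
    label appears in shared_tags and matches the gold span exactly."""
    gold = {s for s in gold if s[2] in shared_tags}
    pred = {s for s in pred if s[2] in shared_tags}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy example: "TTL" stands in for a tag present in only one corpus,
# so it is excluded from the comparison.
gold = [(0, 2, "PER"), (5, 7, "ORG"), (9, 11, "TTL")]
pred = [(0, 2, "PER"), (5, 7, "LOC")]
p, r, f1 = f1_on_shared_tags(gold, pred, {"PER", "ORG", "LOC"})
# p = 0.5, r = 0.5, f1 = 0.5
```

Filtering both sides to the shared labels keeps precision and recall on the same footing; otherwise extra tags in one corpus would inflate false negatives or false positives spuriously.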

DICTA: The Israel Center for Text Analysis org

(By contrast, the model trained on NEMO alone does significantly worse on this test corpus.)

Thank you!

msperka changed discussion status to closed
