EventNet-ITA

The model is a full-text frame parser for events in Italian, trained on the EventNet-ITA dataset. It can be used for full-text Frame Parsing and Event Extraction. Please refer to the paper for a more detailed description.

Model Details

Model Description

In its current version, EventNet-ITA is able to recognize and classify 205 semantic frames and their (specific) frame elements. The unit of analysis is the sentence.

Direct Use

Provided with an input sequence of tokens, the model labels each token with the corresponding frame and/or frame element label(s).

La				B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING
cittadina		I-ENTITY*BEING_LOCATED|I-THEME*CONQUERING
,				O
posta			B-BEING_LOCATED
a				B-RELATIVE_LOCATION*BEING_LOCATED
est				I-RELATIVE_LOCATION*BEING_LOCATED
del				I-RELATIVE_LOCATION*BEING_LOCATED
corso			I-RELATIVE_LOCATION*BEING_LOCATED
d'				I-RELATIVE_LOCATION*BEING_LOCATED
acqua			I-RELATIVE_LOCATION*BEING_LOCATED
,				O
venne			O
conquistata		B-CONQUERING
,				O
ma				O
il				B-EXPLOSIVE*DETONATE_EXPLOSIVE
ponte			I-EXPLOSIVE*DETONATE_EXPLOSIVE
sul				I-EXPLOSIVE*DETONATE_EXPLOSIVE
fiume			I-EXPLOSIVE*DETONATE_EXPLOSIVE
era				O
già				O
stato			O
fatto			B-DETONATE_EXPLOSIVE
saltare			I-DETONATE_EXPLOSIVE
regolarmente	O
dai				B-AGENT*DETONATE_EXPLOSIVE
genieri			I-AGENT*DETONATE_EXPLOSIVE
francesi		I-AGENT*DETONATE_EXPLOSIVE
.				O
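
The labels in the example above can be decoded programmatically. The following is a minimal sketch, assuming the notation that can be read off the example: `|` separates multiple labels on one token, `B-`/`I-` mark the beginning and continuation of a span, `ELEMENT*FRAME` denotes a frame element of a given frame, and a bare `FRAME` marks the token(s) evoking that frame. The `Label` type and the `parse_labels` function are hypothetical helpers, not part of the model or of MaChAmp.

```python
from typing import List, NamedTuple, Optional


class Label(NamedTuple):
    bio: str                 # "B" (span start) or "I" (span continuation)
    element: Optional[str]   # frame element name, or None for a frame-evoking token
    frame: str               # frame name


def parse_labels(tag: str) -> List[Label]:
    """Split a tag such as 'B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING' into its parts."""
    if tag == "O":
        return []
    labels = []
    for part in tag.split("|"):
        # The BIO prefix ends at the first hyphen; frame names use underscores.
        bio, _, rest = part.partition("-")
        if "*" in rest:
            element, frame = rest.split("*", 1)
        else:
            element, frame = None, rest
        labels.append(Label(bio, element, frame))
    return labels


if __name__ == "__main__":
    for tag in ("B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING", "B-CONQUERING", "O"):
        print(tag, "->", parse_labels(tag))
```

Under this reading, the first token of the example ("La") opens both an ENTITY span for BEING_LOCATED and a THEME span for CONQUERING.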

Training Details

The model has been trained using MaChAmp, a Python toolkit supporting a variety of NLP tasks, by fine-tuning a pretrained Italian BERT model. Training hyperparameters:

  • Batch size: 64
  • Learning rate: 1.5e-3

All other hyperparameters have been left unchanged w.r.t. the default MaChAmp configuration for the multi-sequential token classification task.

Training Data

Please refer to the dataset repo.

Model Re-training

In order to re-train the model, download the dataset and follow the instructions for training a multiseq task in MaChAmp.
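
A dataset configuration for such a run might look like the sketch below, generated here from Python. The key names (`train_data_path`, `dev_data_path`, `word_idx`, `tasks`, `task_type`, `column_idx`) are assumptions based on MaChAmp's documented dataset-config format and may differ between MaChAmp versions, so they should be checked against the cloned repository. The file paths, the dataset key and the task name `frames` are hypothetical.

```python
import json

# Assumed MaChAmp dataset configuration for a multiseq task.
# Paths, the dataset key and the task name are placeholders (hypothetical).
dataset_config = {
    "EventNet-ITA": {
        "train_data_path": "data/eventnet-ita/train.tsv",
        "dev_data_path": "data/eventnet-ita/dev.tsv",
        "word_idx": 0,  # column holding the token
        "tasks": {
            "frames": {
                "task_type": "multiseq",  # several labels per token
                "column_idx": 1,          # column holding the labels
            }
        },
    }
}

config_json = json.dumps(dataset_config, indent=2)

if __name__ == "__main__":
    print(config_json)
```

The printed JSON can be saved to a file and passed to MaChAmp's training script as described in its documentation.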

Inference

EventNet-ITA's model can be used for Frame Parsing on new texts. To do so, follow these steps:

  1. Clone the GitHub repo: git clone https://github.com/machamp-nlp/machamp.git
  2. Download EventNet-ITA's model from this repo (450 MB) and move it into the machamp folder. The exact location is up to you; by default, MaChAmp saves trained models in the logs folder.
  3. Save the data you want to use for prediction in a two-column TSV file, one word per line, with the word in the first column and a placeholder (_) in the second. Separate sentences with a blank line (without placeholder), like this:
This	_
is	_
the	_
first	_
sentence	_
.	_

This	_
is	_
the	_
second	_
one	_
.	_
  4. Follow the instructions for predicting with MaChAmp (see section "Prediction") using a fine-tuned model.
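
The input file described in step 3 can be produced with a few lines of Python. This is a minimal sketch, assuming the text has already been split into sentences and tokens; the function name `sentences_to_tsv` is a hypothetical helper, not part of MaChAmp.

```python
from typing import Iterable, List


def sentences_to_tsv(sentences: Iterable[List[str]], placeholder: str = "_") -> str:
    """Render tokenized sentences as 'token<TAB>placeholder' lines,
    with one blank line between consecutive sentences."""
    blocks = []
    for tokens in sentences:
        blocks.append("\n".join(f"{token}\t{placeholder}" for token in tokens))
    return "\n\n".join(blocks) + "\n"


sentences = [
    ["This", "is", "the", "first", "sentence", "."],
    ["This", "is", "the", "second", "one", "."],
]
tsv_text = sentences_to_tsv(sentences)

if __name__ == "__main__":
    print(tsv_text, end="")
```

The resulting string can then be written to a file of your choice, for example with `pathlib.Path("to_predict.tsv").write_text(tsv_text, encoding="utf-8")`, where the file name is only an illustration.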

Evaluation

The model has been evaluated on three folds, each time with a stratified split of the dataset, with an 80/10/10 train/dev/test ratio. Please see the paper for further details. Below we report summary values obtained by averaging the Precision, Recall and F1-score values of the three splits.

Token-based (relaxed) performance:

                              P       R       F1
Frames                        0.904   0.914   0.907
Frames (weighted)             0.909   0.919   0.913
Frame Elements                0.841   0.724   0.761
Frame Elements (weighted)     0.850   0.779   0.804

Span-based (strict) performance:

                              P       R       F1
Frames                        0.906   0.899   0.901
Frames (weighted)             0.909   0.903   0.905
Frame Elements                0.829   0.666   0.724
Frame Elements (weighted)     0.853   0.711   0.768
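
The difference between the token-based (relaxed) and span-based (strict) settings can be illustrated with a small sketch. It is a generic illustration, not the evaluation script used for the figures above: it assumes a single BIO label per token, token-level credit for every labelled token whose label matches regardless of the B-/I- prefix, and span-level credit only when boundaries and label match exactly. All function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect labelled spans from a BIO sequence; stray I- tags are ignored."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def prf(correct: int, predicted: int, gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts."""
    p = correct / predicted if predicted else 0.0
    r = correct / gold if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def _bare(tag: str) -> Optional[str]:
    """Drop the B-/I- prefix; 'O' becomes None."""
    return None if tag == "O" else tag[2:]


def token_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Relaxed scoring: each labelled token counts on its own."""
    g = [_bare(t) for t in gold]
    p = [_bare(t) for t in pred]
    correct = sum(1 for a, b in zip(g, p) if a is not None and a == b)
    return prf(correct, sum(x is not None for x in p), sum(x is not None for x in g))


def span_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Strict scoring: a span counts only if start, end and label all match."""
    gs, ps = bio_spans(gold), bio_spans(pred)
    return prf(len(gs & ps), len(ps), len(gs))


gold = ["B-AGENT", "I-AGENT", "I-AGENT", "O"]
pred = ["B-AGENT", "I-AGENT", "O", "O"]

if __name__ == "__main__":
    print("token-based:", token_prf(gold, pred))
    print("span-based: ", span_prf(gold, pred))
```

In this example the prediction misses the last token of the span: the token-based setting still credits the two matching tokens, while the span-based setting counts the whole span as wrong. This is consistent with the lower strict recall reported for frame elements.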

Citation Information

If you use EventNet-ITA, please cite the following paper:

@inproceedings{rovera-2024-eventnet,
    title = "{E}vent{N}et-{ITA}: {I}talian Frame Parsing for Events",
    author = "Rovera, Marco",
    editor = "Bizzoni, Yuri  and
      Degaetano-Ortlieb, Stefania  and
      Kazantseva, Anna  and
      Szpakowicz, Stan",
    booktitle = "Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)",
    year = "2024",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.latechclfl-1.9",
    pages = "77--90",
}