---
license: agpl-3.0
language:
- it
task_categories:
- token-classification
datasets:
- mrovera/eventnet-ita
tags:
- Frame Parsing
- Event Extraction
---
# EventNet-ITA

The model is a full-text frame parser for events in Italian, trained on [EventNet-ITA](https://huggingface.co/datasets/mrovera/eventnet-ita).
It can be used for _full-text_ Frame Parsing and Event Extraction.
Please refer to the [paper](https://aclanthology.org/2024.latechclfl-1.9) for a more detailed description.


## Model Details

### Model Description

In its current version, EventNet-ITA recognizes and classifies 205 semantic frames and their frame-specific frame elements. The unit of analysis is the sentence.


### Direct Use

Provided with an input sequence of tokens, the model labels each token with the corresponding frame and/or frame element label(s). 
```
La				B-ENTITY*BEING_LOCATED|B-THEME*CONQUERING
cittadina		I-ENTITY*BEING_LOCATED|I-THEME*CONQUERING
,				O
posta			B-BEING_LOCATED
a				B-RELATIVE_LOCATION*BEING_LOCATED
est				I-RELATIVE_LOCATION*BEING_LOCATED
del				I-RELATIVE_LOCATION*BEING_LOCATED
corso			I-RELATIVE_LOCATION*BEING_LOCATED
d'				I-RELATIVE_LOCATION*BEING_LOCATED
acqua			I-RELATIVE_LOCATION*BEING_LOCATED
,				O
venne			O
conquistata		B-CONQUERING
,				O
ma				O
il				B-EXPLOSIVE*DETONATE_EXPLOSIVE
ponte			I-EXPLOSIVE*DETONATE_EXPLOSIVE
sul				I-EXPLOSIVE*DETONATE_EXPLOSIVE
fiume			I-EXPLOSIVE*DETONATE_EXPLOSIVE
era				O
già				O
stato			O
fatto			B-DETONATE_EXPLOSIVE
saltare			I-DETONATE_EXPLOSIVE
regolarmente	    O
dai				B-AGENT*DETONATE_EXPLOSIVE
genieri			I-AGENT*DETONATE_EXPLOSIVE
francesi		I-AGENT*DETONATE_EXPLOSIVE
.				O
```
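
In the example, labels follow a BIO scheme: plain `B-FRAME`/`I-FRAME` tags mark the frame-evoking target, `B-ROLE*FRAME`/`I-ROLE*FRAME` tags mark frame elements, and multiple labels on the same token are separated by `|`. The following is a minimal, illustrative sketch (not part of the released code; the `decode` function and its output convention are ours) for turning such output into `(frame, role, span)` triples:
```python
from typing import List, Tuple

def decode(tokens: List[str], tags: List[str]) -> List[Tuple[str, str, str]]:
    """Collect (frame, role, span text) triples; role is "TARGET" for the frame-evoking span."""
    spans = []
    open_spans = {}  # (frame, role) -> tokens of the span currently being built

    for token, tag in zip(tokens, tags):
        active = set()
        if tag != "O":
            for label in tag.split("|"):
                bio, _, rest = label.partition("-")
                role, sep, frame = rest.partition("*")
                if not sep:                      # plain B-FRAME / I-FRAME: the frame-evoking target
                    role, frame = "TARGET", rest
                key = (frame, role)
                active.add(key)
                if bio == "B" and key in open_spans:
                    # a new span of the same kind starts: flush the previous one
                    spans.append((frame, role, " ".join(open_spans.pop(key))))
                open_spans.setdefault(key, []).append(token)
        # any open span that did not continue on this token is finished
        for key in [k for k in open_spans if k not in active]:
            spans.append((key[0], key[1], " ".join(open_spans.pop(key))))

    for (frame, role), toks in open_spans.items():
        spans.append((frame, role, " ".join(toks)))
    return spans
```
Applied to the example above, this yields triples such as `("CONQUERING", "THEME", "La cittadina")` and `("DETONATE_EXPLOSIVE", "TARGET", "fatto saltare")`.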


## Training Details

The model has been trained using [MaChAmp](https://github.com/machamp-nlp/machamp), a Python toolkit supporting a variety of NLP tasks, by fine-tuning [this Italian BERT pretrained model](https://huggingface.co/dbmdz/bert-base-italian-xxl-cased).
Training hyperparameters:
- Batch size: 64
- Learning rate: 1.5e-3

All other hyperparameters have been left unchanged with respect to the default MaChAmp configuration for the multi-sequence (`multiseq`) token classification task.
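
For reference, the pretrained backbone (not the fine-tuned parser, which is trained and run through MaChAmp) can be loaded directly with 🤗 Transformers:
```python
# Loads only the pretrained Italian BERT encoder used as the backbone above;
# the fine-tuned EventNet-ITA parser itself is trained and run via MaChAmp.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
encoder = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
```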



### Training Data

Please refer to the [dataset repo](https://huggingface.co/datasets/mrovera/eventnet-ita).


### Model Re-training

In order to re-train the model, download the [dataset](https://huggingface.co/datasets/mrovera/eventnet-ita) and follow the instructions for training a [multiseq task](https://github.com/machamp-nlp/machamp/blob/master/docs/multiseq.md) in MaChAmp.
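
For orientation only, a MaChAmp dataset configuration for this task might look roughly like the sketch below (written as a Python dict and dumped to JSON). The task name, file paths and column indices are placeholders; adapt them to the layout of the downloaded dataset and check the MaChAmp multiseq documentation for the exact keys.
```python
# Hypothetical MaChAmp dataset configuration for re-training; paths, the task
# name and the column indices are placeholders, not the actual dataset layout.
import json

dataset_config = {
    "EVENTNET_ITA": {
        "train_data_path": "data/eventnet-ita.train",
        "dev_data_path": "data/eventnet-ita.dev",
        "word_idx": 0,                        # column containing the word form
        "tasks": {
            "frames": {
                "task_type": "multiseq",      # multi-label sequence tagging
                "column_idx": 1               # column containing the pipe-separated labels
            }
        }
    }
}

with open("configs/eventnet-ita.json", "w", encoding="utf-8") as f:
    json.dump(dataset_config, f, indent=4)
```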


### Inference

The EventNet-ITA model can be used for Frame Parsing on new texts by following a few simple steps.
1. Clone the GitHub repo: `git clone https://github.com/machamp-nlp/machamp.git`
2. Download the EventNet-ITA model from this repo (450 MB) and move it into the `machamp` folder (the exact location is up to you; by default MaChAmp saves trained models in the `logs` folder).
3. Save the data you want to use for prediction in a two-column tsv file, one word per line: the word goes in the first column and a placeholder (e.g. `_`) in the second, with each sentence separated by a blank line (without placeholder), like this (a minimal sketch for generating this format is shown after the list):
```
This	_
is	_
the	_
first	_
sentence	_
.	_

This	_
is	_
the	_
second	_
one	_
.	_
```
4. Follow the instructions for prediction with [MaChAmp](https://github.com/machamp-nlp/machamp) (see the "Prediction" section of its README), using a fine-tuned model.
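
For step 3, a minimal sketch (not part of this repository; sentences and the output path are placeholders) for writing such a file from already-tokenized sentences:
```python
# Writes the two-column prediction input described in step 3: word in the first
# column, a placeholder in the second, and a blank line between sentences.
sentences = [
    ["La", "cittadina", "venne", "conquistata", "."],
    ["Il", "ponte", "era", "stato", "fatto", "saltare", "."],
]

with open("to_predict.tsv", "w", encoding="utf-8") as f:
    for sentence in sentences:
        for token in sentence:
            f.write(f"{token}\t_\n")
        f.write("\n")   # sentence separator, no placeholder
```
The resulting file can then be passed to MaChAmp's prediction script together with the downloaded model, as described in the MaChAmp README.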

## Evaluation

The model has been evaluated on three folds, each time with a stratified split of the dataset and an 80/10/10 train/dev/test ratio. Please see the paper for further details. Below we report the aggregate values obtained by averaging the Precision, Recall and F1-score of the three splits.

**Token-based** (**_relaxed_**) performance:
|                            |    P   |    R    |   F1    |
|----------------------------|--------|---------|---------|
|Frames                      |  0.904 |  0.914  |  **0.907**  |
|Frames (weighted)           |  0.909 |  0.919  |  0.913  |
|Frame Elements              |  0.841 |  0.724  |  **0.761**  |
|Frame Elements (weighted)   |  0.850 |  0.779  |  0.804  |


**Span-based** (**_strict_**) performance:
|                            |    P   |    R    |   F1   |
|----------------------------|--------|---------|--------|
|Frames                      |  0.906 |  0.899  |  **0.901** |
|Frames (weighted)           |  0.909 |  0.903  |  0.905 |
|Frame Elements              |  0.829 |  0.666  |  **0.724** |
|Frame Elements (weighted)   |  0.853 |  0.711  |  0.768 |



### Citation Information

If you use EventNet-ITA, please cite the following paper:

```
@inproceedings{rovera-2024-eventnet,
    title = "{E}vent{N}et-{ITA}: {I}talian Frame Parsing for Events",
    author = "Rovera, Marco",
    editor = "Bizzoni, Yuri  and
      Degaetano-Ortlieb, Stefania  and
      Kazantseva, Anna  and
      Szpakowicz, Stan",
    booktitle = "Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)",
    year = "2024",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.latechclfl-1.9",
    pages = "77--90",
}
```