BERTino: an Italian DistilBERT model

This repository hosts BERTino, an Italian DistilBERT model pre-trained by indigo.ai on a large general-domain Italian corpus. BERTino is task-agnostic and can be fine-tuned for every downstream task.

Corpus

The pre-training corpus that we used is the union of the Paisa and ItWaC corpora. The final corpus counts 14 millions of sentences for a total of 12 GB of text.

Downstream Results

To validate the pre-training that we conducted, we evaluated BERTino on the Italian ParTUT, Italian ISDT, Italian WikiNER and multi-class sentence classification tasks. We report for comparison results obtained by the teacher model fine-tuned in the same tasks and for the same number of epochs.

Italian ISDT:

Model F1 score Fine-tuning time Evaluation time
BERTino 0,9801 9m, 4s 3s
Teacher 0,983 16m, 28s 5s

Italian ParTUT:

Model F1 score Fine-tuning time Evaluation time
BERTino 0,9268 1m, 18s 1s
Teacher 0,9688 2m, 18s 1s

Italian WikiNER:

Model F1 score Fine-tuning time Evaluation time
BERTino 0,9038 35m, 35s 3m, 1s
Teacher 0,9178 67m, 8s 5m, 16s

Multi-class sentence classification:

Model F1 score Fine-tuning time Evaluation time
BERTino 0,7788 4m, 40s 6s
Teacher 0,7986 8m, 52s 9s
Downloads last month
158
Hosted inference API
Fill Mask

Mask token: [MASK]