osiria committed on
Commit
b86f563
1 Parent(s): 49b5e5b

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -52,10 +52,12 @@ pipeline_tag: question-answering
 
  This is a <b>DeBERTa</b> <b>[1]</b> model for the <b>Italian</b> language, fine-tuned for <b>Extractive Question Answering</b> on the [SQuAD-IT](https://huggingface.co/datasets/squad_it) dataset <b>[2]</b>, using <b>DeBERTa-ITALIAN</b> ([deberta-base-italian](https://huggingface.co/osiria/deberta-base-italian)) as a pre-trained model.
 
- <b>Update: Version 2.0</b>
+ <b>update: version 2.0</b>
 
  The 2.0 version further improves the performances by exploiting a 2-phases fine-tuning strategy: the model is first fine-tuned on the English SQuAD v2 (1 epoch, 20% warmup ratio, and max learning rate of 3e-5) then further fine-tuned on the Italian SQuAD (2 epochs, no warmup, initial learning rate of 3e-5)
 
+ In order to maximize the benefits of the procedure, [mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) is now directly used as a pre-trained model. When the double fine-tuning is completed, the embedding layer is then compressed as in [deberta-base-italian](https://huggingface.co/osiria/deberta-base-italian) to obtain a mono-lingual model size
+
  <h3>Training and Performances</h3>
 
  The model is trained to perform question answering, given a context and a question (under the assumption that the context contains the answer to the question). It has been fine-tuned for Extractive Question Answering, using the SQuAD-IT dataset, for 2 epochs with a linearly decaying learning rate starting from 3e-5, maximum sequence length of 384 and document stride of 128.
@@ -113,7 +115,7 @@ pipeline_qa(context = "Alessandro Manzoni è nato a Milano nel 1785",
 
  <h3>References</h3>
 
- [1] https://arxiv.org/abs/2006.03654
+ [1] https://arxiv.org/abs/2111.09543
 
  [2] https://link.springer.com/chapter/10.1007/978-3-030-03840-3_29
 
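
Note on the README text in this diff: the two learning-rate schedules it describes (20% warmup to a maximum of 3e-5 for the English phase, then no warmup with a linear decay from 3e-5 for the Italian phase) can be sketched in plain Python. This is only an illustrative sketch, not the author's training code. The function name `linear_schedule`, the step counts, the decay to zero at the end of training, and the linear decay after warmup in the first phase are all assumptions; the README states only the warmup ratios, the peak/initial learning rate, and the epoch counts.

```python
def linear_schedule(step, total_steps, peak_lr=3e-5, warmup_ratio=0.0):
    """Learning rate at `step`: linear warmup to `peak_lr`, then linear decay.

    Assumption: the decay ends at zero after `total_steps`; the README only
    says the rate is "linearly decaying".
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: rise linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr to 0 over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


# Hypothetical step counts, chosen only to make the example concrete.
PHASE1_STEPS = 1000  # English SQuAD v2: 1 epoch, 20% warmup
PHASE2_STEPS = 2000  # Italian SQuAD: 2 epochs, no warmup

phase1_lrs = [linear_schedule(s, PHASE1_STEPS, warmup_ratio=0.2)
              for s in range(PHASE1_STEPS + 1)]
phase2_lrs = [linear_schedule(s, PHASE2_STEPS, warmup_ratio=0.0)
              for s in range(PHASE2_STEPS + 1)]
```

Under these assumptions the first phase peaks at 3e-5 once warmup ends (step 200 of 1000), while the second phase starts at 3e-5 and only decays.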