gonzalez-agirre committed on
Commit 29fb934
1 Parent(s): 5e25e0d

Update README.md

Files changed (1)
  1. README.md +38 -13
README.md CHANGED
@@ -17,7 +17,7 @@ widget:
 pipeline_tag: fill-mask
 ---
 
-# DistilRoBERTa-base-ca
+# DistilRoBERTa-base-ca-v2
 
 ## Overview
 - **Architecture:** DistilRoBERTa-base
@@ -25,6 +25,31 @@ pipeline_tag: fill-mask
 - **Task:** Fill-Mask
 - **Data:** Crawling
 
+## Table of Contents
+<details>
+<summary>Click to expand</summary>
+
+- [Model description](#model-description)
+- [Intended uses and limitations](#intended-use)
+- [How to use](#how-to-use)
+- [Limitations and bias](#limitations-and-bias)
+- [Training](#training)
+- [Training data](#training-data)
+- [Training procedure](#training-procedure)
+- [Evaluation](#evaluation)
+- [CLUB benchmark](#club-benchmark)
+- [Evaluation results](#evaluation-results)
+- [Licensing Information](#licensing-information)
+- [Additional information](#additional-information)
+- [Author](#author)
+- [Contact information](#contact-information)
+- [Copyright](#copyright)
+- [Licensing information](#licensing-information)
+- [Funding](#funding)
+- [Citing information](#citing-information)
+- [Disclaimer](#disclaimer)
+
+</details>
 
 ## Model description
 
@@ -77,14 +102,6 @@ At the time of submission, no measures have been taken to estimate the bias embe
 
 ## Training
 
-### Training procedure
-
-This model has been trained using a technique known as Knowledge Distillation, which is used to shrink networks to a reasonable size while minimizing the loss in performance.
-
-It basically consists in distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
-
-So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run in commodity hardware.
-
 ### Training data
 
 The training corpus consists of several corpora gathered from web crawling and public corpora, as shown in the table below:
@@ -106,9 +123,17 @@ The training corpus consists of several corpora gathered from web crawling and p
 | Catalan Open Subtitles | 0.02 |
 | Tweets | 0.02 |
 
+### Training procedure
+
+This model has been trained using a technique known as Knowledge Distillation, which is used to shrink networks to a reasonable size while minimizing the loss in performance.
+
+It basically consists in distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
+
+So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run in commodity hardware.
+
 ## Evaluation
 
-### Evaluation benchmark
+### CLUB benchmark
 
 This model has been fine-tuned on the downstream tasks of the [Catalan Language Understanding Evaluation benchmark (CLUB)](https://club.aina.bsc.es/), which includes the following datasets:
 
@@ -128,7 +153,7 @@ This is how it compares to its teacher when fine-tuned on the aforementioned dow
 
 | Model \ Task |NER (F1)|POS (F1)|STS-ca (Comb.)|TeCla (Acc.)|TEca (Acc.)|CatalanQA (F1/EM)| XQuAD-ca <sup>1</sup> (F1/EM) |
 | ------------------------|:-------|:-------|:-------------|:-----------|:----------|:----------------|:------------------------------|
-| RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 89.50/76.63 | 73.64/55.42 |
+| RoBERTa-base-ca-v2 | **89.29** | **98.96** | **79.07** | **74.26** | **83.14** | **89.50**/**76.63** | **73.64**/**55.42** |
 | DistilRoBERTa-base-ca | 87.88 | 98.83 | 77.26 | 73.20 | 76.00 | 84.07/70.77 | 62.93/45.08 |
 
 <sup>1</sup> : Trained on CatalanQA, tested on XQuAD-ca.
@@ -137,7 +162,7 @@ This is how it compares to its teacher when fine-tuned on the aforementioned dow
 
 ### Authors
 
-The Text Mining Unit (TeMU) from Barcelona Supercomputing Center ([bsc-temu@bsc.es](bsc-temu@bsc.es)).
+Language Technologies Unit at Barcelona Supercomputing Center ([langtech@bsc.es](langtech@bsc.es)).
 
 ### Contact information
 
@@ -145,7 +170,7 @@ For further information, send an email to [aina@bsc.es](aina@bsc.es).
 
 ### Copyright
 
-Copyright by the Text Mining Unit at Barcelona Supercomputing Center.
+Copyright by the Language Technologies Unit at Barcelona Supercomputing Center.
 
 ### Licensing information
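The training-procedure paragraphs moved in this commit describe Knowledge Distillation only at a high level. The sketch below is a minimal illustration of the teacher-student loss they refer to, using the Hugging Face `transformers` library; the teacher checkpoint ID, the reduced layer count, the temperature, and the loss weighting are assumptions made for the example, not the actual recipe used to train DistilRoBERTa-base-ca-v2.

```python
# Minimal knowledge-distillation sketch (illustrative only; not the training
# script used for this model). Checkpoint ID, layer count, temperature and
# loss weight below are assumptions made for the example.
import torch
import torch.nn.functional as F
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

TEACHER_ID = "projecte-aina/roberta-base-ca-v2"  # assumed teacher checkpoint
TEMPERATURE = 2.0                                # assumed softening temperature
ALPHA = 0.5                                      # assumed soft/hard loss weight

tokenizer = AutoTokenizer.from_pretrained(TEACHER_ID)
teacher = AutoModelForMaskedLM.from_pretrained(TEACHER_ID).eval()

# Student: same vocabulary and hidden size as the teacher, but fewer layers.
student_config = AutoConfig.from_pretrained(TEACHER_ID, num_hidden_layers=6)
student = AutoModelForMaskedLM.from_config(student_config)

def distillation_loss(texts):
    """Combined loss for one batch: the student mimics the teacher's soft
    predictions (KL term) while also fitting the MLM labels (hard term)."""
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = enc["input_ids"].clone()          # simplified: a real run masks ~15% of tokens
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the hard loss

    with torch.no_grad():
        teacher_logits = teacher(**enc).logits

    student_out = student(**enc, labels=labels)

    # Soft targets: KL divergence between temperature-softened distributions,
    # rescaled by T^2 so its gradients stay comparable to the hard loss.
    kd = F.kl_div(
        F.log_softmax(student_out.logits / TEMPERATURE, dim=-1),
        F.softmax(teacher_logits / TEMPERATURE, dim=-1),
        reduction="batchmean",
    ) * TEMPERATURE ** 2

    return ALPHA * kd + (1.0 - ALPHA) * student_out.loss

loss = distillation_loss(["Les Illes <mask> són un arxipèlag de la Mediterrània."])
loss.backward()  # gradients reach only the student; the teacher forward ran under no_grad
```

Softening both distributions with the same temperature and rescaling the KL term by T² is the usual way to keep the soft-target gradients on the same scale as the hard masked-language-modelling loss; a production distillation run would add an optimizer loop, proper token masking, and large batches of the training corpus listed above.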