gonzalez-agirre committed · Commit 29fb934 · Parent(s): 5e25e0d

Update README.md

README.md CHANGED
@@ -17,7 +17,7 @@ widget:
 pipeline_tag: fill-mask
 ---
 
-# DistilRoBERTa-base-ca
+# DistilRoBERTa-base-ca-v2
 
 ## Overview
 - **Architecture:** DistilRoBERTa-base
@@ -25,6 +25,31 @@ pipeline_tag: fill-mask
 - **Task:** Fill-Mask
 - **Data:** Crawling
 
+## Table of Contents
+<details>
+<summary>Click to expand</summary>
+
+- [Model description](#model-description)
+- [Intended uses and limitations](#intended-use)
+- [How to use](#how-to-use)
+- [Limitations and bias](#limitations-and-bias)
+- [Training](#training)
+  - [Training data](#training-data)
+  - [Training procedure](#training-procedure)
+- [Evaluation](#evaluation)
+  - [CLUB benchmark](#club-benchmark)
+  - [Evaluation results](#evaluation-results)
+- [Licensing Information](#licensing-information)
+- [Additional information](#additional-information)
+  - [Author](#author)
+  - [Contact information](#contact-information)
+  - [Copyright](#copyright)
+  - [Licensing information](#licensing-information)
+  - [Funding](#funding)
+  - [Citing information](#citing-information)
+  - [Disclaimer](#disclaimer)
+
+</details>
 
 ## Model description
 
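Since the card's pipeline tag is fill-mask, a minimal usage sketch follows. The Hub repo id below is an assumption inferred from the new title (the diff never states it), and the example sentence is purely illustrative.

```python
# Minimal fill-mask usage sketch. The Hub id is an assumption based on
# the card's title; it is not stated anywhere in this diff.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="projecte-aina/distilroberta-base-ca-v2")

# RoBERTa-style tokenizers mark the blank with "<mask>".
for pred in unmasker("La capital de Catalunya és <mask>."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```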
@@ -77,14 +102,6 @@ At the time of submission, no measures have been taken to estimate the bias embe
 
 ## Training
 
-### Training procedure
-
-This model has been trained using a technique known as Knowledge Distillation, which is used to shrink networks to a reasonable size while minimizing the loss in performance.
-
-It basically consists in distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
-
-So, in a “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of a larger teacher model. As a result, the student has lower inference time and the ability to run in commodity hardware.
-
 ### Training data
 
 The training corpus consists of several corpora gathered from web crawling and public corpora, as shown in the table below:
@@ -106,9 +123,17 @@ The training corpus consists of several corpora gathered from web crawling and p
 | Catalan Open Subtitles | 0.02 |
 | Tweets | 0.02 |
 
+### Training procedure
+
+This model has been trained with a technique known as Knowledge Distillation, which shrinks a network to a reasonable size while minimizing the loss in performance.
+
+It consists of distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
+
+In this “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of the larger teacher model. As a result, the student achieves lower inference time and can run on commodity hardware.
+
 ## Evaluation
 
-###
+### CLUB benchmark
 
 This model has been fine-tuned on the downstream tasks of the [Catalan Language Understanding Evaluation benchmark (CLUB)](https://club.aina.bsc.es/), which includes the following datasets:
 
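The training-procedure paragraphs added above describe the recipe only in prose. Below is a minimal sketch of the soft-target loss that teacher-student distillation typically uses (in the style of DistilBERT); it is illustrative only, and none of the names, temperature, or weighting come from this model's actual training code.

```python
# Illustrative teacher-student distillation loss in the DistilBERT
# style (soft targets + hard MLM targets). Names and hyperparameters
# are assumptions, not this model's actual training code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the soft-target KL term with the hard MLM cross-entropy."""
    T = temperature
    # Soft targets: push the student's temperature-smoothed distribution
    # toward the teacher's; the T**2 factor keeps gradient magnitudes
    # comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # Hard targets: ordinary masked-language-modelling loss; positions
    # that were not masked carry the ignore index -100.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft + (1.0 - alpha) * hard

# In the training loop the teacher is frozen and only supplies logits:
#   with torch.no_grad():
#       teacher_logits = teacher(**batch).logits
#   loss = distillation_loss(student(**batch).logits, teacher_logits,
#                            batch["labels"])
```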
@@ -128,7 +153,7 @@ This is how it compares to its teacher when fine-tuned on the aforementioned dow
 
 | Model \ Task | NER (F1) | POS (F1) | STS-ca (Comb.) | TeCla (Acc.) | TEca (Acc.) | CatalanQA (F1/EM) | XQuAD-ca <sup>1</sup> (F1/EM) |
 | ------------ | :------- | :------- | :------------- | :----------- | :---------- | :---------------- | :---------------------------- |
-| RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 89.50
+| RoBERTa-base-ca-v2 | **89.29** | **98.96** | **79.07** | **74.26** | **83.14** | **89.50**/**76.63** | **73.64**/**55.42** |
 | DistilRoBERTa-base-ca | 87.88 | 98.83 | 77.26 | 73.20 | 76.00 | 84.07/70.77 | 62.93/45.08 |
 
 <sup>1</sup> : Trained on CatalanQA, tested on XQuAD-ca.
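For context on how scores like those in the table above are produced, here is a generic fine-tuning sketch for one CLUB task (NER as token classification). The dataset id, split names, and hyperparameters are assumptions, not the benchmark's actual configuration.

```python
# Generic token-classification fine-tune for a CLUB-style NER task.
# The model/dataset ids, split names, and hyperparameters are
# illustrative assumptions, not the benchmark's actual setup.
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

model_id = "projecte-aina/distilroberta-base-ca-v2"  # assumed Hub id
ds = load_dataset("projecte-aina/ancora-ca-ner")     # assumed dataset id

tokenizer = AutoTokenizer.from_pretrained(model_id)
label_names = ds["train"].features["ner_tags"].feature.names

def tokenize_and_align(batch):
    # Tokenize pre-split words; give each word's tag to its first
    # sub-token and mask the rest with -100 so the loss skips them.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        previous, row = None, []
        for w in enc.word_ids(batch_index=i):
            row.append(tags[w] if w is not None and w != previous else -100)
            previous = w
        all_labels.append(row)
    enc["labels"] = all_labels
    return enc

tokenized = ds.map(tokenize_and_align, batched=True,
                   remove_columns=ds["train"].column_names)

model = AutoModelForTokenClassification.from_pretrained(
    model_id, num_labels=len(label_names))

trainer = Trainer(
    model=model,
    args=TrainingArguments("distilroberta-ca-ner", learning_rate=5e-5,
                           num_train_epochs=5, per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],  # split name assumed
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```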
@@ -137,7 +162,7 @@ This is how it compares to its teacher when fine-tuned on the aforementioned dow
 
 ### Authors
 
-
+Language Technologies Unit at Barcelona Supercomputing Center ([langtech@bsc.es](langtech@bsc.es)).
 
 ### Contact information
 
@@ -145,7 +170,7 @@ For further information, send an email to [aina@bsc.es](aina@bsc.es).
 
 ### Copyright
 
-Copyright by the
+Copyright by the Language Technologies Unit at Barcelona Supercomputing Center.
 
 ### Licensing information
 