GorkaUrbizu committed on
Commit 9e32285
1 Parent(s): 43e3296

Update README.md

Files changed (1): README.md +48 -0
README.md CHANGED
---
license: cc-by-4.0
language:
- sw
---

BERT base (cased) model trained on a 125M-token subset of cc100-Swahili for our work [Scaling Laws for BERT in Low-Resource Settings](https://youtu.be/dQw4w9WgXcQ), published in the Findings of ACL 2023.

The model has 124M parameters (12 layers) and a vocabulary size of 50K.
It was trained for 500K steps with a sequence length of 512 tokens.
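
As a rough cross-check of the 124M figure, the following Python sketch counts the parameters of a standard BERT encoder. Only the 12 layers and the ~50K vocabulary come from this card; the other hyperparameters (hidden size 768, feed-forward size 3072, 512 position embeddings, 2 token types) are assumed to be the usual BERT-base values, the vocabulary is assumed to be exactly 50,000 entries, and the count includes the pooler but not the masked-LM head. `bert_param_count` is a hypothetical helper written for this illustration, not part of the released code.

```python
# Sketch: parameter count of a BERT-base-style encoder.
# ASSUMPTIONS: hidden=768, intermediate=3072, 512 positions and 2 token types
# are the standard BERT-base values (not stated in this card); the vocabulary
# is taken as exactly 50,000 entries. Pooler counted, masked-LM head not.


def bert_param_count(vocab_size: int, num_layers: int = 12, hidden: int = 768,
                     intermediate: int = 3072, max_positions: int = 512,
                     type_vocab: int = 2) -> int:
    """Return the number of trainable parameters of a BertModel-style encoder."""
    layer_norm = 2 * hidden  # LayerNorm scale (gamma) and shift (beta)

    # Embeddings: word, position and token-type tables, plus one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # One Transformer layer: Q, K, V and output projections (weights + biases),
    # a two-layer feed-forward block, and two LayerNorms.
    attention = 4 * (hidden * hidden + hidden)
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    layer = attention + feed_forward + 2 * layer_norm

    # Pooler: one dense layer applied to the [CLS] representation.
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * layer + pooler


if __name__ == "__main__":
    total = bert_param_count(vocab_size=50_000)
    print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
    # prints: 124,441,344 parameters (~124.4M)
```

Under the same assumptions, `vocab_size=30_522` (the original English BERT-base vocabulary) gives 109,482,240, the commonly quoted ~110M; the difference comes entirely from the larger word-embedding table.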

BERT-medium and BERT-mini models (8 and 4 layers, respectively) are also available on our [GitHub](https://github.com/orai-nlp/low-scaling-laws/tree/main/models).


Authors
-----------
Gorka Urbizu [1], Iñaki San Vicente [1], Xabier Saralegi [1],
Rodrigo Agerri [2] and Aitor Soroa [2]

Affiliation of the authors:

[1] Orai NLP Technologies

[2] HiTZ Center - Ixa, University of the Basque Country UPV/EHU


Licensing
-------------

Copyright (C) by Orai NLP Technologies.

The corpora, datasets and models created in this work are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

To view a copy of this license, visit [http://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/deed.eu).


Acknowledgements
-------------------
If you use this model, please cite the following paper:

- G. Urbizu, I. San Vicente, X. Saralegi, R. Agerri, A. Soroa. Scaling Laws for BERT in Low-Resource Settings. Findings of the Association for Computational Linguistics: ACL 2023. Toronto, Canada, July 2023.


Contact information
-----------------------
Gorka Urbizu, Iñaki San Vicente: {g.urbizu,i.sanvicente}@orai.eus