dumitrescustefan committed
Commit 2126782
1 Parent(s): 9eb44d5

Update README.md

Files changed (1)
  1. README.md +29 -1
README.md CHANGED
@@ -47,12 +47,40 @@ The baseline is the [Multilingual BERT](https://github.com/google-research/bert/
 The model is trained on the following corpora (stats in the table below are after cleaning):
 
 | Corpus | Lines(M) | Words(M) | Chars(B) | Size(GB) |
-|----------- |:--------: |:--------: |:--------: |:--------: |
+|-----------|:--------:|:--------:|:--------:|:--------:|
 | OPUS | 55.05 | 635.04 | 4.045 | 3.8 |
 | OSCAR | 33.56 | 1725.82 | 11.411 | 11 |
 | Wikipedia | 1.54 | 60.47 | 0.411 | 0.4 |
 | **Total** | **90.15** | **2421.33** | **15.867** | **15.2** |
 
+
+### Citation
+
+If you use this model in a research paper, I'd kindly ask you to cite the following paper:
+
+```
+Stefan Dumitrescu, Andrei-Marius Avram, and Sampo Pyysalo. 2020. The birth of Romanian BERT. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4324–4328, Online. Association for Computational Linguistics.
+```
+
+or, in bibtex:
+
+```
+@inproceedings{dumitrescu-etal-2020-birth,
+    title = "The birth of {R}omanian {BERT}",
+    author = "Dumitrescu, Stefan and
+      Avram, Andrei-Marius and
+      Pyysalo, Sampo",
+    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
+    month = nov,
+    year = "2020",
+    address = "Online",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2020.findings-emnlp.387",
+    doi = "10.18653/v1/2020.findings-emnlp.387",
+    pages = "4324--4328",
+}
+```
+
 #### Acknowledgements
 
 - We'd like to thank [Sampo Pyysalo](https://github.com/spyysalo) from TurkuNLP for helping us out with the compute needed to pretrain the v1.0 BERT models. He's awesome!
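
The README this commit updates is a model card, so a short usage sketch may help readers of the diff. A minimal example with the Hugging Face `transformers` library, assuming the repository ID is `dumitrescustefan/bert-base-romanian-cased-v1` (the ID is not shown in this hunk, so treat it as an assumption):

```python
# Minimal usage sketch for the Romanian BERT model this README describes.
# The repo ID below is an assumption; it does not appear in this diff.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dumitrescustefan/bert-base-romanian-cased-v1")
model = AutoModel.from_pretrained("dumitrescustefan/bert-base-romanian-cased-v1")

# Tokenize a Romanian sentence and run it through the encoder.
inputs = tokenizer("Acesta este un test.", return_tensors="pt")
outputs = model(**inputs)

# One contextual vector per input token: (batch, seq_len, hidden_size).
print(outputs.last_hidden_state.shape)
```

`last_hidden_state` gives one contextual embedding per token; pooling these vectors or adding a task-specific head is left to the downstream application.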