Commit 2126782 by dumitrescustefan (parent: 9eb44d5): Update README.md

README.md CHANGED
@@ -47,12 +47,40 @@ The baseline is the [Multilingual BERT](https://github.com/google-research/bert/
 The model is trained on the following corpora (stats in the table below are after cleaning):
 
 | Corpus | Lines(M) | Words(M) | Chars(B) | Size(GB) |
-
+|-----------|:--------:|:--------:|:--------:|:--------:|
 | OPUS | 55.05 | 635.04 | 4.045 | 3.8 |
 | OSCAR | 33.56 | 1725.82 | 11.411 | 11 |
 | Wikipedia | 1.54 | 60.47 | 0.411 | 0.4 |
 | **Total** | **90.15** | **2421.33** | **15.867** | **15.2** |
 
+
+### Citation
+
+If you use this model in a research paper, I'd kindly ask you to cite the following paper:
+
+```
+Stefan Dumitrescu, Andrei-Marius Avram, and Sampo Pyysalo. 2020. The birth of Romanian BERT. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4324–4328, Online. Association for Computational Linguistics.
+```
+
+or, in bibtex:
+
+```
+@inproceedings{dumitrescu-etal-2020-birth,
+    title = "The birth of {R}omanian {BERT}",
+    author = "Dumitrescu, Stefan and
+      Avram, Andrei-Marius and
+      Pyysalo, Sampo",
+    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
+    month = nov,
+    year = "2020",
+    address = "Online",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2020.findings-emnlp.387",
+    doi = "10.18653/v1/2020.findings-emnlp.387",
+    pages = "4324--4328",
+}
+```
+
 #### Acknowledgements
 
 - We'd like to thank [Sampo Pyysalo](https://github.com/spyysalo) from TurkuNLP for helping us out with the compute needed to pretrain the v1.0 BERT models. He's awesome!
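As a quick sanity check, the **Total** row of the corpus table can be reproduced from the three per-corpus rows. A minimal sketch (the figures below are copied verbatim from the table; only the summation is new):

```python
# Per-corpus stats from the README table (after cleaning).
corpora = {
    "OPUS":      {"lines_m": 55.05, "words_m": 635.04,  "chars_b": 4.045,  "size_gb": 3.8},
    "OSCAR":     {"lines_m": 33.56, "words_m": 1725.82, "chars_b": 11.411, "size_gb": 11.0},
    "Wikipedia": {"lines_m": 1.54,  "words_m": 60.47,   "chars_b": 0.411,  "size_gb": 0.4},
}

# Sum each column across corpora; round to the table's precision.
totals = {
    key: round(sum(stats[key] for stats in corpora.values()), 3)
    for key in ("lines_m", "words_m", "chars_b", "size_gb")
}
print(totals)
```

The computed totals (90.15 M lines, 2421.33 M words, 15.867 B chars, 15.2 GB) match the table's **Total** row.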