wissamantoun committed
Commit ac20501
1 Parent(s): b20deb8

Update README.md

Files changed (1):
README.md +8 -7
README.md CHANGED
@@ -1,11 +1,12 @@
 ---
 language: ar
 datasets:
-- wikipedia
-- OSIAN
-- 1.5B Arabic Corpus
-- OSCAR Arabic Unshuffled
-- Twitter
+- wikipedia
+- Osian
+- 1.5B-Arabic-Corpus
+- oscar-arabic-unshuffled
+- Assafir(private)
+- Twitter(private)
 widget:
 - text: " عاصمة لبنان هي [MASK] ."
 ---
@@ -20,7 +21,7 @@ AraBERTv0.2-Twitter-base/large are two new models for Arabic dialects and tweets
 
 The two new models have had emojis added to their vocabulary, in addition to common words that weren't present at first. Pre-training was done with a max sentence length of 64 for only 1 epoch.
 
-**AraBERT** is an Arabic pretrained lanaguage model based on [Google's BERT architechture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT Paper](https://arxiv.org/abs/2003.00104) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup)
+**AraBERT** is an Arabic pretrained language model based on [Google's BERT architecture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT paper](https://arxiv.org/abs/2003.00104) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup).
 
 
 ## Other Models
@@ -71,7 +72,7 @@ Google Scholar has our Bibtex wrong (missing name), use this instead
 }
 ```
 # Acknowledgments
-Thanks to TensorFlow Research Cloud (TFRC) for the free access to Cloud TPUs, couldn't have done it without this program, and to the [AUB MIND Lab](https://sites.aub.edu.lb/mindlab/) Members for the continous support. Also thanks to [Yakshof](https://www.yakshof.com/#/) and Assafir for data and storage access. Another thanks for Habib Rahal (https://www.behance.net/rahalhabib), for putting a face to AraBERT.
+Thanks to TensorFlow Research Cloud (TFRC) for free access to Cloud TPUs; we couldn't have done it without this program. Thanks also to the [AUB MIND Lab](https://sites.aub.edu.lb/mindlab/) members for their continuous support, and to [Yakshof](https://www.yakshof.com/#/) and Assafir for data and storage access. Thanks as well to Habib Rahal (https://www.behance.net/rahalhabib) for putting a face to AraBERT.
 
 # Contacts
 **Wissam Antoun**: [Linkedin](https://www.linkedin.com/in/wissam-antoun-622142b4/) | [Twitter](https://twitter.com/wissam_antoun) | [Github](https://github.com/WissamAntoun) | <wfa07@mail.aub.edu> | <wissam.antoun@gmail.com>
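
To reproduce the card's widget query outside the Hub UI, here is a minimal fill-mask sketch. It assumes this card belongs to a model published on the Hugging Face Hub as `aubmindlab/bert-base-arabertv02-twitter` (the repo ID is not shown in this commit) and uses the standard `transformers` pipeline API.

```python
# Minimal sketch of the model card's widget query, assuming the Hub
# repo ID "aubmindlab/bert-base-arabertv02-twitter" (not shown in this commit).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="aubmindlab/bert-base-arabertv02-twitter")

# Same prompt as the widget: "The capital of Lebanon is [MASK]."
for prediction in fill_mask("عاصمة لبنان هي [MASK] ."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Since the Twitter models were pre-trained with a max sentence length of 64, inputs should likely be kept short or truncated accordingly.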