wissamantoun committed
Commit ac20501
1 Parent(s): b20deb8

Update README.md

Files changed (1):
README.md +8 -7
README.md CHANGED
@@ -1,11 +1,12 @@
 ---
 language: ar
 datasets:
-- wikipedia
-- OSIAN
-- 1.5B Arabic Corpus
-- OSCAR Arabic Unshuffled
-- Twitter
+- wikipedia
+- Osian
+- 1.5B-Arabic-Corpus
+- oscar-arabic-unshuffled
+- Assafir(private)
+- Twitter(private)
 widget:
 - text: " عاصمة لبنان هي [MASK] ."
 ---
@@ -20,7 +21,7 @@ AraBERTv0.2-Twitter-base/large are two new models for Arabic dialects and tweets
 
 The two new models have had emojis added to their vocabulary, in addition to common words that weren't present at first. Pre-training was done with a max sentence length of 64 for only 1 epoch.
 
-**AraBERT** is an Arabic pretrained lanaguage model based on [Google's BERT architechture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT Paper](https://arxiv.org/abs/2003.00104) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup)
+**AraBERT** is an Arabic pretrained language model based on [Google's BERT architecture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT paper](https://arxiv.org/abs/2003.00104) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup).
 
 
 ## Other Models
@@ -71,7 +72,7 @@ Google Scholar has our Bibtex wrong (missing name), use this instead
 }
 ```
 # Acknowledgments
-Thanks to TensorFlow Research Cloud (TFRC) for the free access to Cloud TPUs, couldn't have done it without this program, and to the [AUB MIND Lab](https://sites.aub.edu.lb/mindlab/) Members for the continous support. Also thanks to [Yakshof](https://www.yakshof.com/#/) and Assafir for data and storage access. Another thanks for Habib Rahal (https://www.behance.net/rahalhabib), for putting a face to AraBERT.
+Thanks to TensorFlow Research Cloud (TFRC) for free access to Cloud TPUs; we couldn't have done it without this program. Thanks also to the [AUB MIND Lab](https://sites.aub.edu.lb/mindlab/) members for their continuous support, and to [Yakshof](https://www.yakshof.com/#/) and Assafir for data and storage access. Thanks as well to Habib Rahal (https://www.behance.net/rahalhabib) for putting a face to AraBERT.
 
 # Contacts
 **Wissam Antoun**: [Linkedin](https://www.linkedin.com/in/wissam-antoun-622142b4/) | [Twitter](https://twitter.com/wissam_antoun) | [Github](https://github.com/WissamAntoun) | <wfa07@mail.aub.edu> | <wissam.antoun@gmail.com>
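
To reproduce the card's widget query outside the Hub UI, here is a minimal fill-mask sketch. It assumes this card belongs to a model published on the Hugging Face Hub as `aubmindlab/bert-base-arabertv02-twitter` (the repo ID is not shown in this commit) and uses the standard `transformers` pipeline API.

```python
# Minimal sketch of the model card's widget query, assuming the Hub
# repo ID "aubmindlab/bert-base-arabertv02-twitter" (not shown in this commit).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="aubmindlab/bert-base-arabertv02-twitter")

# Same prompt as the widget: "The capital of Lebanon is [MASK]."
for prediction in fill_mask("عاصمة لبنان هي [MASK] ."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Since the Twitter models were pre-trained with a max sentence length of 64, inputs should likely be kept short or truncated accordingly.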