julien-c HF staff committed on
Commit
c7ee44b
1 Parent(s): e574c07

Migrate model card from transformers-repo

Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/bashar-talafha/multi-dialect-bert-base-arabic/README.md

Files changed (1)
  1. README.md +69 -0
README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ language: ar
+ thumbnail: https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png
+ datasets:
+ - nadi
+ ---
+ # Multi-dialect-Arabic-BERT
+ This repository hosts the Multi-dialect Arabic BERT model.
+
+ By [Mawdoo3-AI](https://ai.mawdoo3.com/).
+
+ <p align="center">
+ <br>
+ <img src="https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png" alt="Background reference: http://www.qfi.org/wp-content/uploads/2018/02/Qfi_Infographic_Mother-Language_Final.pdf" width="500"/>
+ <br>
+ </p>
+
+ ### About our Multi-dialect-Arabic-BERT model
+ Instead of training the Multi-dialect Arabic BERT model from scratch, we initialized its weights from [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT) and then trained it on 10M Arabic tweets from the unlabeled data of [the Nuanced Arabic Dialect Identification (NADI) shared task](https://sites.google.com/view/nadi-shared-task).
+
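The procedure described above (start from existing weights, then keep training on unlabeled text) could be set up roughly as follows. This is a minimal sketch, not the authors' training script: it assumes a masked-language-modeling objective, the `asafaya/bert-base-arabic` checkpoint as the Arabic-BERT starting point, and placeholder hyperparameters (sequence length, batch size, epochs, masking rate). The helper names `load_tweets` and `build_trainer` are hypothetical.

```python
def load_tweets(lines):
    """Strip whitespace and drop empty lines (hypothetical minimal cleaning)."""
    return [line.strip() for line in lines if line.strip()]


def build_trainer(tweets, base_checkpoint="asafaya/bert-base-arabic", output_dir="mlm-out"):
    """Set up masked-language-model training that starts from an existing checkpoint."""
    # Imported lazily so load_tweets works without the heavy dependencies installed.
    from datasets import Dataset
    from transformers import (
        AutoModelForMaskedLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
    # Start from the pretrained weights rather than a random initialization.
    model = AutoModelForMaskedLM.from_pretrained(base_checkpoint)

    def tokenize(batch):
        # max_length=128 is an assumed placeholder, not a value from the model card.
        return tokenizer(batch["text"], truncation=True, max_length=128)

    dataset = Dataset.from_dict({"text": tweets}).map(
        tokenize, batched=True, remove_columns=["text"]
    )
    # Masks tokens at batch time; 0.15 is the usual BERT rate (assumed here).
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
    # Epoch count and batch size are placeholders.
    args = TrainingArguments(
        output_dir=output_dir, num_train_epochs=1, per_device_train_batch_size=16
    )
    return Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator)


# Tiny stand-in for the unlabeled tweet corpus.
tweets = load_tweets(["  مرحبا بالعالم  ", "", "صباح الخير\n"])
print(tweets)
```

Calling `build_trainer(tweets).train()` would download the base checkpoint and start training, so it is left out of the snippet.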
+ ### To cite this work
+
+ ```
+ @misc{talafha2020multidialect,
+     title={Multi-Dialect Arabic BERT for Country-Level Dialect Identification},
+     author={Bashar Talafha and Mohammad Ali and Muhy Eddin Za'ter and Haitham Seelawi and Ibraheem Tuffaha and Mostafa Samir and Wael Farhan and Hussein T. Al-Natsheh},
+     year={2020},
+     eprint={2007.05612},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ ### Usage
+ The model weights can be loaded with the Hugging Face `transformers` library.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ # Both calls download the files from the Hub on first use and cache them locally.
+ tokenizer = AutoTokenizer.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic")
+ model = AutoModel.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic")
+ ```
+
+ Example using `pipeline`:
+
+ ```python
+ from transformers import pipeline
+
+ # The model identifier must not contain trailing whitespace.
+ fill_mask = pipeline(
+     "fill-mask",
+     model="bashar-talafha/multi-dialect-bert-base-arabic",
+     tokenizer="bashar-talafha/multi-dialect-bert-base-arabic",
+ )
+
+ # Roughly: "The traveler departed from [MASK] airport"
+ fill_mask(" سافر الرحالة من مطار [MASK] ")
+ ```
+
+ Output:
+
+ ```
+ [{'sequence': '[CLS] سافر الرحالة من مطار الكويت [SEP]', 'score': 0.08296813815832138, 'token': 3226},
+ {'sequence': '[CLS] سافر الرحالة من مطار دبي [SEP]', 'score': 0.05123933032155037, 'token': 4747},
+ {'sequence': '[CLS] سافر الرحالة من مطار مسقط [SEP]', 'score': 0.046838656067848206, 'token': 13205},
+ {'sequence': '[CLS] سافر الرحالة من مطار القاهرة [SEP]', 'score': 0.03234650194644928, 'token': 4003},
+ {'sequence': '[CLS] سافر الرحالة من مطار الرياض [SEP]', 'score': 0.02606341242790222, 'token': 2200}]
+ ```
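The raw predictions can be tidied up before display. The sketch below assumes the result is a list of dictionaries with `sequence` and `score` keys wrapped in `[CLS]`/`[SEP]` markers, as in the output shown above; other `transformers` versions may return a different shape. The helper names `clean_sequence` and `summarize` are hypothetical, and the sample data is copied from two of the listed predictions.

```python
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def clean_sequence(sequence):
    """Drop the [CLS]/[SEP] markers and normalize whitespace."""
    for token in SPECIAL_TOKENS:
        sequence = sequence.replace(token, "")
    return " ".join(sequence.split())


def summarize(predictions):
    """Return (cleaned sentence, score rounded to 4 places) pairs, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(clean_sequence(p["sequence"]), round(p["score"], 4)) for p in ranked]


# Two entries copied from the listed output, deliberately out of order.
sample = [
    {"sequence": "[CLS] سافر الرحالة من مطار دبي [SEP]", "score": 0.05123933032155037, "token": 4747},
    {"sequence": "[CLS] سافر الرحالة من مطار الكويت [SEP]", "score": 0.08296813815832138, "token": 3226},
]

summary = summarize(sample)
for sentence, score in summary:
    print(f"{score:.4f}  {sentence}")
```

In practice `summarize` would be called on the return value of `fill_mask(...)` instead of the hard-coded sample.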
+ ### Repository
+ Please check the [original repository](https://github.com/mawdoo3/Multi-dialect-Arabic-BERT) for more information.
+