TymaaHammouda committed e1af151 (parent: ee432a1): Update README.md

## ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

Online Demo
--------

You can try the model using the online demo:

[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)

ArBanking77 Corpus
--------

ArBanking77 consists of 31,404 queries (in MSA and the Palestinian dialect) that were manually Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (intents), such as card arrival, card linking, exchange rate, and automatic top-up. A neural model based on AraBERT was fine-tuned on ArBanking77, achieving F1-scores of 92% on MSA and 90% on PAL (Palestinian dialect).
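
The README reports F1-scores but does not state how they are averaged over the 77 intents. Purely as an illustration, the sketch below computes macro-averaged F1 (the unweighted mean of per-intent F1) in plain Python. The averaging scheme and the intent label strings (`card_arrival` and so on) are assumptions for this example, not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Macro-averaged F1: unweighted mean of per-class F1 over all gold/predicted classes.

    Assumption: macro averaging is used here for illustration only.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for c in sorted(set(gold) | set(pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical gold and predicted intents for four queries.
gold = ["card_arrival", "card_linking", "exchange_rate", "automatic_top_up"]
pred = ["card_arrival", "card_linking", "exchange_rate", "card_arrival"]
print(round(macro_f1(gold, pred), 3))  # 0.667
```

If you prefer a library call, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same macro average.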

Corpus Download
--------

A data sample is available in the `data` directory. The entire ArBanking77 corpus is available upon request for academic and commercial use. To request ArBanking77 (corpus and model), visit:

[https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)

Model Download
--------

Hugging Face: [https://huggingface.co/SinaLab/ArBanking77](https://huggingface.co/SinaLab/ArBanking77)

Model Training
--------

Fine-tune AraBERT (`aubmindlab/bert-base-arabertv2`) on the merged MSA and PAL training data with the `run_glue_no_trainer.py` example script:

```commandline
python run_glue_no_trainer.py \
  --model_name_or_path aubmindlab/bert-base-arabertv2 \
  --train_file ./data/Banking77_Arabized_Ver3_train_MSA_PAL_merged.json \
  --validation_file ./data/Banking77_Arabized_Ver3_val_MSA_PAL_merged.json \
  --seed 42 \
  --max_length 128 \
  --learning_rate 4e-5 \
  --num_train_epochs 20 \
  --per_device_train_batch_size 32 \
  --output_dir ./results
```

File source: [run_glue_no_trainer.py](https://github.com/huggingface/transformers/blob/e9ad51306fdcc3fb79d837d667e21c6d075a2451/examples/pytorch/text-classification/run_glue_no_trainer.py), from the Hugging Face Transformers PyTorch text-classification examples.
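
`run_glue_no_trainer.py` loads `--train_file` and `--validation_file` through the `datasets` library and requires a `.csv` or `.json` extension. As far as we can tell from the script, it treats a column named `label` as the target and, when only one other column exists, uses that column as the input text. The README does not document the schema of the ArBanking77 JSON files, so the sketch below is an assumption: it writes hypothetical MSA and PAL `(text, intent)` pairs into a single file with one JSON object per line, mirroring the "MSA_PAL_merged" naming of the training files. The field names, the helper `write_merged_jsonl`, and the example queries are all hypothetical.

```python
import json
from pathlib import Path


def write_merged_jsonl(msa, pal, path):
    """Write MSA and PAL (text, intent) pairs to one file, one JSON object per line.

    Assumption: the field names "text" and "label" are illustrative; the
    training script treats "label" as the target column.
    """
    path = Path(path)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for text, intent in list(msa) + list(pal):
            # ensure_ascii=False keeps the Arabic text human-readable in the file
            f.write(json.dumps({"text": text, "label": intent}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical examples; real ArBanking77 queries come from the corpus itself.
msa = [("متى ستصل بطاقتي؟", "card_arrival")]
pal = [("ايمتى بتوصل بطاقتي؟", "card_arrival")]

# The script checks for a .json (or .csv) extension, and the datasets JSON
# loader accepts one JSON object per line in a .json file.
n = write_merged_jsonl(msa, pal, "train_merged.json")
print(n)  # 2
```

The script then derives the label set from the distinct `label` values in the training file, so intent strings must be spelled consistently across the MSA and PAL portions.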

Credits
-------

This research is partially funded by the Palestinian Higher Council for Innovation and Excellence.

Citation
-------

Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana Ghanem: [ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic](http://www.jarrar.info/publications/JBKEG23.pdf). In Proceedings of the 1st Arabic Natural Language Processing Conference (ArabicNLP), part of EMNLP 2023. ACL.