Commit a7ec662 by TymaaHammouda • 1 Parent(s): 383bc8b

Update README.md

  1. README.md +89 -29
README.md CHANGED
@@ -10,60 +10,120 @@ tags:
10
  datasets:
11
  - SinaLab/ArBanking77
12
  ---
13
- ## ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
14
 
15
  https://www.jarrar.info/publications/JBKEG23.pdf
16
 
17
- Online Demo
18
- --------
19
- You can try our model using the demo link below
20
-
21
- [https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)
22
 
23
 
24
  ArBanking77 Corpus
25
  --------
26
- ArBanking77 consists of 31,404 queries (MSA and Palestinian dialect) that were manually Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (intents), including card arrival, card linking, exchange rate, and automatic top-up. A neural model based on AraBERT was fine-tuned on the ArBanking77 dataset (F1-score: 92% for MSA, 90% for PAL).
 
 
 
 
27
 
28
 
29
- Corpus Download
30
  --------
31
- Sample data is available in the `data` directory. The entire ArBanking77 corpus is
32
- available for download upon request for academic and commercial use. Request to download
33
- ArBanking77 (corpus and the model):
 
34
 
35
- [https://sina.birzeit.edu/arbanking77/](https://sina.birzeit.edu/arbanking77/)
36
 
37
  Model Download
38
  --------
39
- huggingface: [https://huggingface.co/SinaLab/ArBanking77](https://huggingface.co/SinaLab/ArBanking77)
40
 
 
 
 
41
 
42
- Model Training
43
  --------
 
 
 
 
 
44
 
45
- ```commandline
46
- python run_glue_no_trainer.py
47
- --model_name_or_path aubmindlab/bert-base-arabertv2
48
- --train_file ./data/Banking77_Arabized_Ver3_train_MSA_PAL_merged.json
49
- --validation_file ./data/Banking77_Arabized_Ver3_val_MSA_PAL_merged.json
50
- --seed 42
51
- --max_length 128
52
- --learning_rate 4e-5
53
- --num_train_epochs 20
54
- --per_device_train_batch_size 32
55
- --output_dir ./results
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
56
  ```
57
 
58
- File
59
- source: [run_glue_no_trainer.py](https://github.com/huggingface/transformers/blob/e9ad51306fdcc3fb79d837d667e21c6d075a2451/examples/pytorch/text-classification/run_glue_no_trainer.py)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
60
 
 
 
 
 
 
 
 
61
 
62
  Credits
63
  -------
64
- This research is partially funded by the Palestinian Higher Council for Innovation and Excellence.
 
 
65
 
66
 
67
  Citation
68
  -------
69
- Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana Ghanem: [ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic](http://www.jarrar.info/publications/JBKEG23.pdf). In Proceedings of the 1st Arabic Natural Language Processing Conference (ArabicNLP), Part of the EMNLP 2023. ACL.
 
 
 
10
  datasets:
11
  - SinaLab/ArBanking77
12
  ---
 
13
 
14
  https://www.jarrar.info/publications/JBKEG23.pdf
15
 
16
+ ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
17
+ ======================
18
+ ArBanking77 is a Modern Standard Arabic (MSA) and dialectal Arabic corpus for intent detection in the banking domain. It consists of 31,404
19
+ queries in MSA and the Palestinian dialect. This repo contains the source code and a sample dataset to train and evaluate
20
+ an Arabic intent detection model.
21
 
22
 
23
  ArBanking77 Corpus
24
  --------
25
+ ArBanking77 consists of 31,404 queries (MSA and Palestinian dialect) that were manually Arabized and localized from the original
26
+ English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes
27
+ (intents), including card arrival, card linking, exchange rate, and automatic top-up. The full list of these 77
28
+ intents is in the `./data/Banking77_intents.csv` file. A neural model based on AraBERT was fine-tuned on the ArBanking77
29
+ dataset (F1-score: 92% for MSA, 90% for PAL).
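
Training and inference code usually needs the intent list as a pair of label/id maps. The sketch below shows one way to build them. The layout of `Banking77_intents.csv` is not documented here, so the header row, the single-column layout, and the helper name `load_intents` are all assumptions for illustration.

```python
import csv
import io


def load_intents(handle, column=0, has_header=True):
    """Build label<->id maps from a CSV with one intent name per row.

    The column index and the header row are assumptions about the file layout.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row
    labels = [row[column].strip() for row in reader if row and row[column].strip()]
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


# Hypothetical three-row stand-in for ./data/Banking77_intents.csv.
sample = io.StringIO("intent\ncard_arrival\ncard_linking\nexchange_rate\n")
label2id, id2label = load_intents(sample)
print(label2id["exchange_rate"], id2label[0])  # prints: 2 card_arrival
```

For the real file, pass a handle opened with `open("./data/Banking77_intents.csv", newline="", encoding="utf-8")` and adjust `column` and `has_header` to match its actual layout.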
30
 
31
 
32
+ Full Corpus Download
33
  --------
34
+ Sample data is available in the `data` directory. The entire ArBanking77 corpus is
35
+ available for download upon request for academic and commercial use. However, we cannot provide the augmented data.
36
+
37
+ [Request to download ArBanking77 (corpus and the model)](https://sina.birzeit.edu/arbanking77/)
38
 
 
39
 
40
  Model Download
41
  --------
42
+ [SinaLab HuggingFace](https://huggingface.co/SinaLab/ArBanking77)
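
A minimal inference sketch follows. It assumes the `SinaLab/ArBanking77` checkpoint loads through the standard `transformers` auto classes as a sequence classifier and that its config carries intent names in `id2label`; neither is confirmed by this README. The function names `top_intents` and `predict` are illustrative only.

```python
import math


def top_intents(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    # Numerically stable softmax over the class logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


def predict(text, model_id="SinaLab/ArBanking77", k=3):
    """Classify one query (assumption: the checkpoint is a sequence classifier)."""
    # Imported lazily so top_intents works without torch/transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    # max_length=128 mirrors the training command in this README.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # If the config has no intent names, labels appear as LABEL_0, LABEL_1, ...
    return top_intents(logits, model.config.id2label, k)
```

Calling `predict("...")` with an Arabic banking query downloads the checkpoint on first use and returns the top three intents with their probabilities.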
43
 
44
+ Online Demo
45
+ --------
46
+ You can try our model using this [demo link](https://sina.birzeit.edu/arbanking77/).
47
 
48
+ Requirements
49
  --------
50
+ At this point, the code is compatible with `Python 3.11`.
51
+
52
+ Clone this repo
53
+
54
+ git clone https://github.com/SinaLab/ArabicNER.git
55
 
56
+ This package depends on multiple Python packages. It is recommended to use Conda to create a new environment
57
+ that mimics the environment the model was trained in. The dependencies are listed in this repo's `requirements.txt`.
58
+ First, create a new conda environment using the command below.
59
+
60
+ conda create -n env_name python=3.11
61
+
62
+ Then install the requirements using pip:
63
+
64
+ pip install -r requirements.txt
65
+
66
+
67
+ Project Structure
68
+ --------
69
+ ```
70
+ .
71
+ ├── data <- data dir
72
+ │   ├── Banking77_Arabized_MSA_PAL_train_sample.csv
73
+ │   ├── Banking77_Arabized_MSA_PAL_val_sample.csv
74
+ │   ├── Banking77_Arabized_MSA_test_sample.csv
75
+ │   ├── Banking77_Arabized_PAL_test_sample.csv
76
+ │   └── Banking77_intents.csv
77
+ ├── outputs
78
+ │   ├── models <- trained models
79
+ │   └── results <- evaluation results and reports
80
+ ├── src <- training and evaluation scripts
81
+ │   ├── run_glue_no_trainer.py
82
+ │   ├── run_glue_no_trainer_eval.py
83
+ │   └── utils.py
84
+ ├── .gitignore
85
+ ├── LICENSE
86
+ ├── README.md
87
+ └── requirements.txt
88
  ```
89
 
90
+ Model Training
91
+ --------
92
+ You can start model training by running the following command. It's recommended to pass the arguments demonstrated below
93
+ to get results similar to the ones reported in the paper.
94
+
95
+ python ./src/run_glue_no_trainer.py \
96
+ --model_name_or_path aubmindlab/bert-base-arabertv02 \
97
+ --train_file ./data/Banking77_Arabized_MSA_PAL_train_sample.csv \
98
+ --validation_file ./data/Banking77_Arabized_MSA_PAL_val_sample.csv \
99
+ --seed 42 \
100
+ --max_length 128 \
101
+ --learning_rate 4e-5 \
102
+ --num_train_epochs 20 \
103
+ --per_device_train_batch_size 64 \
104
+ --output_dir ./outputs/models
105
+
106
+ Evaluation
107
+ --------
108
+ Additionally, you can evaluate the trained model on `Banking77_Arabized_MSA_test_sample.csv` and `Banking77_Arabized_PAL_test_sample.csv` test sets as follows:
109
 
110
+ python ./src/run_glue_no_trainer_eval.py \
111
+ --model_name_or_path ./outputs/models \
112
+ --validation_file ./data/Banking77_Arabized_MSA_test_sample.csv \
113
+ --seed 42 \
114
+ --per_device_eval_batch_size 64 \
115
+ --results_dir ./outputs/results \
116
+ --log_path ./outputs/logs/log.txt
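
This README reports F1 scores but does not say how they are averaged across the 77 intents. As a rough sketch, assuming macro-averaging (an assumption, not necessarily what `run_glue_no_trainer_eval.py` computes), the metric can be derived from gold and predicted labels like this. The function name `macro_f1` and the toy labels are illustrative.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Macro-averaged F1 over every label seen in gold or predicted."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted, but it was wrong
            fn[g] += 1  # the true label g was missed
    labels = set(gold) | set(predicted)
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy labels for illustration only, not taken from the corpus.
gold = ["card_arrival", "card_arrival", "exchange_rate", "card_linking"]
pred = ["card_arrival", "exchange_rate", "exchange_rate", "card_linking"]
print(round(macro_f1(gold, pred), 4))  # prints: 0.7778
```

Macro-averaging weights every intent equally, so rare intents affect the score as much as frequent ones. A micro-averaged or weighted F1 would give different numbers on an imbalanced test set.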
117
 
118
  Credits
119
  -------
120
+ This research was funded by the Palestinian Higher Council for Innovation and Excellence and the Scientific and
121
+ Technological Research Council of Türkiye (TÜBİTAK) under project No. 120N761 - CONVERSER: Conversational AI System for
122
+ Arabic.
123
 
124
 
125
  Citation
126
  -------
127
+ Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana
128
+ Ghanem: [ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic](http://www.jarrar.info/publications/JBKEG23.pdf).
129
+ In Proceedings of the 1st Arabic Natural Language Processing Conference (ArabicNLP), Part of the EMNLP 2023. ACL.