license: mit
language:
- ar
metrics:
- f1
library_name: transformers
pipeline_tag: text-classification
tags:
- code
datasets:
- SinaLab/ArBanking77
ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic
https://www.jarrar.info/publications/JBKEG23.pdf
Online Demo
You can try the model using the demo link below:
https://sina.birzeit.edu/arbanking77/
ArBanking77 Corpus
ArBanking77 consists of 31,404 queries (in MSA and Palestinian dialect) that were manually Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (intents), including card arrival, card linking, exchange rate, and automatic top-up. A neural model based on AraBERT was fine-tuned on the ArBanking77 dataset (F1-score of 92% for MSA and 90% for PAL).
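As a rough illustration of the F1 metric reported above, the sketch below computes F1 per intent and averages it over the intents. The card does not say which averaging was used, so macro-averaging is an assumption here, and the label names and predictions are illustrative rather than taken from the corpus.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over every label that appears in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is not the true intent
            fn[g] += 1  # the true intent g was missed
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative labels only (assumed names, not real corpus rows).
gold = ["card_arrival", "card_linking", "exchange_rate", "card_arrival"]
pred = ["card_arrival", "card_linking", "card_arrival", "card_arrival"]
score = macro_f1(gold, pred)
```

With 77 intents, macro-averaging weights rare and frequent intents equally; a weighted or micro average would give different numbers on an imbalanced split.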
Corpus Download
Sample data is available in the data directory. The entire ArBanking77 corpus is available to download upon request for academic and commercial use.
Request to download ArBanking77 (corpus and model):
https://sina.birzeit.edu/arbanking77/
Model Download
Hugging Face: https://huggingface.co/SinaLab/ArBanking77
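A minimal sketch of querying the published model, assuming the repository above loads through the standard transformers text-classification pipeline (suggested by the card metadata, but not verified here). The helper names and the sample query are hypothetical, and the label strings returned depend on the model's own configuration.

```python
MODEL_ID = "SinaLab/ArBanking77"  # repository named in this card


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def predict_intents(classifier, queries):
    """Return (query, label, score) tuples, one per input query."""
    results = classifier(queries)  # expected: one {"label", "score"} dict per query
    return [(q, r["label"], round(r["score"], 4)) for q, r in zip(queries, results)]


# Usage (requires network access and the transformers package):
#   clf = load_classifier()
#   print(predict_intents(clf, ["متى تصل بطاقتي الجديدة؟"]))  # "When will my new card arrive?"
```

Keeping the pipeline as an argument to predict_intents makes it easy to swap in a stub classifier for testing without downloading the model.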
Model Training
python run_glue_no_trainer.py \
    --model_name_or_path aubmindlab/bert-base-arabertv2 \
    --train_file ./data/Banking77_Arabized_Ver3_train_MSA_PAL_merged.json \
    --validation_file ./data/Banking77_Arabized_Ver3_val_MSA_PAL_merged.json \
    --seed 42 \
    --max_length 128 \
    --learning_rate 4e-5 \
    --num_train_epochs 20 \
    --per_device_train_batch_size 32 \
    --output_dir ./results
File source: run_glue_no_trainer.py
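The training command takes JSON files for training and validation. If the script follows the upstream Hugging Face run_glue_no_trainer.py example (an assumption; the copy in this repository may differ), each file can hold one JSON object per line with a label field plus a text field. The sketch below writes and reads such a file; the field name "text", the file name, and both rows are hypothetical and do not come from the corpus.

```python
import json
import os
import tempfile


def write_jsonl(rows, path):
    """Write one JSON object per line, keeping Arabic text unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical rows: field names and examples are illustrative only.
rows = [
    {"text": "متى تصل بطاقتي الجديدة؟", "label": "card_arrival"},
    {"text": "ما هو سعر الصرف اليوم؟", "label": "exchange_rate"},
]

path = os.path.join(tempfile.mkdtemp(), "train_sample.json")
write_jsonl(rows, path)
loaded = read_jsonl(path)
```

Checking a small file like this against the sample in the data directory before launching a 20-epoch run is a cheap way to catch a field-name mismatch.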
Credits
This research is partially funded by the Palestinian Higher Council for Innovation and Excellence.
Citation
Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, and Sana Ghanem: ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic. In Proceedings of the 1st Arabic Natural Language Processing Conference (ArabicNLP), part of EMNLP 2023. ACL.