---
license: apache-2.0
tags:
- generated_from_trainer
- finance
- intent-classification
datasets:
- banking77
model-index:
- name: banking-intent-distilbert-classifier
  results: []
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# banking-intent-distilbert-classifier
This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the banking77 dataset.
It achieves the following results on the evaluation set:
- eval_loss: 0.2885
- eval_accuracy: 0.9244
- eval_runtime: 1.9357
- eval_samples_per_second: 1591.148
- eval_steps_per_second: 99.705
- epoch: 10.0
- step: 3130
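
As a quick sanity check, the throughput figures above can be cross-checked against each other. The sketch below is only an illustration: it assumes the evaluation split has 3,080 examples (the total support in the classification report) and uses the eval batch size of 16 from the hyperparameter list.

```python
import math

# Values copied from the evaluation log above.
eval_runtime_s = 1.9357
samples_per_second = 1591.148
steps_per_second = 99.705

# Assumptions: 3,080 evaluation examples (total support in the report)
# and an eval batch size of 16 (from the hyperparameter list).
n_eval_samples = 3080
eval_batch_size = 16

# Throughput multiplied by runtime should recover the counts.
implied_samples = samples_per_second * eval_runtime_s
implied_steps = steps_per_second * eval_runtime_s
expected_steps = math.ceil(n_eval_samples / eval_batch_size)

print(round(implied_samples), round(implied_steps), expected_steps)
```

If the three printed numbers agree with the sample count and with each other, the logged runtime and throughput are internally consistent.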
_Note: This is a simple example of fine-tuning a DistilBERT model for a
multi-class classification task, done to see how much it costs to train this
model on Google Cloud (using a T4 GPU). Training cost about 1.07 SGD and
took less than 20 minutes. Although my intention was only
to test it out on Google Cloud, the model has been properly trained
and is ready to use. Hopefully, it is what you're looking for._
## Inference example
```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TextClassificationPipeline,
)

model_id = "lxyuan/banking-intent-distilbert-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# device=0 selects the first GPU; use device=-1 to run on CPU.
banking_intent_classifier = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    device=0,
)

print(banking_intent_classifier("How to report lost card?"))
# [{'label': 'lost_or_stolen_card', 'score': 0.9518502950668335}]
```
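
For a single-label classifier like this one, the `score` in the pipeline output is a softmax probability over the model's logits. The sketch below illustrates that step in plain Python. The logits are made up for illustration and are not real model output, and only three of the 77 intents are shown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate intents (not real model output).
labels = ["lost_or_stolen_card", "card_not_working", "compromised_card"]
logits = [6.1, 2.3, 1.9]

probs = softmax(logits)
# Pick the index of the highest probability, as the pipeline does for top-1.
best = max(range(len(labels)), key=probs.__getitem__)
print({"label": labels[best], "score": probs[best]})
```

The probabilities sum to 1, so a score near 0.95 means the remaining intents share only about 5% of the probability mass.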
## Training and evaluation data
The BANKING77 dataset consists of online banking queries labeled with their corresponding intents,
offering a comprehensive collection of 77 finely categorized intents within the banking domain.
With a total of 13,083 customer service queries, it specifically emphasizes precise intent detection
within a single domain.
## Training procedure
- To reproduce the result, please refer to this [notebook](https://github.com/LxYuan0420/nlp/blob/main/notebooks/distillbert-intent-classification-banking.ipynb)
- To run the evaluation, please refer to this [evaluation notebook](https://github.com/LxYuan0420/nlp/blob/main/notebooks/Evaluator_from_Huggingface.ipynb)
### Evaluation
<details>
<summary>Evaluation result</summary>

Classification report:

| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| `activate_my_card` | 1.0000 | 0.9750 | 0.9873 | 40 |
| `age_limit` | 0.9756 | 1.0000 | 0.9877 | 40 |
| `apple_pay_or_google_pay` | 1.0000 | 1.0000 | 1.0000 | 40 |
| `atm_support` | 0.9750 | 0.9750 | 0.9750 | 40 |
| `automatic_top_up` | 1.0000 | 0.9000 | 0.9474 | 40 |
| `balance_not_updated_after_bank_transfer` | 0.8205 | 0.8000 | 0.8101 | 40 |
| `balance_not_updated_after_cheque_or_cash_deposit` | 1.0000 | 0.9750 | 0.9873 | 40 |
| `beneficiary_not_allowed` | 0.9250 | 0.9250 | 0.9250 | 40 |
| `cancel_transfer` | 1.0000 | 0.9750 | 0.9873 | 40 |
| `card_about_to_expire` | 0.9756 | 1.0000 | 0.9877 | 40 |
| `card_acceptance` | 0.9189 | 0.8500 | 0.8831 | 40 |
| `card_arrival` | 0.9459 | 0.8750 | 0.9091 | 40 |
| `card_delivery_estimate` | 0.8605 | 0.9250 | 0.8916 | 40 |
| `card_linking` | 0.9302 | 1.0000 | 0.9639 | 40 |
| `card_not_working` | 0.8478 | 0.9750 | 0.9070 | 40 |
| `card_payment_fee_charged` | 0.7917 | 0.9500 | 0.8636 | 40 |
| `card_payment_not_recognised` | 0.9231 | 0.9000 | 0.9114 | 40 |
| `card_payment_wrong_exchange_rate` | 0.9048 | 0.9500 | 0.9268 | 40 |
| `card_swallowed` | 1.0000 | 0.8750 | 0.9333 | 40 |
| `cash_withdrawal_charge` | 0.9744 | 0.9500 | 0.9620 | 40 |
| `cash_withdrawal_not_recognised` | 0.8667 | 0.9750 | 0.9176 | 40 |
| `change_pin` | 0.9302 | 1.0000 | 0.9639 | 40 |
| `compromised_card` | 0.8889 | 0.8000 | 0.8421 | 40 |
| `contactless_not_working` | 1.0000 | 0.9000 | 0.9474 | 40 |
| `country_support` | 0.9512 | 0.9750 | 0.9630 | 40 |
| `declined_card_payment` | 0.8125 | 0.9750 | 0.8864 | 40 |
| `declined_cash_withdrawal` | 0.7843 | 1.0000 | 0.8791 | 40 |
| `declined_transfer` | 0.9667 | 0.7250 | 0.8286 | 40 |
| `direct_debit_payment_not_recognised` | 0.9444 | 0.8500 | 0.8947 | 40 |
| `disposable_card_limits` | 0.8974 | 0.8750 | 0.8861 | 40 |
| `edit_personal_details` | 0.9302 | 1.0000 | 0.9639 | 40 |
| `exchange_charge` | 0.9722 | 0.8750 | 0.9211 | 40 |
| `exchange_rate` | 0.9091 | 1.0000 | 0.9524 | 40 |
| `exchange_via_app` | 0.8085 | 0.9500 | 0.8736 | 40 |
| `extra_charge_on_statement` | 1.0000 | 0.9500 | 0.9744 | 40 |
| `failed_transfer` | 0.8333 | 0.8750 | 0.8537 | 40 |
| `fiat_currency_support` | 0.8718 | 0.8500 | 0.8608 | 40 |
| `get_disposable_virtual_card` | 0.9722 | 0.8750 | 0.9211 | 40 |
| `get_physical_card` | 0.9756 | 1.0000 | 0.9877 | 40 |
| `getting_spare_card` | 0.9500 | 0.9500 | 0.9500 | 40 |
| `getting_virtual_card` | 0.8667 | 0.9750 | 0.9176 | 40 |
| `lost_or_stolen_card` | 0.8261 | 0.9500 | 0.8837 | 40 |
| `lost_or_stolen_phone` | 0.9750 | 0.9750 | 0.9750 | 40 |
| `order_physical_card` | 0.9231 | 0.9000 | 0.9114 | 40 |
| `passcode_forgotten` | 1.0000 | 1.0000 | 1.0000 | 40 |
| `pending_card_payment` | 0.9500 | 0.9500 | 0.9500 | 40 |
| `pending_cash_withdrawal` | 1.0000 | 0.9500 | 0.9744 | 40 |
| `pending_top_up` | 0.9268 | 0.9500 | 0.9383 | 40 |
| `pending_transfer` | 0.8611 | 0.7750 | 0.8158 | 40 |
| `pin_blocked` | 0.9714 | 0.8500 | 0.9067 | 40 |
| `receiving_money` | 1.0000 | 0.9250 | 0.9610 | 40 |
| `Refund_not_showing_up` | 1.0000 | 0.9250 | 0.9610 | 40 |
| `request_refund` | 0.9512 | 0.9750 | 0.9630 | 40 |
| `reverted_card_payment?` | 0.9286 | 0.9750 | 0.9512 | 40 |
| `supported_cards_and_currencies` | 0.9744 | 0.9500 | 0.9620 | 40 |
| `terminate_account` | 0.9302 | 1.0000 | 0.9639 | 40 |
| `top_up_by_bank_transfer_charge` | 1.0000 | 0.8250 | 0.9041 | 40 |
| `top_up_by_card_charge` | 0.9286 | 0.9750 | 0.9512 | 40 |
| `top_up_by_cash_or_cheque` | 0.8810 | 0.9250 | 0.9024 | 40 |
| `top_up_failed` | 0.9024 | 0.9250 | 0.9136 | 40 |
| `top_up_limits` | 0.9286 | 0.9750 | 0.9512 | 40 |
| `top_up_reverted` | 0.9706 | 0.8250 | 0.8919 | 40 |
| `topping_up_by_card` | 0.8421 | 0.8000 | 0.8205 | 40 |
| `transaction_charged_twice` | 0.9302 | 1.0000 | 0.9639 | 40 |
| `transfer_fee_charged` | 0.9024 | 0.9250 | 0.9136 | 40 |
| `transfer_into_account` | 0.9167 | 0.8250 | 0.8684 | 40 |
| `transfer_not_received_by_recipient` | 0.7778 | 0.8750 | 0.8235 | 40 |
| `transfer_timing` | 0.8372 | 0.9000 | 0.8675 | 40 |
| `unable_to_verify_identity` | 0.9250 | 0.9250 | 0.9250 | 40 |
| `verify_my_identity` | 0.7955 | 0.8750 | 0.8333 | 40 |
| `verify_source_of_funds` | 0.9524 | 1.0000 | 0.9756 | 40 |
| `verify_top_up` | 1.0000 | 1.0000 | 1.0000 | 40 |
| `virtual_card_not_working` | 1.0000 | 0.9250 | 0.9610 | 40 |
| `visa_or_mastercard` | 0.9737 | 0.9250 | 0.9487 | 40 |
| `why_verify_identity` | 0.9118 | 0.7750 | 0.8378 | 40 |
| `wrong_amount_of_cash_received` | 1.0000 | 0.8750 | 0.9333 | 40 |
| `wrong_exchange_rate_for_cash_withdrawal` | 0.9730 | 0.9000 | 0.9351 | 40 |
| accuracy | | | 0.9244 | 3080 |
| macro avg | 0.9282 | 0.9244 | 0.9243 | 3080 |
| weighted avg | 0.9282 | 0.9244 | 0.9243 | 3080 |

</details>
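
Each f1-score in the report is the harmonic mean of that row's precision and recall. The sketch below recomputes it for three rows copied from the report. Because the inputs are already rounded to four decimals, the comparison uses a small tolerance rather than exact equality. The helper name `f1_score` is chosen here for illustration.

```python
import math

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rows copied from the report: (precision, recall, reported f1-score).
rows = {
    "declined_transfer": (0.9667, 0.7250, 0.8286),
    "declined_cash_withdrawal": (0.7843, 1.0000, 0.8791),
    "card_payment_fee_charged": (0.7917, 0.9500, 0.8636),
}

for label, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    # Inputs are rounded to 4 decimals, so allow a small tolerance.
    ok = math.isclose(computed, reported, abs_tol=5e-4)
    print(f"{label}: computed={computed:.4f} reported={reported:.4f} match={ok}")
```

Rows such as `declined_transfer` show why f1 is useful here: precision is high (0.9667) but recall is only 0.7250, and the harmonic mean is pulled toward the weaker of the two.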
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
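
The hyperparameters above determine the effective batch size and the number of optimizer steps. The sketch below works through that arithmetic and a linear learning-rate decay. It assumes the 10,003-example BANKING77 training split, a single device, and no warmup (warmup is not listed above); `linear_lr` is a hypothetical helper, not the Trainer's own scheduler code.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 2e-5
train_batch_size = 16
gradient_accumulation_steps = 2
num_epochs = 10

# Assumption: the standard BANKING77 train split (10,003 examples), one device.
n_train_samples = 10_003

total_train_batch_size = train_batch_size * gradient_accumulation_steps
batches_per_epoch = math.ceil(n_train_samples / train_batch_size)
# One optimizer update happens every `gradient_accumulation_steps` batches.
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_steps = updates_per_epoch * num_epochs

def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Linear warmup, then linear decay to zero (warmup_steps=0 is an assumption)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)

print(total_train_batch_size, updates_per_epoch, total_steps)
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

Under these assumptions the total works out to 3,130 optimizer steps, which agrees with the `step: 3130` value logged at epoch 10.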
### Framework versions
- Transformers 4.29.2
- Pytorch 1.9.0+cu111
- Datasets 2.12.0
- Tokenizers 0.13.3