
distilbert-base-uncased-finetuned-FiNER

This model is a fine-tuned version of distilbert/distilbert-base-uncased trained on a subset of the nlpaueb/finer-139 dataset. The subset is generated by filtering the dataset to contain only samples with at least one of the following NER tags:

  • 'O',
  • 'B-DebtInstrumentBasisSpreadOnVariableRate1',
  • 'B-DebtInstrumentFaceAmount',
  • 'B-LineOfCreditFacilityMaximumBorrowingCapacity',
  • 'B-DebtInstrumentInterestRateStatedPercentage'

The model was then fine-tuned to detect only the four B- tags listed above, with every other token labeled "O".
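The filtering and relabeling steps above can be sketched in plain Python as follows. This is a minimal sketch, not the code from the author's repository. It assumes each sample stores its tags as strings in a `ner_tags` list (if your copy of the dataset stores integer ids, convert them to names first). Because almost every sample contains 'O', the filter only checks for the four non-'O' tags; that interpretation is an assumption.

```python
# The four entity tags kept for fine-tuning (from the model card).
KEPT_TAGS = {
    "B-DebtInstrumentBasisSpreadOnVariableRate1",
    "B-DebtInstrumentFaceAmount",
    "B-LineOfCreditFacilityMaximumBorrowingCapacity",
    "B-DebtInstrumentInterestRateStatedPercentage",
}


def has_kept_tag(sample: dict) -> bool:
    """Keep a sample only if it contains at least one of the four entity tags.

    Assumption: 'O' is ignored here, since nearly every sample contains it.
    """
    return any(tag in KEPT_TAGS for tag in sample["ner_tags"])


def remap_tags(sample: dict) -> dict:
    """Return a copy of the sample with every tag outside KEPT_TAGS mapped to 'O'."""
    return {
        **sample,
        "ner_tags": [tag if tag in KEPT_TAGS else "O" for tag in sample["ner_tags"]],
    }


def filter_and_remap(samples: list) -> list:
    """Drop samples without any kept tag, then relabel the rest."""
    return [remap_tags(s) for s in samples if has_kept_tag(s)]


if __name__ == "__main__":
    toy = [
        # "B-OtherTag" is a hypothetical tag outside the kept set.
        {"tokens": ["notes", "of", "500", "due"],
         "ner_tags": ["O", "O", "B-DebtInstrumentFaceAmount", "B-OtherTag"]},
        {"tokens": ["revenue", "of", "10"],
         "ner_tags": ["O", "O", "B-OtherTag"]},
    ]
    print(filter_and_remap(toy))
```

Only the first toy sample survives, and its `B-OtherTag` becomes "O".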

It achieves the following results on the evaluation set:

  • Loss: 0.0336
  • Precision: 0.9154
  • Recall: 0.9327
  • F1: 0.9240
  • Accuracy: 0.9917
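The reported F1 is the harmonic mean of the reported precision and recall, which holds exactly for micro-averaged scores (the card does not say which averaging was used, so that is an assumption). A quick arithmetic check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Final evaluation-set precision and recall reported above.
print(f"{f1_score(0.9154, 0.9327):.4f}")  # 0.9240
```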

Model description

Model based on distilbert/distilbert-base-uncased with all default parameters.
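As a hedged sketch of what starting from the base checkpoint with default settings might look like, the snippet below loads DistilBERT with a fresh five-way token-classification head through transformers' `AutoModelForTokenClassification`. The label order is an assumption for illustration; the published checkpoint's `config.id2label` is authoritative.

```python
# Assumed label order; check the published model's config.id2label for the real one.
LABELS = [
    "O",
    "B-DebtInstrumentBasisSpreadOnVariableRate1",
    "B-DebtInstrumentFaceAmount",
    "B-LineOfCreditFacilityMaximumBorrowingCapacity",
    "B-DebtInstrumentInterestRateStatedPercentage",
]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: i for i, label in ID2LABEL.items()}


def build_model(checkpoint: str = "distilbert/distilbert-base-uncased"):
    """Load base DistilBERT with a new token-classification head (downloads weights)."""
    from transformers import AutoModelForTokenClassification

    return AutoModelForTokenClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
```

Passing `id2label`/`label2id` stores readable label names in the saved config, so later predictions report tag names instead of `LABEL_0`, `LABEL_1`, and so on.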

Intended uses & limitations

The model published here was trained for demo purposes only.
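For a quick demo, the model can presumably be run through the transformers `pipeline` API. This is a sketch, assuming the Hub id `bodias/distilbert-base-uncased-finetuned-FiNER` (taken from this card's footer) and that the checkpoint loads as a standard token-classification model. The example sentence is made up.

```python
from functools import lru_cache

MODEL_ID = "bodias/distilbert-base-uncased-finetuned-FiNER"


@lru_cache(maxsize=1)
def load_tagger():
    """Build the token-classification pipeline once (downloads the model on first use)."""
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge sub-word pieces into word-level entities
    )


def tag_text(text: str) -> list:
    """Return grouped entity predictions for `text`."""
    return load_tagger()(text)


if __name__ == "__main__":
    sentence = "The Company issued $500 million of senior notes bearing interest at 4.25%."
    for entity in tag_text(sentence):
        print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```

Caching the pipeline avoids reloading the weights on every call.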

Training and evaluation data

Original train/validation/test splits from nlpaueb/finer-139, after filtering for samples containing at least one of the following NER tags:

  • 'O',
  • 'B-DebtInstrumentBasisSpreadOnVariableRate1',
  • 'B-DebtInstrumentFaceAmount',
  • 'B-LineOfCreditFacilityMaximumBorrowingCapacity',
  • 'B-DebtInstrumentInterestRateStatedPercentage'

Training procedure

See https://github.com/bodias/DistilBERT-FiNER for details of the training procedure.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 6
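The hyperparameters above map onto transformers' `TrainingArguments` roughly as follows. This is a sketch, not the author's script. It assumes a single device and no gradient accumulation, so the listed batch sizes equal the per-device sizes. The `output_dir` default is a hypothetical name.

```python
# Hyperparameters from the list above, as TrainingArguments keyword arguments.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes one device, no gradient accumulation
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 6,
}


def make_training_args(output_dir: str = "distilbert-base-uncased-finetuned-FiNER"):
    """Build TrainingArguments (recent transformers versions also need accelerate installed)."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```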

Training results

Training Loss  Epoch  Step   Validation Loss  Precision  Recall  F1      Accuracy
0.0354         1.0    1773   0.0375           0.8639     0.8993  0.8812  0.9870
0.0242         2.0    3546   0.0296           0.8929     0.9159  0.9042  0.9895
0.0166         3.0    5319   0.0297           0.9079     0.9208  0.9143  0.9907
0.0117         4.0    7092   0.0303           0.9101     0.9293  0.9196  0.9913
0.0086         5.0    8865   0.0328           0.9065     0.9331  0.9196  0.9913
0.0062         6.0    10638  0.0336           0.9154     0.9327  0.9240  0.9917
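The Step column follows from the hyperparameters: 1773 optimizer steps per epoch times 6 epochs gives the 10638 total steps in the last row. The sketch below works through that arithmetic and the learning rate under a linear decay to zero with no warmup (no warmup is an assumption, since the card lists none). The training-set size range also assumes one device, no gradient accumulation, and a final partial batch that is kept.

```python
STEPS_PER_EPOCH = 1773  # "Step" column at epoch 1.0
NUM_EPOCHS = 6
BATCH_SIZE = 16
PEAK_LR = 2e-5

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS  # matches the last row of the table

# ceil(N / 16) == 1773 batches per epoch bounds the filtered training-set size N.
min_samples = (STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1
max_samples = STEPS_PER_EPOCH * BATCH_SIZE


def linear_lr(step: int, total: int = total_steps, peak: float = PEAK_LR) -> float:
    """Learning rate at `step` for a linear decay from `peak` to zero, no warmup."""
    return peak * max(0.0, (total - step) / total)


print(total_steps)               # 10638
print(min_samples, max_samples)  # 28353 28368
print(linear_lr(5319))           # 1e-05, halfway through training (end of epoch 3)
```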

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
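To compare a local environment against these versions, a small check with the standard library's `importlib.metadata` can help. This is a sketch. It assumes PyTorch is installed under its pip name `torch`, and the comparison is an exact string match, so a CPU-only build reporting "2.2.1" shows up as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the model card (PyTorch's pip package is named "torch").
EXPECTED = {
    "transformers": "4.38.2",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def version_mismatches(expected=EXPECTED, lookup=version) -> dict:
    """Return {package: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package not installed
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    print(version_mismatches() or "environment matches the card")
```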

Model size: 66.4M params (Safetensors, tensor type F32)