license: mit
language:
- en
metrics:
- accuracy
- mse
- f1
base_model:
- dmis-lab/biobert-base-cased-v1.2
- google-bert/bert-base-cased
pipeline_tag: text-classification
model-index:
- name: bert-causation-rating-dr1
  results:
  - task:
      type: text-classification
    dataset:
      name: rating_dr1
      type: dataset
    metrics:
    - name: off by 1 accuracy
      type: accuracy
      value: 88.13559322033898
    - name: mean squared error for ordinal data
      type: mse
      value: 0.11864406779661017
    - name: weighted F1 score
      type: f1
      value: 0.8787637088733798
    - name: Kendall's tau coefficient
      type: Kendall's tau
      value: 0.922792113501029
    source:
      name: Keling Wang
      url: https://github.com/Keling-Wang
datasets:
- kelingwang/causation_strength_rating
Model description
This bert-causation-rating-dr1 model is a biobert-base-cased-v1.2 model fine-tuned on a small set of manually annotated texts with causation labels. It classifies a sentence into one of several levels of strength of causation expressed in that sentence.
The sentences in the dataset were rated independently by two researchers. This dr1 version is tuned on the set of sentences with the labels assigned by Rater 1.
Intended use and limitations
This model is primarily intended to rate the strength of causation expressed in a sentence extracted from a clinical guideline in the field of diabetes mellitus management. Given a text input, it predicts one of the following strength of causation (SoC) labels:
- -1: No correlation or variable relationship is mentioned in the sentence.
- 0: The sentence mentions a correlational relationship but no causation.
- 1: The sentence expresses weak causation.
- 2: The sentence expresses moderate causation.
- 3: The sentence expresses strong causation.
NOTE: The model outputs five logits, one per class, and the predicted class indices are 0-based, so the model's own labels run from 0 to 4 rather than from -1 to 3. Using this python module is a convenient way to make predictions.
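The index shift described in the note can be sketched in plain Python. This is an illustration only: it assumes that class index i corresponds to SoC label i - 1 (index 0 to label -1, index 4 to label 3), which the note implies but does not spell out. The helper name logits_to_soc_label and the example logits are hypothetical.

```python
# Assumption: class index i corresponds to SoC label i - 1
# (index 0 -> -1, ..., index 4 -> 3). Check this against the model config.
SOC_DESCRIPTIONS = {
    -1: "no correlation or variable relationship",
    0: "correlation without causation",
    1: "weak causation",
    2: "moderate causation",
    3: "strong causation",
}


def logits_to_soc_label(logits):
    """Return the SoC label (-1..3) for the five logits of one sentence."""
    if len(logits) != 5:
        raise ValueError("expected exactly five logits, one per class")
    # argmax over the five class scores
    class_index = max(range(5), key=lambda i: logits[i])
    return class_index - 1


# Made-up logits for illustration; real values come from the model.
example_logits = [-1.2, 0.3, 0.1, 2.4, 0.9]
label = logits_to_soc_label(example_logits)
print(label, SOC_DESCRIPTIONS[label])
```

In a real pipeline the logits would come from the classification head for each input sentence; only the final index-to-label step is shown here.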
Performance and hyperparameters
Test metrics
This model achieves the following results on the test dataset, which is a 25% held-out split of the entire dataset with SEED=114514.
- Loss: 0.5916
- Off-by-1 accuracy: 88.1356%
- Off-by-2 accuracy: 100.0000%
- MSE for ordinal data: 0.1186
- Weighted F1: 0.8788
- Kendall's Tau: 0.9228
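For readers who want to compute similar ordinal metrics on their own predictions, here is a minimal sketch. The card does not define "off-by-k accuracy", so the function below assumes one common reading: a prediction counts as correct when its distance from the true label is at most k. The author's definition may differ. The function names and the example labels are hypothetical.

```python
def within_k_accuracy(y_true, y_pred, k):
    """Percentage of predictions with |true - pred| <= k (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= k)
    return 100.0 * hits / len(y_true)


def ordinal_mse(y_true, y_pred):
    """Mean squared difference between true and predicted class indices."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels on the 0-4 index scale, for illustration only.
y_true = [0, 1, 2, 3, 4, 2]
y_pred = [0, 1, 3, 3, 2, 2]

print(within_k_accuracy(y_true, y_pred, k=1))
print(within_k_accuracy(y_true, y_pred, k=2))
print(ordinal_mse(y_true, y_pred))
```

Both functions treat the labels as integers on an ordered scale, so they work equally on the 0 to 4 indices and on the -1 to 3 SoC labels.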
This performance is achieved with the following hyperparameters:
- Learning rate: 7.94278e-05
- Weight decay: 0.111616
- Warmup ratio: 0.301057
- Power of polynomial learning rate scheduler: 2.619975
- Power to the distance measure used in the loss function \alpha: 2.0
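To give a feel for how the learning rate, warmup ratio, and polynomial power interact, the sketch below computes a linear warmup followed by polynomial decay in plain Python. It is an approximation under stated assumptions: lr_end=1e-8 is taken from the training settings listed in this card, total_steps=100 is a hypothetical value (the card does not report the number of optimizer steps), and the exact step rounding in the transformers scheduler may differ.

```python
import math


def polynomial_lr(step, total_steps, base_lr=7.94278e-05, lr_end=1e-8,
                  warmup_ratio=0.301057, power=2.619975):
    """Learning rate at `step`: linear warmup, then polynomial decay to lr_end."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    if step >= total_steps:
        return lr_end
    # Fraction of the decay phase that is still ahead
    remaining = 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (base_lr - lr_end) * remaining ** power + lr_end


# total_steps=100 is hypothetical, chosen only to make the shape visible.
for step in (0, 15, 31, 50, 75, 100):
    print(step, polynomial_lr(step, total_steps=100))
```

With a power above 2, the rate drops quickly after the warmup peak and spends most of the remaining steps close to lr_end.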
Hyperparameter tuning metrics
During Bayesian optimization for hyperparameter tuning, this model achieved a best target metric (off-by-1 accuracy) of 99.1147, obtained from 4-fold cross-validation with the best hyperparameters.
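The cross-validation part of this procedure can be illustrated with a standard-library sketch. The card does not show the tuning code, so everything here is an assumption: the fold construction, the reuse of seed 114514, the averaging of the metric across folds, and the names k_fold_indices, cross_validated_score, and dummy_evaluate are all hypothetical. In an actual tuning run, the evaluation callback would train the model on the training indices with one candidate set of hyperparameters and return the target metric on the validation indices.

```python
import random


def k_fold_indices(n_samples, n_folds=4, seed=114514):
    """Yield (train_indices, val_indices) pairs for shuffled k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def cross_validated_score(n_samples, evaluate_fold, n_folds=4):
    """Average the metric returned by evaluate_fold over all folds."""
    scores = [evaluate_fold(train, val)
              for train, val in k_fold_indices(n_samples, n_folds)]
    return sum(scores) / len(scores)


# Hypothetical stand-in for "train on train_idx, score on val_idx".
def dummy_evaluate(train_idx, val_idx):
    return 100.0 * len(val_idx) / (len(train_idx) + len(val_idx))


print(cross_validated_score(20, dummy_evaluate))
```

A hyperparameter search would call cross_validated_score once per candidate configuration and keep the configuration with the best averaged score.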
Training settings
The following training configurations apply:
- seed: 114514
- batch_size: 128
- epoch: 8
- max_length in torch.utils.data.Dataset: 128
- Loss function: the OLL loss with a tunable hyperparameter \alpha (Power to the distance measure used in the loss function).
- lr: 7.94278e-05
- weight_decay: 0.111616
- warmup_ratio: 0.301057
- lr_scheduler_type: polynomial
- lr_scheduler_kwargs: {"power": 2.619975, "lr_end": 1e-8}
- Power to the distance measure used in the loss function \alpha: 2.0
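The role of \alpha in the loss can be made concrete with a small sketch. It assumes that "OLL" refers to the ordinal log loss, in which each wrong class i contributes -log(1 - p_i) weighted by its distance from the true class raised to the power \alpha, and that the distance is the absolute difference between class indices. The training code is not shown in this card, so the actual implementation (for example batching or reduction) may differ. The function name oll_loss is hypothetical.

```python
import math


def oll_loss(logits, target_index, alpha=2.0):
    """Ordinal log loss for one example (assumed form, see lead-in)."""
    # Numerically stable softmax over the class logits
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    loss = 0.0
    for i, p in enumerate(probs):
        # Distance between class i and the true class, raised to alpha
        weight = abs(i - target_index) ** alpha
        if weight > 0:
            # Clamp to avoid log(0) when p is numerically 1
            loss -= math.log(max(1.0 - p, 1e-12)) * weight
    return loss


# Made-up logits: true class is index 0 in all three cases.
print(oll_loss([5.0, 0.0, 0.0, 0.0, 0.0], target_index=0))  # confident and correct
print(oll_loss([0.0, 5.0, 0.0, 0.0, 0.0], target_index=0))  # off by one class
print(oll_loss([0.0, 0.0, 0.0, 0.0, 5.0], target_index=0))  # off by four classes
```

With \alpha = 2.0, probability mass placed four classes away from the truth is weighted 16 times more heavily than mass placed one class away, which is what makes the loss sensitive to the ordering of the labels.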
Framework versions and devices
This model was run on an NVIDIA P100 GPU provided by Kaggle. Framework versions are:
- python==3.10.14
- cuda==12.4
- NVIDIA-SMI==550.90.07
- torch==2.4.0
- transformers==4.45.1
- scikit-learn==1.2.2
- optuna==4.0.0
- nlpaug==1.1.11