
CONDITIONAL-multilabel-climatebert

This model is a fine-tuned version of climatebert/distilroberta-base-climate-f on the Policy-Classification dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5460
  • Precision-micro: 0.5020
  • Precision-samples: 0.1954
  • Precision-weighted: 0.5047
  • Recall-micro: 0.7530
  • Recall-samples: 0.1937
  • Recall-weighted: 0.7530
  • F1-micro: 0.6024
  • F1-samples: 0.1927
  • F1-weighted: 0.6033

Model description

The purpose of this model is to predict multiple labels simultaneously for a given input text. Specifically, the model predicts two labels, ConditionalLabel and UnconditionalLabel, which indicate whether a climate policy commitment is made conditionally and/or unconditionally (an inference sketch follows the list below):

  • Conditional: in the context of climate policy documents, indicates that a given Target/Action/Plan/Policy commitment is being made conditionally.
  • Unconditional: in the context of climate policy documents, indicates that a given Target/Action/Plan/Policy commitment is being made unconditionally.
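
As a usage illustration, here is a minimal multilabel inference sketch with the Transformers library, assuming the model id GIZ/CONDITIONAL-multilabel-climatebert_f from this card; the example text and the 0.5 decision threshold are illustrative choices, not part of the released configuration.

```python
# Minimal multilabel inference sketch (model id taken from this card; the input text
# and the 0.5 threshold are illustrative assumptions).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "GIZ/CONDITIONAL-multilabel-climatebert_f"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "We will reduce emissions by 30% by 2030, subject to international support."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multilabel setup: one sigmoid per label, thresholded independently.
probs = torch.sigmoid(logits).squeeze(0)
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print({model.config.id2label[i]: round(p.item(), 3) for i, p in enumerate(probs)})
print("Predicted labels:", predicted)
```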

Intended uses & limitations

The dataset sometimes does not include the heading/sub-heading that indicates whether a paragraph belongs to the Conditional or Unconditional category, even though the paragraph was copied from the relevant document under those sub-headings. This makes the assessment of conditionality difficult: annotators who were given only the paragraph, without the full surrounding context, had difficulty assessing the conditionality of the commitments being made in it.

Training and evaluation data

  • Training Dataset: 5901 examples

    | Class              | Positive count |
    |--------------------|----------------|
    | ConditionalLabel   | 1986           |
    | UnconditionalLabel | 1312           |

  • Validation Dataset: 1190 examples

    | Class              | Positive count |
    |--------------------|----------------|
    | ConditionalLabel   | 192            |
    | UnconditionalLabel | 136            |
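
For orientation, below is a hypothetical sketch of how the two binary annotations could be turned into the multi-hot target vector used for multilabel training; the record layout and field access shown here are assumptions, not the actual Policy-Classification schema.

```python
# Hypothetical sketch: two 0/1 annotations -> multi-hot label vector for multilabel training.
# The label names follow this card; the record layout is an assumed illustration.
label_names = ["ConditionalLabel", "UnconditionalLabel"]

def encode_labels(example: dict) -> dict:
    # One float per label, in a fixed order, as expected by a multilabel classification head.
    example["labels"] = [float(example[name]) for name in label_names]
    return example

row = {
    "text": "We will reduce emissions by 30% by 2030, subject to international support.",
    "ConditionalLabel": 1,
    "UnconditionalLabel": 0,
}
print(encode_labels(row)["labels"])  # -> [1.0, 0.0]
```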

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 6.03e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 6
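
As a rough mapping of the values above onto the standard Hugging Face Trainer API, a minimal sketch follows; the output_dir and evaluation settings are placeholders, and the Adam betas/epsilon listed above match the Trainer defaults, so they are not set explicitly.

```python
# Sketch mapping the hyperparameters above onto TrainingArguments (Transformers 4.38).
# output_dir and evaluation_strategy are assumptions/placeholders; Adam betas and
# epsilon are left at their defaults, which equal the values listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="conditional-multilabel-climatebert",
    learning_rate=6.03e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=300,
    num_train_epochs=6,
    evaluation_strategy="epoch",
)
```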

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision-micro | Precision-samples | Precision-weighted | Recall-micro | Recall-samples | Recall-weighted | F1-micro | F1-samples | F1-weighted |
|---------------|-------|------|-----------------|-----------------|-------------------|--------------------|--------------|----------------|-----------------|----------|------------|-------------|
| 0.5644        | 1.0   | 369  | 0.4161          | 0.3642          | 0.1391            | 0.4167             | 0.5640       | 0.1416         | 0.5640          | 0.4426   | 0.1389     | 0.4372      |
| 0.429         | 2.0   | 738  | 0.3616          | 0.4420          | 0.1803            | 0.4794             | 0.6860       | 0.1769         | 0.6860          | 0.5376   | 0.1768     | 0.5473      |
| 0.2657        | 3.0   | 1107 | 0.4233          | 0.4126          | 0.1950            | 0.4229             | 0.7774       | 0.1987         | 0.7774          | 0.5391   | 0.1944     | 0.5418      |
| 0.1482        | 4.0   | 1476 | 0.4301          | 0.4910          | 0.1891            | 0.4944             | 0.7470       | 0.1908         | 0.7470          | 0.5925   | 0.1882     | 0.5924      |
| 0.069         | 5.0   | 1845 | 0.5016          | 0.5126          | 0.1920            | 0.5193             | 0.7439       | 0.1912         | 0.7439          | 0.6070   | 0.1899     | 0.6090      |
| 0.0353        | 6.0   | 2214 | 0.5460          | 0.5020          | 0.1954            | 0.5047             | 0.7530       | 0.1937         | 0.7530          | 0.6024   | 0.1927     | 0.6033      |
Per-label metrics on the validation set:

| Label              | Precision | Recall | F1-score | Support |
|--------------------|-----------|--------|----------|---------|
| ConditionalLabel   | 0.477     | 0.765  | 0.588    | 192     |
| UnconditionalLabel | 0.543     | 0.735  | 0.625    | 136     |
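
For reference, a minimal sketch of how the micro, samples, and weighted averages reported above can be computed from multi-hot predictions with scikit-learn; the toy arrays below are illustrative and unrelated to the actual evaluation data.

```python
# Illustrative computation of micro / samples / weighted precision, recall and F1
# for a 2-label multilabel problem; the arrays are toy data, not the real eval set.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])  # gold multi-hot labels
y_pred = np.array([[1, 0], [1, 1], [1, 0], [0, 0]])  # thresholded predictions

for average in ("micro", "samples", "weighted"):
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0
    )
    print(f"{average:>8}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```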

Environmental Impact

Carbon emissions were measured using CodeCarbon.

  • Carbon Emitted: 0.01733 kg of CO2
  • Hours Used: 0.383 hours
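
A minimal sketch of how such a measurement can be taken with CodeCarbon's EmissionsTracker is shown below; the project name and the run_training placeholder are illustrative, not the actual training script.

```python
# Sketch of tracking emissions with CodeCarbon; run_training() and the project name
# are placeholders for the actual fine-tuning run described in this card.
from codecarbon import EmissionsTracker

def run_training():
    # Placeholder for the real training call (e.g. trainer.train()).
    pass

tracker = EmissionsTracker(project_name="conditional-multilabel-climatebert")
tracker.start()
try:
    run_training()
finally:
    emissions_kg = tracker.stop()  # estimated emissions in kg of CO2-equivalent
    print(f"Carbon emitted: {emissions_kg:.5f} kg CO2eq")
```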

Training Hardware

  • On Cloud: yes
  • GPU Model: 1 x Tesla T4
  • CPU Model: Intel(R) Xeon(R) CPU @ 2.00GHz
  • RAM Size: 12.67 GB

Framework versions

  • Transformers 4.38.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2