---
license: apache-2.0
base_model: sentence-transformers/all-mpnet-base-v2
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: IKT_classifier_conditional_best
  results: []
widget:
- text: "Brick Kilns. Enforcement and Improved technology use. Residential and Commercial. Enhanced use of energy- efficient appliances in household and commercial buildings. F-Gases. Implement Montreal Protocol targets. Industry. Achieve 10% Energy efficiency in the Industry sub-sector through measures according to the Energy Efficiency and Conservation Master Plan (EECMP). Agriculture. Implementation of 5925 Nos. solar irrigation pumps (generating 176.38MW) for agriculture. Brick Kilns. 14% emission reduction through Banning Fixed Chimney kiln (FCK), encourage advanced technology and non-fired brick use. Residential and Commercial."
  example_title: UNCONDITIONAL
- text: "Achieve 20% Energy efficiency in the Industry sub-sector through measures according to the Energy Efficiency and Conservation Master Plan (EECMP). Promote green Industry. Promote carbon financing. Agriculture. Enhanced use of solar energy in Agriculture. Agriculture. Implementation of 4102 Nos. solar irrigation pumps (generating 164 MW) for agriculture. Brick Kilns. Enforcement and Improved technology use. Brick Kilns. 47% emission reduction through Banning Fixed Chimney kiln (FCK), encourage advanced technology and non-fired brick use. Residential and Commercial."
  example_title: CONDITIONAL
- text: "The GHG emission reductions from Cairo metro network includes the rehabilitation of existing lines 1, 2, and 3. • The development of Alexandria Metro (Abu Qir – Alexandria railway line) and rehabilitation of the Raml tram line."
  example_title: CONDITIONAL
---

# IKT_classifier_conditional_best

This model is a fine-tuned version of [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) on the [GIZ/policy_qa_v0_1](https://huggingface.co/datasets/GIZ/policy_qa_v0_1) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5371
- Precision Macro: 0.8714
- Precision Weighted: 0.8713
- Recall Macro: 0.8711
- Recall Weighted: 0.8712
- F1-score: 0.8712
- Accuracy: 0.8712

## Model description

The model is a binary text classifier based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and fine-tuned on text sourced from national climate policy documents.

## Intended uses & limitations

The classifier assigns a class of **'Unconditional' or 'Conditional' to denote the strength of commitments** as portrayed in extracted passages from the documents. The intended use is for climate policy researchers and analysts seeking to automate the process of reviewing lengthy, non-standardized PDF documents to produce summaries and reports.

Due to inconsistencies in the training data, the classifier performance leaves room for improvement. The classifier exhibits reasonably good training metrics (F1 ~ 0.85), balanced between precise identification of true positive classifications (precision ~ 0.85) and a wide net to capture as many true positives as possible (recall ~ 0.85). When tested on real-world unseen test data, the performance was suboptimal for a binary classifier (F1 ~ 0.5). However, testing was based on a small out-of-sample dataset containing its own inconsistencies. Therefore, classification may prove more robust in practice.
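To make the precision, recall and F1 figures quoted above more concrete, the following is a minimal sketch in plain Python of how these metrics can be computed per class and then macro-averaged for a two-class problem. The function names `per_class_prf` and `macro_prf` and the toy labels are illustrative assumptions; this is not the evaluation code used to produce the reported numbers.

```python
from typing import Dict, List, Tuple

# Class names as used in this model card.
LABELS = ("UNCONDITIONAL", "CONDITIONAL")


def per_class_prf(y_true: List[str], y_pred: List[str], label: str) -> Tuple[float, float, float]:
    """Precision, recall and F1 with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_prf(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Unweighted (macro) average of the per-class scores."""
    scores = [per_class_prf(y_true, y_pred, label) for label in LABELS]
    n = len(scores)
    return {
        "precision_macro": sum(s[0] for s in scores) / n,
        "recall_macro": sum(s[1] for s in scores) / n,
        "f1_macro": sum(s[2] for s in scores) / n,
    }


# Toy data (hypothetical): one CONDITIONAL passage is misclassified.
y_true = ["CONDITIONAL", "CONDITIONAL", "UNCONDITIONAL", "UNCONDITIONAL"]
y_pred = ["CONDITIONAL", "UNCONDITIONAL", "UNCONDITIONAL", "UNCONDITIONAL"]
print(macro_prf(y_true, y_pred))
```

With balanced classes, as in the training data described below, macro and weighted averages are nearly identical, which is consistent with the closely matching values reported above.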

## Training and evaluation data

The training dataset comprises labelled passages from 2 sources:
- [ClimateWatch NDC Sector data](https://www.climatewatchdata.org/data-explorer/historical-emissions?historical-emissions-data-sources=climate-watch&historical-emissions-gases=all-ghg&historical-emissions-regions=All%20Selected&historical-emissions-sectors=total-including-lucf%2Ctotal-including-lucf&page=1).
- [IKI TraCS Climate Strategies for Transport Tracker](https://changing-transport.org/wp-content/uploads/20220722_Tracker_Database.xlsx) implemented by GIZ and funded by the International Climate Initiative (IKI) of the German Federal Ministry for Economic Affairs and Climate Action (BMWK). Here we utilized the QA dataset (CW_NDC_data_Sector).

The combined dataset [GIZ/policy_qa_v0_1](https://huggingface.co/datasets/GIZ/policy_qa_v0_1) contains ~85k rows. Each row is duplicated twice to provide three sequence-length variants (denoted by the values 'small', 'medium', and 'large', which correspond to sequence lengths of 60, 85, and 150 respectively, as indicated in the 'strategy' column). This effectively means that only about one third of the rows are useful for any given use case, and the 'strategy' value should be selected accordingly. For this training, we utilized the 'medium' samples. Furthermore, for each row, the 'context' column contains 3 samples of varying quality. The approach used to assess quality and select samples is described below.

The pre-processing operations used to produce the final training dataset were as follows:

1. Dataset is filtered based on 'medium' value in 'strategy' column (sequence length = 85).
2. For IKITracs, labels are assigned based on the presence of certain substrings ('_unc' or '_c') in the 'parameter' values, which correspond to assessments of 'unconditional' or 'conditional' commitments by human annotators.
3. For ClimateWatch, the 'QuestionText' field is searched for the terms 'unconditional' or 'conditional', and labels are assigned accordingly.
4. If 'context_translated' is available and the 'language' is not English, 'context' is replaced with 'context_translated'. This results in the model being trained on English translations of original text samples.
5. The dataset is "exploded" - i.e., the text samples in the 'context' column, which are lists, are converted into separate rows - and labels are merged to align with the associated samples.
6. The 'match_onanswer' and 'answerWordcount' values are used conditionally to select high-quality samples (a high percentage of word matches in 'match_onanswer' is preferred, but a lower percentage is accepted if 'answerWordcount' is high).
7. Data is then augmented using sentence shuffle from the ```albumentations``` library (NLP methods insertion and substitution were also tried, but lowered the performance of the model and were therefore not included in the final training data). This is done to increase the number of training samples available for the Unconditional class from 774 to 1163. The end result is an equal sample per class breakdown of:

> - UNCONDITIONAL: 1163
> - CONDITIONAL: 1163

## Training procedure

The model hyperparameters were tuned using ```optuna``` over 10 trials on a truncated training and validation dataset. The model was then trained over 5 epochs using the best hyperparameters identified.
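As a rough illustration of the labelling and augmentation steps listed under "Training and evaluation data", the sketch below uses only the Python standard library. All function names are hypothetical, the exact substring-matching rule is an assumption (the card only states that '_unc' or '_c' is present in 'parameter'), and the plain `random`-based shuffle is a stand-in for the library-based sentence shuffle mentioned above, not a reproduction of it.

```python
import random
import re
from typing import List, Optional


def label_from_parameter(parameter: str) -> Optional[str]:
    """IKITracs-style rule (assumed): look for '_unc' or '_c' in 'parameter'."""
    if "_unc" in parameter:
        return "UNCONDITIONAL"
    if "_c" in parameter:
        return "CONDITIONAL"
    return None  # no label can be derived


def label_from_question(question_text: str) -> Optional[str]:
    """ClimateWatch-style rule: search 'QuestionText' for the two terms."""
    text = question_text.lower()
    # 'unconditional' contains 'conditional', so it must be tested first.
    if "unconditional" in text:
        return "UNCONDITIONAL"
    if "conditional" in text:
        return "CONDITIONAL"
    return None


def sentence_shuffle(text: str, rng: random.Random) -> str:
    """Split on sentence-ending punctuation and reorder the sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    rng.shuffle(sentences)
    return " ".join(sentences)


def oversample(minority: List[str], target_size: int, seed: int = 42) -> List[str]:
    """Add shuffled copies of minority-class samples until target_size is reached."""
    rng = random.Random(seed)
    augmented = list(minority)
    while len(augmented) < target_size:
        augmented.append(sentence_shuffle(rng.choice(minority), rng))
    return augmented


# Toy usage with made-up passages (the real run grew 774 samples to 1163).
toy_minority = ["Promote green industry. Promote carbon financing.",
                "Enforcement. Improved technology use. Ban fixed chimney kilns."]
balanced = oversample(toy_minority, target_size=5)
print(len(balanced))
```

Because a shuffled copy can occasionally reproduce the original sentence order, a production pipeline would likely want to deduplicate the augmented samples.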

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4.112924307850544e-05
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 400.0
- num_epochs: 5

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision Macro | Precision Weighted | Recall Macro | Recall Weighted | F1-score | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------------:|:------------------:|:------------:|:---------------:|:--------:|:--------:|
| 0.6658        | 1.0   | 698  | 0.7196          | 0.7391          | 0.7381             | 0.7102       | 0.7124          | 0.7028   | 0.7124   |
| 0.6301        | 2.0   | 1396 | 0.4965          | 0.8073          | 0.8075             | 0.8071       | 0.8069          | 0.8069   | 0.8069   |
| 0.5252        | 3.0   | 2094 | 0.5307          | 0.8300          | 0.8297             | 0.8279       | 0.8283          | 0.8279   | 0.8283   |
| 0.3513        | 4.0   | 2792 | 0.5261          | 0.8626          | 0.8627             | 0.8626       | 0.8627          | 0.8626   | 0.8627   |
| 0.2979        | 5.0   | 3490 | 0.5371          | 0.8714          | 0.8713             | 0.8711       | 0.8712          | 0.8712   | 0.8712   |

### Framework versions

- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3