IKT_classifier_mitigation_best

This model is a fine-tuned version of sentence-transformers/all-mpnet-base-v2 on the GIZ/policy_qa_v0_1 dataset. It achieves the following results on the evaluation set:

Loss: 0.6517
Precision Micro: 0.3667
Precision Weighted: 0.4273
Precision Samples: 0.4539
Recall Micro: 0.7543
Recall Weighted: 0.7543
Recall Samples: 0.7982
F1-score: 0.5422
Accuracy: 0.1654
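
The gap between high recall and low accuracy is easier to read with a small worked example. The sketch below assumes that "Accuracy" is exact-match (subset) accuracy, which this card does not state explicitly; the label matrices are toy values, not model output.

```python
# Toy multi-label data: 3 passages, 4 classes (hypothetical values).
y_true = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
y_pred = [
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]


def micro_precision_recall(y_true, y_pred):
    """Pool true/false positives over all passages and classes."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def exact_match_accuracy(y_true, y_pred):
    """Share of passages whose full label vector is predicted correctly."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


precision, recall = micro_precision_recall(y_true, y_pred)
accuracy = exact_match_accuracy(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

Every true label is found in this toy case, yet a single extra predicted label is enough to make a passage count as wrong under exact-match scoring, which is one plausible reason accuracy sits far below recall.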

Model description

The model is a multi-label text classifier based on sentence-transformers/all-mpnet-base-v2 and fine-tuned on text sourced from national climate policy documents.

Intended uses & limitations

The classifier assigns the following classes to denote the Mitigation categories portrayed in passages extracted from the documents. The Mitigation categories are based on a taxonomy defined by the TraCS Climate Strategies for Transport (implemented by GIZ and funded by the International Climate Initiative (IKI) of the German Federal Ministry for Economic Affairs and Climate Action (BMWK)):

index	Category
0	Active mobility
1	Alternative fuels
2	Aviation improvements
3	Comprehensive transport planning
4	Digital solutions
5	Economic instruments
6	Education and behavioral change
7	Electric mobility
8	Freight efficiency improvements
9	Improve infrastructure
10	Labels
11	Land use
12	Public transport improvement
13	Shipping improvements
14	Transport demand management
15	Vehicle improvements

The intended use is for climate policy researchers and analysts seeking to automate the process of reviewing lengthy, non-standardized PDF documents to produce summaries and reports.
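
As a rough illustration of how per-class scores could be turned into the categories listed above, the sketch below applies a sigmoid to 16 raw scores and keeps every class above a threshold. The `decode` helper, the 0.5 threshold, and the example scores are assumptions for illustration; they are not taken from this model's inference code.

```python
import math

# Category names in index order, copied from the table above.
CATEGORIES = [
    "Active mobility", "Alternative fuels", "Aviation improvements",
    "Comprehensive transport planning", "Digital solutions",
    "Economic instruments", "Education and behavioral change",
    "Electric mobility", "Freight efficiency improvements",
    "Improve infrastructure", "Labels", "Land use",
    "Public transport improvement", "Shipping improvements",
    "Transport demand management", "Vehicle improvements",
]


def decode(logits, threshold=0.5):
    """Return (category, probability) pairs for classes above the threshold.

    `threshold=0.5` is an assumed default, not a documented setting.
    """
    if len(logits) != len(CATEGORIES):
        raise ValueError("expected one score per category")
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [
        (name, round(p, 3))
        for name, p in zip(CATEGORIES, probs)
        if p >= threshold
    ]


# Hypothetical raw scores for one passage: two classes stand out.
example_logits = [-3.0] * 16
example_logits[7] = 2.0
example_logits[12] = 0.4

predicted = decode(example_logits)
print(predicted)
```

Because the classifier is multi-label, a passage can receive several categories or none. Given the low precision reported above, raising the threshold may be worth testing when false positives are costly.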

Due to inconsistencies in the training data, the classifier performance leaves room for improvement. On the validation set the classifier reaches a moderate multi-label F1 (~ 0.5): precision is low (~ 0.4), meaning many predicted labels are false positives, while recall is high (~ 0.75), meaning the classifier casts a wide net and captures most true positives. When tested on real-world unseen data, performance was similar to validation (F1 ~ 0.5). However, testing was based on a small out-of-sample dataset containing its own inconsistencies, so classification may prove better or worse in practice.

Training and evaluation data

The training dataset comprises labelled passages from two sources:

ClimateWatch NDC Sector data. Here we utilized the QA dataset (CW_NDC_data_Sector).
IKI TraCS Climate Strategies for Transport Tracker implemented by GIZ and funded by the International Climate Initiative (IKI) of the German Federal Ministry for Economic Affairs and Climate Action (BMWK).

The combined dataset GIZ/policy_qa_v0_1 contains ~85k rows. Each passage appears three times at varying sequence lengths (denoted in the 'strategy' column by the values 'small', 'medium', and 'large', which correspond to sequence lengths of 60, 85, and 150 respectively). The useful size of the dataset is therefore roughly 1/3 of the row count, and the 'strategy' value should be selected based on the use case. For this training, we utilized the 'medium' samples from the IKITracs data only. Furthermore, for each row, the 'context' column contains 3 samples of varying quality. The approach used to assess quality and select samples is described below.

The pre-processing operations used to produce the final training dataset were as follows:

Dataset is filtered based on 'medium' value in 'strategy' column (sequence length = 85), selecting only IKITracs samples.
For ClimateWatch, all rows are removed because they were assessed to have no taxonomical alignment with the IKITracs labels inherent to the dataset.
For IKITracs, labels are assigned based on the presence of 'parameter' values matching the category mapping taxonomy defined by TraCS (ref. below).
If 'context_translated' is available and the 'language' is not English, 'context' is replaced with 'context_translated'. This results in the model being trained on English translations of original text samples.
The dataset is "exploded" - i.e., the text samples in the 'context' column, which are lists, are converted into separate rows - and labels are merged to align with the associated samples.
The 'match_onanswer' and 'answerWordcount' columns are used conditionally to select high-quality samples: a high percentage of word matches in 'match_onanswer' is preferred, but a lower percentage is accepted if 'answerWordcount' is high.
Data is then augmented using sentence shuffle from the albumentations library and insertions from nlpaug. This is done to increase the number of training samples available for under-represented classes. Given the large number of classes for this classifier, it is unsurprising that some categories have very low representation. In this case, classes with fewer than 1/3 of the instances of the most represented classes are categorized as under-represented, and each instance is augmented to effectively double the number of instances for these classes.
To address the remaining class imbalances, the ratio of negative instances to positive instances for each class is computed to produce a weights array. This array is passed to a custom multi-label trainer function which is used during hyperparameter tuning and final model training.
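
The steps above can be sketched on toy data as follows. This is a minimal illustration, not the original pipeline: the 'source' and 'labels' column names, the list-valued 'context_translated', and all row contents are assumptions, and the quality selection and augmentation steps are left out.

```python
from collections import Counter

# Hypothetical rows; 'labels' holds class indices already assigned.
rows = [
    {"strategy": "medium", "source": "IKITracs", "language": "en",
     "context": ["Expand cycle lanes.", "Build bike parking."],
     "context_translated": None, "labels": [0]},
    {"strategy": "medium", "source": "IKITracs", "language": "fr",
     "context": ["Subventions pour bus electriques."],
     "context_translated": ["Subsidies for electric buses."],
     "labels": [7, 12]},
    {"strategy": "medium", "source": "IKITracs", "language": "en",
     "context": ["Pedestrian zones.", "Safe walking routes."],
     "context_translated": None, "labels": [0]},
    {"strategy": "small", "source": "IKITracs", "language": "en",
     "context": ["Short variant."], "context_translated": None,
     "labels": [0]},
    {"strategy": "medium", "source": "ClimateWatch", "language": "en",
     "context": ["Energy sector target."], "context_translated": None,
     "labels": []},
]


def preprocess(rows):
    """Filter, swap in translations, and explode 'context' into samples."""
    samples = []
    for row in rows:
        # Keep only 'medium' IKITracs rows.
        if row["strategy"] != "medium" or row["source"] != "IKITracs":
            continue
        texts = row["context"]
        # Prefer the English translation for non-English rows.
        if row["language"] != "en" and row["context_translated"]:
            texts = row["context_translated"]
        # Explode: one sample per text, each carrying the row's labels.
        for text in texts:
            samples.append({"text": text, "labels": list(row["labels"])})
    return samples


def class_counts(samples):
    """Number of positive samples per class index."""
    return dict(Counter(i for s in samples for i in s["labels"]))


def under_represented(counts):
    """Classes with fewer than 1/3 of the largest class count."""
    limit = max(counts.values()) / 3
    return sorted(c for c, n in counts.items() if n < limit)


def positive_weights(samples, counts):
    """Ratio of negative to positive samples for each class."""
    total = len(samples)
    return {c: (total - n) / n for c, n in counts.items()}


samples = preprocess(rows)
counts = class_counts(samples)
rare = under_represented(counts)
weights = positive_weights(samples, counts)
print(counts, rare, weights)
```

A weights array of this kind is commonly used as the positive-class weight of a binary cross-entropy loss, so that rare classes contribute more to the gradient; whether the custom trainer mentioned above does exactly this is an assumption.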

Parameter to category mapping taxonomy

index	Category	Parameter
0	Active mobility	S_Activemobility , S_Cycling , S_Walking
1	Alternative fuels	I_Altfuels , I_Biofuel , I_Ethanol , I_Hydrogen , I_LPGCNGLNG , I_RE
2	Aviation improvements	I_Aircraftfleet , I_Airtraffic , I_Aviation , I_Capacityairport , I_CO2certificate , I_Jetfuel
3	Comprehensive transport planning	A_Complan , A_LATM , A_Natmobplan , A_SUMP
4	Digital solutions	I_Autonomous , I_DataModelling , I_ITS , I_Other , S_Maas , S_Ondemand , S_Sharedmob
5	Economic instruments	A_Economic , A_Emistrad , A_Finance , A_Fossilfuelsubs , A_Fueltax , A_Procurement , A_Roadcharging , A_Vehicletax
6	Education and behavioral change	I_Campaigns , I_Capacity , I_Ecodriving , I_Education
7	Electric mobility	I_Emobility , I_Emobilitycharging , I_Emobilitypurchase , I_ICEdiesel , I_Smartcharging , S_Micromobility
8	Freight efficiency improvements	I_Freighteff , I_Load , S_Railfreight
9	Improve infrastructure	S_Infraexpansion , S_Infraimprove , S_Intermodality
10	Labels	I_Efficiencylabel , I_Freightlabel , I_Fuellabel , I_Transportlabel , I_Vehiclelabel
11	Land use	A_Density , A_Landuse , A_Mixuse
12	Public transport improvement	S_BRT , S_PTIntegration , S_PTPriority , S_PublicTransport
13	Shipping improvements	I_Onshorepower , I_PortInfra , I_Shipefficiency , I_Shipping
14	Transport demand management	A_Caraccess , A_Commute , A_Parkingprice , A_TDM , A_Teleworking , A_Work , S_Parking
15	Vehicle improvements	A_LEZ , I_Efficiencystd , I_Fuelqualimprove , I_Inspection , I_Lowemissionincentive , I_Vehicleeff , I_Vehicleimprove , I_VehicleRestrictions , I_Vehiclescrappage
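
To show how the mapping could drive label assignment, the sketch below converts a list of 'parameter' values into a 16-element multi-hot vector. Only three of the 16 categories are copied from the table to keep the example short; the `multi_hot` helper and the `X_Unknown` value are hypothetical.

```python
NUM_CLASSES = 16

# Excerpt of the mapping table above: category index -> parameters.
MAPPING_EXCERPT = {
    0: ["S_Activemobility", "S_Cycling", "S_Walking"],
    7: ["I_Emobility", "I_Emobilitycharging", "I_Emobilitypurchase",
        "I_ICEdiesel", "I_Smartcharging", "S_Micromobility"],
    11: ["A_Density", "A_Landuse", "A_Mixuse"],
}

# Invert the table so each parameter points at its category index.
PARAMETER_TO_INDEX = {
    parameter: index
    for index, parameters in MAPPING_EXCERPT.items()
    for parameter in parameters
}


def multi_hot(parameters, num_classes=NUM_CLASSES):
    """Build a multi-hot label vector; unmapped parameters are ignored."""
    vector = [0] * num_classes
    for parameter in parameters:
        index = PARAMETER_TO_INDEX.get(parameter)
        if index is not None:
            vector[index] = 1
    return vector


labels = multi_hot(["S_Cycling", "I_Smartcharging", "X_Unknown"])
print(labels)
```

Several parameters can map to the same category, so a passage tagged with both S_Cycling and S_Walking would still set only the "Active mobility" position.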

Training procedure

The model hyperparameters were tuned using optuna over 10 trials on a truncated training and validation dataset. The model was then trained over 5 epochs using the best hyperparameters identified.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3.6181464293180716e-05
train_batch_size: 3
eval_batch_size: 3
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 300.0
num_epochs: 5
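
For intuition about the learning rate over training, the sketch below reproduces a linear schedule with warmup in plain Python. It assumes the standard behaviour of a linear scheduler (ramp from 0 to the base rate over the warmup steps, then linear decay to 0) and a total of 1990 optimizer steps, the final step count reported in the training results; the `linear_schedule_lr` helper is hypothetical.

```python
BASE_LR = 3.6181464293180716e-05
WARMUP_STEPS = 300
TOTAL_STEPS = 1990  # 5 epochs x 398 steps per epoch


def linear_schedule_lr(step):
    """Learning rate at a given optimizer step under linear warmup/decay."""
    if step < WARMUP_STEPS:
        # Warmup: scale linearly from 0 up to the base learning rate.
        factor = step / WARMUP_STEPS
    else:
        # Decay: scale linearly from the base learning rate down to 0.
        factor = max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))
    return BASE_LR * factor


for step in (0, 150, 300, 1145, 1990):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the peak rate is reached at step 300, partway through the first epoch, and the rate has fallen to half of its peak by step 1145.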

Training results

Training Loss	Epoch	Step	Validation Loss	Precision Micro	Precision Weighted	Precision Samples	Recall Micro	Recall Weighted	Recall Samples	F1-score	Accuracy
No log	1.0	398	1.0635	0.1718	0.2238	0.1763	0.7714	0.7714	0.7945	0.2794	0.0
1.2442	2.0	796	0.8827	0.2167	0.2522	0.2388	0.7543	0.7543	0.7863	0.3518	0.0
0.9539	3.0	1194	0.7579	0.2710	0.3279	0.2979	0.7543	0.7543	0.7932	0.4134	0.0150
0.8265	4.0	1592	0.6773	0.3377	0.3943	0.3937	0.7429	0.7429	0.7901	0.4961	0.0752
0.8265	5.0	1990	0.6517	0.3667	0.4273	0.4539	0.7543	0.7543	0.7982	0.5422	0.1654

Framework versions

Transformers 4.31.0
Pytorch 2.0.1+cu118
Datasets 2.13.1
Tokenizers 0.13.3
