
Classification of patent titles: "green" or "non-green"

This model classifies patents as "green" or "non-green" based on their titles.
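As a minimal illustration, the sketch below scores titles with the transformers text-classification pipeline. The printed label names come from the checkpoint's id2label mapping and may differ from the "green"/"non-green" wording used in this card.

```python
from transformers import pipeline

# Minimal sketch: score patent titles with this checkpoint.
# Assumption: the label names printed below depend on the model's
# id2label mapping and may differ from the prose labels in this card.
classifier = pipeline(
    "text-classification",
    model="cwinkler/distilbert-base-uncased-finetuned-greenpatent",
)

for title in [
    "A method for recycling waste",
    "A device to destroy the nature",
]:
    result = classifier(title)[0]
    print(f"{title!r} -> {result['label']} ({result['score']:.3f})")
```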

Example titles classified as "green":

  • "A method for recycling waste" - score: 0.714
  • "A method of reducing pollution" - score: 0.786
  • "An apparatus to improve environmental aspects" - score: 0.570
  • "A method to improve waste management" - score: 0.813
  • "A device to use renewable energy sources" - score: 0.98
  • "A technology for efficient electrical power generation"- score: 0.975
  • "A method for the production of fuel of non-fossil origin" - score: 0.975
  • "Biofuels from waste" - score: 0.88
  • "A combustion technology with mitigation potential" - score: 0.947
  • "A device to capture greenhouse gases" - score: 0.871
  • "A method to reduce the greenhouse effect" - score: 0.887
  • "A device to improve the climate" - score: 0.650
  • "A device to stop climate change" - score: 0.55

Example titles classified as "non-green":

  • "A device to destroy the nature" - score: 0.19
  • "A method to produce smoke" - score: 0.386

Examples of the model's limitations:

  • "A method to avoid trash" - score: 0.165
  • "A method to reduce trash" - score: 0.333
  • "A method to burn the Amazonas" - score: 0.501
  • "A method to burn wood" - score: 0.408
  • "Green plastics" - score: 0.126
  • "Greta Thunberg" - score: 0.313 (How dare you, model?); BUT: "A method of using Greta Thunberg to stop climate change" - score: 0.715

The examples above were inspired by https://www.epo.org/news-events/in-focus/classification/classification.html

distilbert-base-uncased-finetuned-greenpatent

This model is a fine-tuned version of distilbert-base-uncased on the green patent dataset. The dataset was split into 70% training data and 30% test data using datasets' train_test_split(test_size=0.3); see the sketch after the metrics below. The model achieves the following results on the evaluation set:

  • Loss: 0.3148
  • Accuracy: 0.8776
  • F1: 0.8770
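A minimal sketch of the split described above, assuming the dataset is loaded from the Hugging Face Hub. The dataset identifier used here is a placeholder; substitute the actual dataset linked from this model card.

```python
from datasets import load_dataset

# Assumption: "cwinkler/green_patents" is a placeholder identifier;
# substitute the actual dataset linked from this model card.
dataset = load_dataset("cwinkler/green_patents", split="train")

# 70/30 train/test split, as described in this card.
splits = dataset.train_test_split(test_size=0.3)
train_ds, test_ds = splits["train"], splits["test"]
print(len(train_ds), len(test_ds))
```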

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 2e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
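A hedged sketch of how these values map onto transformers' TrainingArguments. The output_dir and evaluation_strategy="epoch" are assumptions (the per-epoch validation rows below suggest per-epoch evaluation); the Adam betas and epsilon listed above match the library defaults, so they need not be set explicitly.

```python
from transformers import TrainingArguments

# Sketch mapping the hyperparameters above onto TrainingArguments.
# output_dir and evaluation_strategy="epoch" are assumptions; the Adam
# betas/epsilon listed in this card are the library defaults.
training_args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-greenpatent",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=2,
    evaluation_strategy="epoch",  # assumption, inferred from per-epoch results
)
```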

Training results

Training Loss | Epoch | Step | Validation Loss | Accuracy | F1
0.4342        | 1.0   | 101  | 0.3256          | 0.8721   | 0.8712
0.3229        | 2.0   | 202  | 0.3148          | 0.8776   | 0.8770
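For reference, a minimal sketch of a metrics function that would produce accuracy and F1 figures like those above. It uses the evaluate library, which is not listed under framework versions, and the weighted F1 average is an assumption; both are illustrative rather than confirmed by this card.

```python
import evaluate
import numpy as np

# Assumptions: the evaluate library is available, and F1 is averaged
# with "weighted"; neither is confirmed by this card.
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(
            predictions=predictions, references=labels
        )["accuracy"],
        "f1": f1.compute(
            predictions=predictions, references=labels, average="weighted"
        )["f1"],
    }
```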

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cpu
  • Datasets 2.8.0
  • Tokenizers 0.13.2
