
Classification of patent titles: "green" or "not green"

This model classifies patents as "green" or "not green" based on their titles.

Examples of "green patents" titles:

  • "A method for recycling waste" - score: 0.714
  • "A method of reducing pollution" - score: 0.786
  • "An apparatus to improve environmental aspects" - score: 0.570
  • "A method to improve waste management" - score: 0.813
  • "A device to use renewable energy sources" - score: 0.98
  • "A technology for efficient electrical power generation"- score: 0.975
  • "A method for the production of fuel of non-fossil origin" - score: 0.975
  • "Biofuels from waste" - score: 0.88
  • "A combustion technology with mitigation potential" - score: 0.947
  • "A device to capture greenhouse gases" - score: 0.871
  • "A method to reduce the greenhouse effect" - score: 0.887
  • "A device to improve the climate" - score: 0.650
  • "A device to stop climate change" - score: 0.55

Examples of "no green patents" titles:

  • "A device to destroy the nature" - score: 0.19
  • "A method to produce smoke" - score: 0.386

Examples of the model's limitations:

  • "A method to avoid trash" - score: 0.165
  • "A method to reduce trash" - score: 0.333
  • "A method to burn the Amazonas" - score: 0.501
  • "A method to burn wood" - score: 0.408
  • "Green plastics" - score: 0.126
  • "Greta Thunberg" - score: 0.313 (How dare you, model?); BUT: "A method of using Greta Thunberg to stop climate change" - score: 0.715

The examples were inspired by https://www.epo.org/news-events/in-focus/classification/classification.html
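Scores like the ones above can be obtained with the Transformers pipeline API. A minimal sketch (the exact label names returned depend on the model's config and are an assumption here):

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline(
    "text-classification",
    model="cwinkler/distilbert-base-uncased-finetuned-greenpatent",
)

# The returned score is the probability of the predicted label; the label
# names themselves (e.g. "LABEL_0"/"LABEL_1") depend on the model's config
print(classifier("A method for recycling waste"))
```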

distilbert-base-uncased-finetuned-greenpatent

This model is a fine-tuned version of distilbert-base-uncased on the green patent dataset. The dataset was split into 70% training data and 30% test data using `train_test_split(test_size=0.3)` (see the sketch after the results below). The model achieves the following results on the evaluation set:

  • Loss: 0.3148
  • Accuracy: 0.8776
  • F1: 0.8770
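A minimal sketch of the split described above, assuming the dataset is loaded via the datasets library (the dataset ID cwinkler/green_patents is an assumption; the card only says "the green patent dataset"):

```python
from datasets import load_dataset

# Dataset ID is an assumption
dataset = load_dataset("cwinkler/green_patents")

# 70/30 train/test split as described above (seed chosen for reproducibility)
splits = dataset["train"].train_test_split(test_size=0.3, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```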

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
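These settings map directly onto TrainingArguments. A minimal sketch building on the split above (the text column name, output directory, and per-epoch evaluation are assumptions; the listed optimizer settings and linear schedule match the Trainer defaults):

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Tokenize the splits from the sketch above ("text" column name is an assumption)
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

train_tokenized = train_ds.map(tokenize, batched=True)
test_tokenized = test_ds.map(tokenize, batched=True)

# Hyperparameters as listed above; Adam with the stated betas/epsilon and the
# linear schedule are Trainer defaults, so they need no explicit arguments
args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-greenpatent",  # assumption
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=2,
    seed=42,
    lr_scheduler_type="linear",
    evaluation_strategy="epoch",  # assumption: metrics are reported per epoch
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_tokenized,
    eval_dataset=test_tokenized,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # defined in the sketch after the results table
)
trainer.train()
```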

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.4342        | 1.0   | 101  | 0.3256          | 0.8721   | 0.8712 |
| 0.3229        | 2.0   | 202  | 0.3148          | 0.8776   | 0.8770 |
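The Accuracy and F1 values can be produced by a compute_metrics callback passed to the Trainer. A minimal sketch using the evaluate library (the card does not show the actual implementation, so this is an assumption):

```python
import numpy as np
import evaluate  # assumption: the evaluate library was used for metrics

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    # Convert logits to class predictions, then score against the references
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(predictions=predictions, references=labels)["accuracy"],
        "f1": f1.compute(predictions=predictions, references=labels)["f1"],
    }
```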

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.1+cpu
  • Datasets 2.8.0
  • Tokenizers 0.13.2
