distilbert_classifier_newsgroups

This model is a fine-tuned version of distilbert-base-uncased on the 20 Newsgroups dataset. After three epochs of training it reaches a validation loss of 0.3756 and a validation accuracy of 0.8993 (see Training results).

Model description

We fine-tuned distilbert-base-uncased to classify newsgroup posts into 20 topics, using the labeled 20 Newsgroups dataset.

Training and evaluation data

The 20 Newsgroups dataset comprises around 18,000 newsgroup posts on 20 topics, split into two subsets: one for training (or development) and one for testing (or performance evaluation). The split between the train and test sets is based on whether a message was posted before or after a specific date.

These are the 20 topics we fine-tuned the model on:

'alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x', 'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball', 'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med', 'sci.space', 'soc.religion.christian', 'talk.politics.guns', 'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc'
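To turn the classifier's 20 output scores into a topic name, a mapping between class ids and the names above is needed. The following is a minimal sketch in plain Python. It assumes that class id i corresponds to the i-th topic in the list above; this card does not state the id order, so check the model's config before relying on it. The names LABELS, id2label, label2id and top_prediction are illustrative and not part of the released model.

```python
import math

# Topic names exactly as listed above.
LABELS = [
    'alt.atheism', 'comp.graphics', 'comp.os.ms-windows.misc',
    'comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware', 'comp.windows.x',
    'misc.forsale', 'rec.autos', 'rec.motorcycles', 'rec.sport.baseball',
    'rec.sport.hockey', 'sci.crypt', 'sci.electronics', 'sci.med',
    'sci.space', 'soc.religion.christian', 'talk.politics.guns',
    'talk.politics.mideast', 'talk.politics.misc', 'talk.religion.misc',
]

# ASSUMPTION: class id i maps to the i-th name in the list above.
id2label = dict(enumerate(LABELS))
label2id = {name: idx for idx, name in id2label.items()}


def top_prediction(logits):
    """Return (label, probability) for the highest-scoring class.

    `logits` is a sequence of 20 raw scores, one per topic.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    best = max(range(len(exps)), key=exps.__getitem__)
    return id2label[best], exps[best] / total
```

Under that assumption, a score vector whose largest entry sits at index 14 would be reported as 'sci.space'.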

Training hyperparameters

The following hyperparameters were used during training:

optimizer: {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 1908, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False}
training_precision: float32
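The learning_rate entry in the optimizer config above describes a polynomial decay with power 1.0, that is, a linear ramp from 2e-05 down to 0.0 over 1908 steps. The sketch below re-implements that formula in plain Python, following the documented behaviour of Keras's PolynomialDecay with cycle=False, to make the schedule concrete. The function name polynomial_decay is our own choice for illustration and is not taken from the training code.

```python
def polynomial_decay(step, initial_lr=2e-05, end_lr=0.0,
                     decay_steps=1908, power=1.0):
    """Learning rate at `step` for the PolynomialDecay config above.

    Default values are copied from the optimizer config in this card.
    """
    # With cycle=False the step is clamped, so the rate stays at
    # end_lr once decay_steps has been reached.
    step = min(step, decay_steps)
    fraction_left = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * fraction_left ** power + end_lr


# Print the rate at the start, the midpoint, and the end of the schedule.
for step in (0, 954, 1908):
    print(step, polynomial_decay(step))
```

At the halfway point (step 954) the rate is half the initial value, and any step at or beyond 1908 stays at 0.0.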

Training results

Epoch 1/3
637/637 [==============================] - 110s 131ms/step - loss: 1.3480 - accuracy: 0.6633 - val_loss: 0.6122 - val_accuracy: 0.8304
Epoch 2/3
637/637 [==============================] - 44s 70ms/step - loss: 0.4498 - accuracy: 0.8812 - val_loss: 0.4342 - val_accuracy: 0.8799
Epoch 3/3
637/637 [==============================] - 40s 64ms/step - loss: 0.2685 - accuracy: 0.9355 - val_loss: 0.3756 - val_accuracy: 0.8993
CPU times: user 3min 4s, sys: 8.76 s, total: 3min 13s
Wall time: 3min 15s

Framework versions

Transformers 4.28.0
TensorFlow 2.12.0
Datasets 2.12.0
Tokenizers 0.13.3