DistilBERT for Goodreads Genre Classification

Model Description

This model is a fine-tuned version of distilbert-base-cased designed to classify book reviews into specific genres. It was developed as part of an MLOps pipeline demonstrating end-to-end model fine-tuning, evaluation, and deployment using Hugging Face and Weights & Biases.

Intended Use

This model takes a text string (a book review) and predicts which of the 8 predefined genres the reviewed book belongs to. It is intended for educational purposes and for demonstrating text classification pipelines.

Training Data

The model was fine-tuned on a sampled subset of the UCSD Goodreads Reviews Dataset. The data consists of user-generated book reviews mapped to the following 8 genres:

  • Poetry
  • Children
  • Comics & Graphic Novels
  • Fantasy & Paranormal
  • History & Biography
  • Mystery, Thriller & Crime
  • Romance
  • Young Adult
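
A classifier head works with integer class ids, so the eight genres above need a label mapping. The sketch below shows one way to build it. The snake_case label strings and their order are assumptions (only `fantasy_paranormal` appears in this card's example output), so the model's own config should be treated as the source of truth for the real `id2label` values.

```python
# Hypothetical label strings and order; the published model's config
# holds the actual mapping.
GENRES = [
    "poetry",
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "romance",
    "young_adult",
]

# Integer id -> label string, and the inverse lookup.
id2label = dict(enumerate(GENRES))
label2id = {label: idx for idx, label in id2label.items()}

print(len(id2label), label2id["fantasy_paranormal"])
```

Passing both dictionaries to the model config at fine-tuning time is what makes the pipeline return readable label names instead of `LABEL_0`, `LABEL_1`, and so on.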

Training Procedure

The model was trained using the Hugging Face Trainer API with the following configuration:

  • Epochs: 3
  • Batch Size: 16 (per device)
  • Max Sequence Length: 512 tokens
  • Hardware: Dual NVIDIA T4 GPUs (Kaggle)
  • Experiment Tracking: Weights & Biases (W&B)
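
The configuration above could be expressed roughly as follows. This is a minimal sketch, not the original training script: `output_dir` is a hypothetical path, the helper names are illustrative, and the effective batch size assumes both GPUs are used in data-parallel mode with no gradient accumulation, which the card does not state.

```python
# Values taken from the configuration listed above.
EPOCHS = 3
PER_DEVICE_BATCH_SIZE = 16
MAX_SEQ_LENGTH = 512
NUM_GPUS = 2  # dual T4


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Examples consumed per optimizer step under data parallelism."""
    return per_device * num_devices * grad_accum_steps


def build_training_args(output_dir="distilbert-goodreads-genres"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    # Imported lazily so the rest of this sketch runs without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=EPOCHS,
        per_device_train_batch_size=PER_DEVICE_BATCH_SIZE,
        report_to="wandb",  # send metrics to Weights & Biases
    )


def tokenize_batch(tokenizer, texts):
    """Truncate reviews to the maximum sequence length used in training."""
    return tokenizer(texts, truncation=True, max_length=MAX_SEQ_LENGTH)


print(effective_batch_size(PER_DEVICE_BATCH_SIZE, NUM_GPUS))
```

Under those assumptions each optimizer step sees 32 reviews. Reviews longer than 512 tokens are cut off, so very long reviews contribute only their opening portion.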

Evaluation Results

On the held-out test set, the model achieved approximately 58% accuracy and a weighted F1-score of approximately 58%. It is not tuned for state-of-the-art performance, but the result is well above the 12.5% expected from uniform random guessing over 8 classes, which suggests the model has learned to distinguish genre-specific vocabulary and context.
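
For readers unfamiliar with the two metrics, the sketch below computes accuracy and support-weighted F1 in plain Python. The labels are a toy example made up for illustration, not the model's predictions, and the original evaluation may well have used a library such as scikit-learn instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = 0.0
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true examples of this class
        total += f1 * support
    return total / len(y_true)


# Toy labels for three classes (not real model output).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(round(accuracy(y_true, y_pred), 4))
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support means frequent genres influence the score more than rare ones, which is why weighted F1 and accuracy can land close together, as they do in the reported results.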

How to Use This Model

You can use this model directly in your Python applications with the Hugging Face pipeline:

from transformers import pipeline

# Load the pipeline
classifier = pipeline("text-classification", model="zeeshan-hf/distilbert-goodreads-genres")

# Test it with a review
review = "The magic system in this book was incredible, and the dragons felt so real!"
prediction = classifier(review)

print(prediction)
# Example output (the exact score will vary): [{'label': 'fantasy_paranormal', 'score': 0.85...}]
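
With accuracy around 58%, the runner-up genres are often worth inspecting too. The sketch below asks the pipeline for scores across all labels and keeps the top few. The helper names `top_genres` and `classify_all` are hypothetical, the example scores are hand-written rather than real model output, and the shape of the returned list may differ between transformers versions, so the code handles both a flat and a nested result.

```python
def top_genres(scores, k=3):
    """Return the k highest-scoring {'label', 'score'} dicts."""
    return sorted(scores, key=lambda s: s["score"], reverse=True)[:k]


def classify_all(review, model_id="zeeshan-hf/distilbert-goodreads-genres"):
    """Score a review against every genre (downloads the model on first call)."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # top_k=None requests scores for all labels; truncation guards long reviews.
    result = classifier(review, top_k=None, truncation=True)
    # Some versions wrap a single-input result in an extra list.
    if result and isinstance(result[0], list):
        result = result[0]
    return result


# Hand-written scores for illustration only.
example = [
    {"label": "romance", "score": 0.10},
    {"label": "fantasy_paranormal", "score": 0.70},
    {"label": "young_adult", "score": 0.20},
]
print(top_genres(example, k=2))
```

In practice, `top_genres(classify_all(review))` would give the three most likely genres for a review.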