DistilBERT for Goodreads Genre Classification

Model Description

This model is a fine-tuned version of distilbert-base-cased designed to classify book reviews into specific genres. It was developed as part of an MLOps pipeline demonstrating end-to-end model fine-tuning, evaluation, and deployment using Hugging Face and Weights & Biases.

Intended Use

This model takes a text string (a book review) and predicts which of the 8 predefined genres it belongs to. It is intended for educational purposes and text classification pipeline demonstrations.

Training Data

The model was fine-tuned on a sampled subset of the UCSD Goodreads Reviews Dataset. The data consists of user-generated book reviews mapped to the following 8 genres:

Poetry
Children
Comics & Graphic Novels
Fantasy & Paranormal
History & Biography
Mystery, Thriller & Crime
Romance
Young Adult
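
As an illustration of how the eight genres above might be encoded for a sequence classification head, here is a minimal sketch. The label strings and their index order are assumptions; only `fantasy_paranormal` appears elsewhere in this card, and the authoritative mapping is whatever is stored in the model's config.

```python
# Hypothetical label mapping for the 8 genres.
# Label names and index order are assumptions, not read from the model's config.
GENRES = [
    "poetry",
    "children",
    "comics_graphic",
    "fantasy_paranormal",
    "history_biography",
    "mystery_thriller_crime",
    "romance",
    "young_adult",
]

# Integer id -> label string, and the inverse lookup
id2label = dict(enumerate(GENRES))
label2id = {label: idx for idx, label in id2label.items()}

print(len(id2label))                   # 8
print(label2id["fantasy_paranormal"])  # 3
```

Mappings of this shape can be passed as `id2label` and `label2id` (together with `num_labels=8`) to `AutoModelForSequenceClassification.from_pretrained` so that predictions carry readable label names.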

Training Procedure

The model was trained using the Hugging Face Trainer API with the following configuration:

Epochs: 3
Batch Size: 16 (per device)
Max Sequence Length: 512 tokens
Hardware: Dual NVIDIA T4 GPUs (Kaggle)
Experiment Tracking: Weights & Biases (W&B)
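
The configuration above could be expressed roughly as follows. This is a sketch under stated assumptions, not the original training script: `output_dir` is a hypothetical path, `build_training_args` is an illustrative helper, and every setting not listed in this card is left at the library default.

```python
# Settings taken from the list above, keyed by TrainingArguments parameter names.
TRAINING_KWARGS = {
    "output_dir": "distilbert-goodreads-genres",  # hypothetical path
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "report_to": "wandb",  # log metrics to Weights & Biases
}
MAX_LENGTH = 512  # applied at tokenization time, not by the Trainer
NUM_GPUS = 2      # dual T4 setup


def effective_batch_size(per_device: int, num_devices: int) -> int:
    """Number of examples processed per optimizer step across all devices."""
    return per_device * num_devices


def build_training_args():
    """Build TrainingArguments; imported lazily so the sketch loads without transformers."""
    from transformers import TrainingArguments
    return TrainingArguments(**TRAINING_KWARGS)


print(effective_batch_size(TRAINING_KWARGS["per_device_train_batch_size"], NUM_GPUS))  # 32
```

With a per-device batch size of 16 on two GPUs and no gradient accumulation, each optimizer step sees 32 reviews. The 512-token limit would typically be enforced when tokenizing, for example `tokenizer(text, truncation=True, max_length=MAX_LENGTH)`.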

Evaluation Results

On the held-out test set, the model achieved approximately 58% accuracy and a weighted F1-score of approximately 58%. For reference, uniform random guessing over 8 classes would yield 12.5% accuracy in expectation. The model was not tuned for state-of-the-art performance, but the result indicates that it picks up genre-specific vocabulary and stylistic cues from review text.
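
For readers unfamiliar with the two metrics, the sketch below shows how accuracy and weighted F1 are defined, using plain Python. The toy labels are made up for illustration and are not drawn from the test set; the actual evaluation code for this model is not shown in this card.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy example with two made-up classes
y_true = ["romance", "romance", "poetry", "poetry"]
y_pred = ["romance", "poetry", "poetry", "poetry"]

print(accuracy(y_true, y_pred))               # 0.75
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7333
```

Weighted F1 accounts for class imbalance by giving frequent genres more influence on the final score, which is why it can differ from accuracy even when the two happen to be close, as they are here.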

How to Use This Model

You can use this model directly in your Python applications with the Hugging Face pipeline:

from transformers import pipeline

# Load the pipeline
classifier = pipeline("text-classification", model="zeeshan-hf/distilbert-goodreads-genres")

# Classify a sample review; truncation keeps long reviews within the 512-token limit
review = "The magic system in this book was incredible, and the dragons felt so real!"
prediction = classifier(review, truncation=True, max_length=512)

print(prediction)
# Example output (exact label and score may differ): [{'label': 'fantasy_paranormal', 'score': 0.85...}]
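
Because test accuracy is around 58%, it can be useful to look at several candidate genres rather than only the top one. In recent versions of transformers, calling the classifier with `top_k=None` returns a score for every label. The helper below is a hypothetical post-processing sketch that ranks such scores; the example scores are made up and are not real model output.

```python
def top_genres(scores, k=3, min_score=0.0):
    """Return up to k (label, score) pairs, highest score first.

    `scores` is a list of {"label": ..., "score": ...} dicts, the shape
    produced by a text-classification pipeline for a single input.
    """
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [
        (item["label"], item["score"])
        for item in ranked[:k]
        if item["score"] >= min_score
    ]


# Made-up scores shaped like pipeline output (not real predictions)
example_scores = [
    {"label": "romance", "score": 0.05},
    {"label": "fantasy_paranormal", "score": 0.85},
    {"label": "young_adult", "score": 0.10},
]

print(top_genres(example_scores, k=2))
# [('fantasy_paranormal', 0.85), ('young_adult', 0.1)]
```

In practice, `example_scores` would be replaced by the result of something like `classifier(review, top_k=None, truncation=True)`, assuming that call returns a flat list of label/score dicts for a single string input.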

Model size: 65.8M params (Safetensors, tensor type F32)