distilbert-goodreads-genres_v2
This model is a fine-tuned version of distilbert-base-cased on the UCSD Goodreads Book Graph dataset. At the final epoch (10), it achieves the following result on the evaluation set:
- Loss: 4.8567
library_name: transformers
tags:
- text-classification
- distilbert
- goodreads
datasets:
- ucsd_goodreads
metrics:
- accuracy
- f1
- loss
Model Details
- Developed by: Duggirala Vnaga Ananth (G25AIT2032)
- Institution: IIT Jodhpur | PGD AI Programme
- Model type: Transformer-based Text Classification
- Language(s): English
- Finetuned from model: distilbert-base-cased
MLOps Pipeline Links
- GitHub Repository: g25ait2032-prog/nagaananth
- Experiment Tracking (W&B): View Final Run & Artifacts
- Hugging Face Model Hub: nagaananth/distilbert-goodreads-genres_v2
Model Description
This model is a fine-tuned version of distilbert-base-cased designed to classify book reviews into seven distinct genres: Poetry, Comics & Graphic, Fantasy & Paranormal, History & Biography, Mystery/Thriller/Crime, Romance, and Young Adult.
This v2 iteration tests the model's limits through extended training (10 epochs) and Bayesian-inspired hyperparameter adjustments, exploring the trade-off between training convergence and validation generalization.
Intended Uses & Limitations
Intended Use
- Automated categorization of literary reviews.
- Baseline for genre-specific sentiment or thematic analysis.
Limitations & Observations (MLOps Critical Analysis)
- Significant Overfitting: According to the training logs, training loss fell to 0.1098 by Epoch 10, while validation loss rose from 2.1436 at Epoch 1 to 4.8567. This widening gap indicates that the model memorized the training set rather than generalizing to unseen reviews.
- Model Rewind: To deploy the most usable version, training used the load_best_model_at_end flag. The final weights represent the state at Epoch 2 (Validation Loss: 2.1895).
- Genre Bias: Inference testing reveals a bias toward the "Romance" and "Poetry" labels for ambiguous text, likely due to linguistic overlap between genres in the balanced training set (800 samples per genre).
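Given the bias on ambiguous text, one possible mitigation is to abstain when the top class probability is low. The sketch below is a hypothetical post-processing step, not part of the published model: the GENRES order is an assumption (check the model's config.id2label for the real mapping), the 0.5 threshold is arbitrary, and the logits are illustrative values rather than real model outputs.

```python
import math

# Hypothetical label order: the card lists the seven genres but not the
# model's actual id2label mapping, so verify it against the model config.
GENRES = [
    "Poetry",
    "Comics & Graphic",
    "Fantasy & Paranormal",
    "History & Biography",
    "Mystery/Thriller/Crime",
    "Romance",
    "Young Adult",
]


def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_abstain(logits, threshold=0.5):
    """Return (genre, probability), or ("uncertain", probability) when the
    top probability falls below the (hypothetical) threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = GENRES[best] if probs[best] >= threshold else "uncertain"
    return label, probs[best]


# Illustrative logits only, not real outputs of this model.
confident = [0.1, 0.2, 0.0, 0.3, 0.1, 4.0, 0.2]
ambiguous = [1.0, 0.2, 0.1, 0.3, 0.9, 1.1, 0.4]
print(predict_with_abstain(confident))
print(predict_with_abstain(ambiguous))
```

With the hosted model, per-genre scores can be obtained from the transformers text-classification pipeline (top_k=None returns a score for every label), and the same threshold can then be applied to those scores.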
Training Procedure
Training Data
- Dataset: UCSD Goodreads Book Graph.
- Size: 5,600 training samples (800 per genre, perfectly balanced).
- Validation: 1,400 samples (200 per genre).
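The card does not show how data.py builds this split. A minimal sketch of one way to draw a balanced 800/200 split per genre, assuming records arrive as (review_text, genre) pairs; balanced_split is a hypothetical helper, and seed=42 mirrors the seed listed under Training hyperparameters. The demo data is synthetic.

```python
import random
from collections import defaultdict


def balanced_split(records, n_train=800, n_val=200, seed=42):
    """Sample n_train + n_val reviews per genre without replacement.

    records: iterable of (review_text, genre) pairs.
    Returns (train, val) lists of (review_text, genre) pairs.
    """
    by_genre = defaultdict(list)
    for text, genre in records:
        by_genre[genre].append((text, genre))

    rng = random.Random(seed)
    train, val = [], []
    for genre in sorted(by_genre):  # sorted so the sampling order is deterministic
        pool = by_genre[genre]
        if len(pool) < n_train + n_val:
            raise ValueError(f"not enough reviews for genre {genre!r}")
        picked = rng.sample(pool, n_train + n_val)
        train.extend(picked[:n_train])
        val.extend(picked[n_train:])
    rng.shuffle(train)  # avoid genre-ordered batches
    rng.shuffle(val)
    return train, val


# Synthetic placeholder data: 1,200 fake reviews for each of the seven genres.
genres = [
    "Poetry", "Comics & Graphic", "Fantasy & Paranormal", "History & Biography",
    "Mystery/Thriller/Crime", "Romance", "Young Adult",
]
fake_records = [(f"review {i}", g) for g in genres for i in range(1200)]
train, val = balanced_split(fake_records)
print(len(train), len(val))
```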
Hyperparameters
- Learning Rate: 5e-05
- Batch Size: 16 (Train/Eval)
- Optimizer: AdamW (Fused)
- Epochs: 10
- Weight Decay: 0.01
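The Step column in the results tables follows directly from these settings. A quick check, assuming a single GPU and no gradient accumulation (neither is stated in the card):

```python
import math

train_samples = 5_600  # 800 reviews x 7 genres (Training Data)
batch_size = 16        # train batch size
epochs = 10

# Assumes one device and no gradient accumulation.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)
```

This prints 350 3500, matching the 350 steps per epoch and 3,500 total steps shown in the tables.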
Training Results (v2)
| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 1 | 350 | No log | 2.1436 |
| 2 | 700 | 2.4290 | 2.1895 (Best) |
| 5 | 1750 | 0.6660 | 3.4322 |
| 10 | 3500 | 0.1098 | 4.8567 |
Environmental Impact
- Hardware: NVIDIA T4 Tensor Core GPU
- Compute Provider: Google Cloud Platform (via Kaggle/Colab)
- Carbon Emitted: < 0.01 kg CO2eq (Estimated using MLCO2 Impact Tracker)
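Estimates like this generally multiply hardware power draw, runtime, and the grid's carbon intensity. A minimal sketch of that arithmetic; the card does not report the runtime or grid intensity behind its figure, so the inputs below are placeholders. Only the 70 W value, the T4's rated board power, is a real specification.

```python
def estimate_co2_kg(power_watts, hours, kg_co2_per_kwh, pue=1.0):
    """Energy in kWh multiplied by grid carbon intensity (kg CO2eq per kWh).

    pue: data-centre power usage effectiveness multiplier (1.0 ignores overhead).
    """
    energy_kwh = power_watts / 1000 * hours * pue
    return energy_kwh * kg_co2_per_kwh


# Placeholder runtime (0.1 h) and grid intensity (0.4 kg/kWh), not measured values.
print(round(estimate_co2_kg(power_watts=70, hours=0.1, kg_co2_per_kwh=0.4), 4))
```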
Technical Specifications
- Frameworks: Transformers 5.0.0, PyTorch 2.10.0+cu128, Datasets 4.8.3
- Infrastructure: Modularized Python scripts (data.py, train.py) with integrated wandb logging and huggingface_hub syncing.
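A sketch of how the listed hyperparameters could be passed to transformers.TrainingArguments in train.py. The card does not include train.py, so output_dir, the per-epoch evaluation and saving strategies, and the W&B and Hub settings are assumptions; metric_for_best_model is not stated and is left unset.

```python
# Keyword arguments for transformers.TrainingArguments. Values marked
# "assumed" are illustrative guesses about train.py, not confirmed settings.
training_kwargs = {
    "output_dir": "distilbert-goodreads-genres_v2",  # assumed
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 10,
    "weight_decay": 0.01,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "eval_strategy": "epoch",   # assumed: one validation pass per epoch, as in the tables
    "save_strategy": "epoch",   # assumed: kept equal to eval_strategy for load_best_model_at_end
    "load_best_model_at_end": True,
    "report_to": "wandb",       # assumed W&B integration
    "push_to_hub": True,        # assumed Hub syncing
    "hub_model_id": "nagaananth/distilbert-goodreads-genres_v2",
}
```

In train.py this would be used as TrainingArguments(**training_kwargs). push_to_hub=True additionally needs a logged-in Hugging Face token, and report_to="wandb" needs the wandb package installed.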
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW, torch fused implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
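With lr_scheduler_type: linear, the learning rate decays from its peak to zero over training. A minimal sketch of that schedule, following the shape of the linear warmup/decay schedule in transformers; warmup_steps=0 is an assumption because no warmup is listed, and total_steps=3500 comes from the results table.

```python
def linear_lr(step, base_lr=5e-5, total_steps=3500, warmup_steps=0):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0.

    warmup_steps=0 is an assumption; the card lists no warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Start of training, end of Epoch 2, midpoint, and end of training.
for step in (0, 700, 1750, 3500):
    print(step, linear_lr(step))
```

Under these assumptions, the retained Epoch 2 checkpoint (step 700) was saved while the learning rate was still at 80% of its 5e-05 peak.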
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 1.0 | 350 | 2.1436 |
| 2.4290 | 2.0 | 700 | 2.1895 |
| 1.3851 | 3.0 | 1050 | 2.2901 |
| 1.3851 | 4.0 | 1400 | 2.9491 |
| 0.6660 | 5.0 | 1750 | 3.4322 |
| 0.3433 | 6.0 | 2100 | 4.3519 |
| 0.3433 | 7.0 | 2450 | 4.5286 |
| 0.2059 | 8.0 | 2800 | 4.7578 |
| 0.1519 | 9.0 | 3150 | 4.8699 |
| 0.1098 | 10.0 | 3500 | 4.8567 |
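The overfitting described under Limitations shows up clearly as the gap between validation and training loss. A small sketch computing it from the table above. The Training Loss column repeats values (Epochs 3 and 4, Epochs 6 and 7), which suggests it holds the most recently logged value rather than an epoch average, so treat the gaps as approximate.

```python
# (epoch, training loss, validation loss) copied from the table above.
# Epoch 1 is omitted because no training loss had been logged yet ("No log").
history = [
    (2, 2.4290, 2.1895),
    (3, 1.3851, 2.2901),
    (4, 1.3851, 2.9491),
    (5, 0.6660, 3.4322),
    (6, 0.3433, 4.3519),
    (7, 0.3433, 4.5286),
    (8, 0.2059, 4.7578),
    (9, 0.1519, 4.8699),
    (10, 0.1098, 4.8567),
]

# Generalization gap: validation loss minus training loss at each epoch.
gaps = {epoch: val_loss - train_loss for epoch, train_loss, val_loss in history}
for epoch, gap in gaps.items():
    print(f"epoch {epoch:2d}: gap = {gap:+.4f}")
```

The gap is slightly negative at Epoch 2 and grows to roughly 4.75 by Epoch 10.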
Framework versions
- Transformers 5.0.0
- Pytorch 2.10.0+cu128
- Datasets 4.8.3
- Tokenizers 0.22.2
Citation
@article{sanh2019distilbert,
title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
author={Sanh, Victor and Debut, Lysandre and Chaumond, Adrien and Wolf, Thomas},
journal={arXiv preprint arXiv:1910.01108},
year={2019}
}