Fitness YouTube Comment Classifier

A fine-tuned distilbert-base-uncased model that classifies YouTube comments on fitness influencer videos into five categories: fitness, nutrition, motivational, challenge, and product.


Quick Start

from transformers import pipeline

classifier = pipeline(
    'text-classification',
    model='Krat6s/fitness-comment-classifier'
)

classifier("This protein shake changed my life, amazing with oat milk")
# [{'label': 'nutrition', 'score': 0.956}]

classifier("Day 7 of the squat challenge complete 🔥")
# [{'label': 'challenge', 'score': 0.508}]

Model Description

  • Base model: distilbert-base-uncased (66M parameters)
  • Task: Multi-class text classification (5 classes)
  • Domain: YouTube comments from fitness influencer channels
  • Language: English (non-English comments present in dataset but not handled)

Dataset

Self-scraped YouTube comments collected via the YouTube Data API v3 for MSc dissertation research on fitness influencer sentiment and thematic analysis.

  • Total dataset size: 92,223 comments
  • YouTubers: 94 fitness influencer channels
  • Top channels: Noel Deyzel, Browney, Jeff Nippard, Renaissance Periodization, ATHLEAN-X
  • HuggingFace dataset: Krat6s/fitness-youtube-comments

Class Distribution (Full Dataset)

| Class | Count |
| --- | --- |
| challenge | 20,923 |
| nutrition | 20,506 |
| fitness | 19,990 |
| motivational | 19,928 |
| product | 10,749 |
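
The imbalance in the table can be made concrete with a few lines of arithmetic. This is only a sketch over the counts listed above; the variable names are illustrative. Note that the five counts sum to 92,096, which is 127 fewer than the stated total of 92,223; the source does not say how the remaining comments are labelled.

```python
# Class counts copied from the Class Distribution table.
class_counts = {
    "challenge": 20_923,
    "nutrition": 20_506,
    "fitness": 19_990,
    "motivational": 19_928,
    "product": 10_749,
}

labelled_total = sum(class_counts.values())

# Share of each class among the labelled comments, as a percentage.
class_share = {
    label: round(100 * count / labelled_total, 1)
    for label, count in class_counts.items()
}

# Ratio of the smallest class to the largest one.
imbalance_ratio = min(class_counts.values()) / max(class_counts.values())

for label, share in class_share.items():
    print(f"{label:<13}{share:>5}%")
print(f"labelled total: {labelled_total}")
print(f"smallest/largest class ratio: {imbalance_ratio:.2f}")
```

The four large classes each hold roughly 22% of the labelled comments, while product holds about 12%.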

Training

Data Splits (20,000-row stratified sample)

| Split | Size |
| --- | --- |
| Train | 14,000 |
| Validation | 3,000 |
| Test | 3,000 |
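
The sizes above correspond to a 70/15/15 split. The original splitting code is not shown, so the following is a minimal standard-library sketch of a stratified split under that assumption; the function name, the seed, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split row indices into train/val/test, keeping class proportions.

    `labels` holds one class label per row. The seed and function name
    are illustrative, not taken from the original training code.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy labels: 200 rows per class, standing in for the 20,000-row sample.
toy_labels = [
    label
    for label in ("challenge", "fitness", "motivational", "nutrition", "product")
    for _ in range(200)
]
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))  # prints: 700 150 150
```

Splitting each class separately is what keeps the label proportions the same across the three splits.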

Hyperparameters

| Parameter | Value |
| --- | --- |
| Learning rate | 2e-5 |
| Epochs | 3 |
| Batch size (train) | 16 |
| Batch size (eval) | 32 |
| Max sequence length | 128 |
| Warmup steps | 50 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
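
For readers reproducing the run with the Hugging Face Trainer, the table might map onto `TrainingArguments` keywords roughly as below. The original training script is not shown, so the mapping of "Batch size" to the per-device arguments, the single-device assumption, and the derived step counts are all assumptions. If both T4 GPUs were used with data parallelism, the effective batch would be larger and the step count smaller.

```python
import math

# Keyword names follow transformers.TrainingArguments; values come from
# the Hyperparameters table. Treating "batch size" as per-device is an assumption.
training_kwargs = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "warmup_steps": 50,
    "weight_decay": 0.01,
}
max_sequence_length = 128  # applied when tokenizing, not in TrainingArguments

train_rows = 14_000  # from the Data Splits table

# Assumption: one device and no gradient accumulation, so the effective
# batch size equals the per-device value.
steps_per_epoch = math.ceil(
    train_rows / training_kwargs["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
warmup_fraction = training_kwargs["warmup_steps"] / total_steps

print(steps_per_epoch, total_steps, f"{warmup_fraction:.1%}")  # prints: 875 2625 1.9%
```

Under these assumptions the 50 warmup steps cover about 2% of training.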

Training Curve

| Epoch | Train Loss | Val Loss | Accuracy | F1 |
| --- | --- | --- | --- | --- |
| 1 | 2.594 | 2.262 | 0.549 | 0.553 |
| 2 | 2.037 | 2.150 | 0.581 | 0.583 |
| 3 | 1.757 | 2.163 | 0.583 | 0.584 |

Hardware: Kaggle T4 x2 GPU
Training time: 327 seconds (~5.5 minutes)


Evaluation Results (Test Set, 3,000 samples)

Overall

| Metric | Score |
| --- | --- |
| Accuracy | 60.4% |
| F1 (weighted) | 60.4% |

Per-Class

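| Class | Precision | Recall | F1 | Support |
| --- | --- | --- | --- | --- |
| challenge | 0.61 | 0.58 | 0.60 | 685 |
| fitness | 0.64 | 0.62 | 0.63 | 647 |
| motivational | 0.53 | 0.64 | 0.58 | 641 |
| nutrition | 0.64 | 0.62 | 0.63 | 671 |
| product | 0.63 | 0.50 | 0.56 | 356 |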
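
The weighted F1 reported under Overall can be recovered from this table by weighting each class F1 by its support. The sketch below works from the rounded per-class values, so it is a consistency check rather than the original evaluation code. The macro average is included only for comparison and is not a figure reported by the source.

```python
# (F1, support) per class, copied from the Per-Class table.
per_class = {
    "challenge": (0.60, 685),
    "fitness": (0.63, 647),
    "motivational": (0.58, 641),
    "nutrition": (0.63, 671),
    "product": (0.56, 356),
}

total_support = sum(support for _, support in per_class.values())

# Weighted F1: each class F1 weighted by its share of the test set.
weighted_f1 = (
    sum(f1 * support for f1, support in per_class.values()) / total_support
)

# Macro F1: plain average, every class counts equally.
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

print(total_support, round(weighted_f1, 3), round(macro_f1, 3))  # prints: 3000 0.604 0.6
```

The weighted value sits slightly above the macro value because product, the weakest class, is also the smallest.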

Baseline Comparison

| Model | Accuracy |
| --- | --- |
| Majority class (always predict challenge) | 22.8% |
| Fine-tuned DistilBERT (20K rows) | 60.4% |
| Improvement | +37.6pp |
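
The majority-class figure follows directly from the test-set support counts: always predicting the largest class is correct exactly as often as that class appears. A short sketch of the arithmetic, using only numbers reported in this card:

```python
# Test-set support per class, from the Per-Class table.
test_support = {
    "challenge": 685,
    "fitness": 647,
    "motivational": 641,
    "nutrition": 671,
    "product": 356,
}

majority_label = max(test_support, key=test_support.get)
baseline_accuracy = 100 * test_support[majority_label] / sum(test_support.values())

model_accuracy = 60.4  # reported test accuracy, in percent
improvement_pp = round(model_accuracy - round(baseline_accuracy, 1), 1)

print(majority_label, round(baseline_accuracy, 1), improvement_pp)  # prints: challenge 22.8 37.6
```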

Data Scaling Results

Two versions were trained to measure the effect of data volume:

| Model | Training Data | Accuracy | F1 |
| --- | --- | --- | --- |
| DistilBERT | 5,000 rows | 53.6% | 53.8% |
| DistilBERT | 20,000 rows | 60.4% | 60.4% |

Four times more data improved accuracy by 6.8pp, with a roughly linear increase in training time (from 81s to 327s).
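
The scaling claim can be checked with the reported figures. The sketch below only restates the numbers from the table and the training times; the dictionary layout is illustrative.

```python
# Reported results for the two training runs.
runs = {
    5_000: {"accuracy": 53.6, "f1": 53.8, "seconds": 81},
    20_000: {"accuracy": 60.4, "f1": 60.4, "seconds": 327},
}

small, large = runs[5_000], runs[20_000]

data_ratio = 20_000 / 5_000
time_ratio = large["seconds"] / small["seconds"]
accuracy_gain_pp = round(large["accuracy"] - small["accuracy"], 1)
f1_gain_pp = round(large["f1"] - small["f1"], 1)

print(
    f"data x{data_ratio:.0f}, time x{time_ratio:.2f}, "
    f"accuracy +{accuracy_gain_pp}pp, F1 +{f1_gain_pp}pp"
)
# prints: data x4, time x4.04, accuracy +6.8pp, F1 +6.6pp
```

Training time grew almost exactly in proportion to the data (about 4.04x for 4x the rows).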


Inference Examples

| Comment | Predicted | Confidence |
| --- | --- | --- |
| "This protein shake recipe changed my life, tastes amazing with oat milk" | nutrition | 95.6% |
| "I've been doing this workout for 30 days and I can see abs forming!" | fitness | 93.1% |
| "Never give up on your dreams, the grind is worth it" | motivational | 79.8% |
| "Is this pre-workout worth buying? I've heard mixed reviews" | product | 81.4% |
| "Day 7 of the squat challenge complete 🔥" | challenge | 50.8% |

Limitations

Challenge/motivational confusion is the largest source of error: 136 challenge comments were predicted as motivational in the 3,000-sample test set of the 20K-row model. Both classes share workout-encouragement language. Without video context, the boundary is inherently ambiguous, so this is a label ambiguity issue, not a data quantity issue. More training data did not reduce this confusion proportionally.

Product/nutrition overlap is the second-largest error pattern (64 nutrition comments predicted as product). Supplement and protein content sits on the boundary between these two classes.
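
Error patterns like the two above are usually found by counting (true, predicted) pairs among the misclassified rows. The sketch below shows one way to do that with the standard library. The helper name and the toy labels are hypothetical and are not the model's actual predictions; only the two rates at the end use figures reported in this card.

```python
from collections import Counter


def top_confusions(true_labels, predicted_labels, n=3):
    """Return the n most frequent (true, predicted) pairs among errors."""
    errors = Counter(
        (true, pred)
        for true, pred in zip(true_labels, predicted_labels)
        if true != pred
    )
    return errors.most_common(n)


# Toy labels for illustration only; these are not the model's predictions.
y_true = ["challenge", "challenge", "challenge", "nutrition", "nutrition", "fitness"]
y_pred = ["motivational", "motivational", "challenge", "product", "nutrition", "fitness"]

print(top_confusions(y_true, y_pred))
# prints: [(('challenge', 'motivational'), 2), (('nutrition', 'product'), 1)]

# Reported figures: 136 of 685 challenge comments went to motivational,
# and 64 of 671 nutrition comments went to product.
challenge_to_motivational_rate = 136 / 685
nutrition_to_product_rate = 64 / 671
print(f"{challenge_to_motivational_rate:.1%} {nutrition_to_product_rate:.1%}")  # prints: 19.9% 9.5%
```

Expressed as rates, about one in five challenge comments and about one in ten nutrition comments fall into these two confusions.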

Non-English comments are not handled and default to incorrect predictions. Approximately 15% of the full dataset is non-English (Russian, German, Spanish, Korean). VADER assigned these comments neutral scores in the original dissertation, and DistilBERT similarly has no signal for them.

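Product class underrepresentation: product has roughly half the examples of the other classes in the full dataset (10,749 versus roughly 20,000 each). Despite this, product achieved an F1 of 0.56, within 0.07 of the best class, due to distinctive brand/review vocabulary. Its recall (0.50) is nevertheless the lowest of the five classes.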


Next Steps

  • Fine-tune roberta-base (125M parameters) on 20K rows for a model size comparison
  • YouTuber-stratified train/test split to test generalisation to unseen creators
  • Sentiment classification using a human-labelled subset, replacing the VADER baseline
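
The creator-level split in the second item could look roughly like the sketch below: whole channels are held out, so every test comment comes from a creator the model never saw in training. This is one possible reading of "YouTuber-stratified", not the author's planned implementation. The function name, held-out fraction, seed, and toy channel names are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_channel(channel_ids, test_fraction=0.2, seed=0):
    """Hold out whole channels so test comments come from unseen creators.

    `channel_ids` has one channel identifier per comment. The function
    name, fraction, and seed are illustrative choices.
    """
    rows_by_channel = defaultdict(list)
    for row, channel in enumerate(channel_ids):
        rows_by_channel[channel].append(row)

    channels = sorted(rows_by_channel)
    random.Random(seed).shuffle(channels)
    n_test = max(1, round(len(channels) * test_fraction))
    test_channels = set(channels[:n_test])

    train_rows = [r for c in channels[n_test:] for r in rows_by_channel[c]]
    test_rows = [r for c in channels[:n_test] for r in rows_by_channel[c]]
    return train_rows, test_rows, test_channels


# Toy data: 10 hypothetical channels with 5 comments each.
toy_channels = [f"channel_{i}" for i in range(10) for _ in range(5)]
train_rows, test_rows, held_out = split_by_channel(toy_channels)
print(len(train_rows), len(test_rows), sorted(held_out))
```

Unlike the stratified sample used for the current model, a split like this does not by itself keep class proportions equal across train and test, so per-class support should be checked after splitting.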

Citation

If you use this model or dataset, please credit:

Dataset: Self-scraped YouTube comments from 94 fitness influencer channels
Collected via YouTube Data API v3 for MSc dissertation research
HuggingFace: Krat6s/fitness-youtube-comments