🍼 Baby Cry Analyzer

A machine learning model that classifies baby crying sounds into 11 categories to help parents understand what their baby needs.

📊 Model Performance

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 86.38% |
| F1 Score  | 85.43% |
| Precision | 86.07% |
| ROC AUC   | 0.9864 |
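
The card does not say how the multi-class scores were averaged. As a reference point, the sketch below computes accuracy plus macro-averaged precision and F1 from true and predicted labels. Macro averaging, the function name `macro_metrics`, and the toy labels are assumptions for illustration, not the author's evaluation code; ROC AUC is left out because it needs per-class probabilities.

```python
from collections import Counter

def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro F1) for multi-class labels."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but wrong
            fn[t] += 1  # an instance of t was missed
    precisions, f1s = [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    accuracy = sum(tp.values()) / len(y_true)
    # Macro average: every class counts equally, regardless of its size.
    return accuracy, sum(precisions) / len(classes), sum(f1s) / len(classes)

# Toy labels, invented for illustration only.
y_true = ["hungry", "hungry", "tired", "tired", "scared", "scared"]
y_pred = ["hungry", "tired", "tired", "tired", "scared", "hungry"]
acc, prec, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} f1={f1:.3f}")
```

Macro averaging weights a weak class such as hungry as heavily as a strong one, so it tends to expose per-class problems that overall accuracy hides.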

🏷️ Classes (11 cry types)

| Label      | Emoji | Meaning                   |
|------------|-------|---------------------------|
| belly pain | 🤢    | Baby has stomach pain     |
| burping    | 💨    | Baby needs to burp        |
| cold_hot   | 🌡️    | Baby is too cold or hot   |
| discomfort | 😣    | Baby is uncomfortable     |
| hungry     | 🍼    | Baby is hungry            |
| laugh      | 😄    | Baby is happy/laughing    |
| lonely     | 🥺    | Baby wants attention      |
| noise      | 🔊    | Background noise detected |
| scared     | 😨    | Baby is startled/scared   |
| silence    | 🤫    | No crying detected        |
| tired      | 😴    | Baby is sleepy            |

🧠 Model Architecture

  • Type: Tuned Stacking Ensemble
  • Base learners: SVM + KNN + ExtraTrees + XGBoost + MLP
  • Meta learner: SVM (RBF kernel)
  • Features: MFCC (40) + Chroma (12) + Mel Spectrogram (128) = 180 features
  • Preprocessing: StandardScaler + SMOTE balancing
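
The architecture listed above can be approximated with scikit-learn's `StackingClassifier`. This is a minimal sketch, not the author's training code: every hyperparameter is a placeholder (the tuned values are not published), the data is synthetic, and XGBoost is left out so the example depends on scikit-learn only. With `xgboost` installed, an `("xgb", XGBClassifier(...))` entry could be appended to `base_learners`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 180-dimensional audio features (not real data).
X, y = make_classification(
    n_samples=240, n_features=180, n_informative=30,
    n_classes=3, random_state=0,
)

# Base learners from the card, minus XGBoost; settings are assumptions.
base_learners = [
    ("svm", SVC(kernel="rbf", random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("extra_trees", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
]

# The meta learner is an RBF SVM trained on cross-validated base predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=SVC(kernel="rbf", probability=True, random_state=0),
    cv=3,
)

# Scale features before they reach the ensemble.
clf = make_pipeline(StandardScaler(), stack)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])
print(proba.shape)
```

Setting `probability=True` on the meta learner is what makes `predict_proba` available, which the usage code in this card relies on.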

📁 Dataset

  • Source: Baby Cry Pattern Archive
  • Total samples: 1,450 audio files (4,367 feature vectors after SMOTE oversampling)
  • Format: WAV, OGG, 3GP audio files
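
SMOTE balances classes by adding synthetic feature vectors, not audio files: each new vector lies on the line segment between a real sample and one of its nearest same-class neighbours. The post-SMOTE total of 4,367 equals 11 × 397, which is consistent with every class being raised to 397 samples, although the card does not state this. The pure-Python sketch below illustrates the interpolation step only; the function name `smote_oversample`, its parameters, and the toy data are hypothetical, and it is not the library implementation used to train the model.

```python
import math
import random

def smote_oversample(samples, n_target, k=3, seed=0):
    """Grow `samples` (feature vectors of one class) to n_target vectors by
    interpolating between a sample and one of its k nearest neighbours."""
    rng = random.Random(seed)
    out = [list(s) for s in samples]
    while len(out) < n_target:
        base = rng.choice(samples)
        # k nearest same-class neighbours of `base`, excluding itself.
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        out.append([b + gap * (o - b) for b, o in zip(base, other)])
    return out

# Toy 2-D minority class; the real feature vectors have 180 dimensions.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
balanced = smote_oversample(minority, n_target=10)
print(len(balanced))
```

Because synthetic points are interpolations of real ones, oversampling is normally applied to the training split only, so that evaluation data contains no derived samples.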

🚀 How to Use

import joblib
import librosa
import numpy as np
from huggingface_hub import hf_hub_download

model  = joblib.load(hf_hub_download("Nerdy37/baby-cry-analyzer", "best_model.pkl"))
scaler = joblib.load(hf_hub_download("Nerdy37/baby-cry-analyzer", "scaler.pkl"))

label_names = [
    'belly pain', 'burping', 'cold_hot', 'discomfort',
    'hungry', 'laugh', 'lonely', 'noise', 'scared', 'silence', 'tired'
]

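def predict(audio_path):
    """Return (label, confidence string) for one audio file."""
    # Load at most the first 5 seconds as mono audio resampled to 22,050 Hz.
    audio, sr = librosa.load(audio_path, sr=22050, duration=5, mono=True)
    # Average each feature over time: 40 MFCC + 12 chroma + 128 mel = 180 values.
    mfcc   = np.mean(librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(y=audio, sr=sr).T, axis=0)
    mel    = np.mean(librosa.feature.melspectrogram(y=audio, sr=sr).T, axis=0)
    # Apply the scaler that was fitted at training time.
    features = scaler.transform([np.hstack([mfcc, chroma, mel])])
    pred = model.predict(features)[0]
    prob = model.predict_proba(features)[0]
    # pred is used as an index into label_names; confidence is the top probability.
    return label_names[pred], f"{max(prob)*100:.1f}%"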

label, confidence = predict("baby_cry.wav")
print(f"Prediction: {label} ({confidence})")
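
Since several classes sound alike, it can help to look at the runners-up as well as the top label. The sketch below ranks a probability vector and returns the k most likely classes. The helper `top_k` and the example probabilities are made up for illustration, and it assumes the columns of `predict_proba` follow the order of `label_names`.

```python
def top_k(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(
        zip(labels, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]

label_names = [
    'belly pain', 'burping', 'cold_hot', 'discomfort',
    'hungry', 'laugh', 'lonely', 'noise', 'scared', 'silence', 'tired'
]

# Made-up vector standing in for model.predict_proba(features)[0].
example_prob = [0.02, 0.18, 0.05, 0.25, 0.31, 0.01, 0.06, 0.02, 0.04, 0.01, 0.05]

for label, p in top_k(example_prob, label_names):
    print(f"{label}: {p * 100:.1f}%")
```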

⚠️ Limitations

  • The hungry class has much lower accuracy (28.7%) than the model overall because it is acoustically similar to discomfort, burping, and cold_hot
  • Use it as a support tool, not as a medical diagnosis
  • Works best on clean audio with minimal background noise
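
One way to act on these limitations is to flag low-confidence predictions instead of reporting them as fact. The sketch below is an illustration only: the function `describe_prediction` is hypothetical and the 0.6 threshold is an arbitrary value, not one validated for this model.

```python
def describe_prediction(label, confidence, threshold=0.6):
    """Return a cautious message; `threshold` is an arbitrary illustrative cutoff."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a probability in [0, 1]")
    if confidence < threshold:
        return f"Uncertain (best guess: {label}, {confidence * 100:.1f}%)"
    return f"{label} ({confidence * 100:.1f}%)"

print(describe_prediction("hungry", 0.41))
print(describe_prediction("tired", 0.93))
```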

👤 Author

Made by Nerdy37 as part of an end-to-end ML pipeline project.
