Access to this model requires agreeing to the BabyHuBERT License. Please read the one-pager (https://osf.io/n75rs/files/anx9v) and the full license (https://osf.io/n75rs/files/b79v4) before proceeding. The license prohibits commercial use and surveillance, requires reporting of misuse, and ensures any model built on BabyHuBERT inherits the same conditions.
BabyHuBERT
BabyHuBERT is a self-supervised speech representation model trained on 13,000+ hours of multilingual child-centered long-form audio recordings spanning 40+ languages, from widely studied languages such as English and French to underrepresented languages including Yeli Dnye, Tsimane, and Quechua. It was created by the ExELang team in 2025, built on data shared by research teams around the world. We created BabyHuBERT because existing speech models trained on clean adult speech fail on child-centered recordings, whose acoustic conditions are challenging: ~80% non-speech content, overlapping speakers, short vocalizations, and children's higher-pitched and more variable speech.
For a plain-language overview of what BabyHuBERT is and what you commit to by using it, see the one-pager (https://osf.io/n75rs/files/anx9v).
License
BabyHuBERT is released under a custom license informed by an independent ethics assessment covering participants' consent, indigenous data sovereignty, privacy, and possible misuse. The license:
- Prohibits commercial use and surveillance of participants
- Requires reporting of misuse
- Ensures that any model released building on BabyHuBERT inherits the same conditions
See the full license (https://osf.io/n75rs/files/b79v4) for the complete terms. All documents related to the release of BabyHuBERT can be found in this OSF repository.
Downstream models
As of April 2026, three open-source task-specific models have been built on top of BabyHuBERT:
| Model | Task |
|---|---|
| BabyHuBERT-VTC (Charlot et al., 2026) | Voice type classification (who speaks when?) |
| BabAR (Lavechin et al., 2026) | Phoneme recognition |
| Addressee classification (Charlot et al., 2026) | Child-directed speech vs adult-directed speech detection |
Downloading the checkpoint
Fill in the access form on this page to get instant access, then authenticate and download:
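```python
from huggingface_hub import login, hf_hub_download

# Authenticate with the account that accepted the access conditions
login()  # enter your HF token when prompted

# Download the checkpoint (cached locally) and get its path
ckpt_path = hf_hub_download(repo_id="MarvinLvn/BabyHuBERT", filename="BabyHuBERT.ckpt")
```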
Extracting representations
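```python
import torch
from torchaudio.models import hubert_pretrain_base

# Build the HuBERT base pre-training architecture with 500 target classes
model = hubert_pretrain_base(num_classes=500)

# Load the checkpoint downloaded above (ckpt_path) onto the CPU
state_dict = torch.load(ckpt_path, map_location="cpu")
# Drop the "model." prefix so the keys match the torchaudio module names
state_dict = {k.replace("model.", ""): v for k, v in state_dict["state_dict"].items()}
model.load_state_dict(state_dict)

# Keep only the encoder and switch it to inference mode
encoder = model.wav2vec2
encoder.eval()
```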
Citation
@misc{charlot2025babyhubertmultilingualselfsupervisedlearning,
title={BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings},
author={Théo Charlot and Tarek Kunze and Maxime Poli and Alejandrina Cristia and Emmanuel Dupoux and Marvin Lavechin},
year={2025},
eprint={2509.15001},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2509.15001},
}