BabyLM 2026 — MultiLingual track baseline (byte-premium-uniform)
A 110M-param Llama-style decoder pre-trained from scratch on the BabyBabelLM trilingual corpus (English, Dutch, Chinese), under the BabyLM 2026 MultiLingual track rules: 100M reference tokens, byte-premium adjusted, ≤10 epochs.
This is the baseline zero-point of our ablation grid. Subsequent runs vary the mixture allocation (loss-weighted, simultaneous-bilingual, typological-bridge curriculum, register-controlled) on top of an identical scaffold. The matching ablation paper is in preparation.
Architecture
- Llama (HF LlamaForCausalLM): RoPE, RMSNorm, SwiGLU, no biases, tied embeddings
- 12 layers · 768 hidden · 12 heads · 2048 FFN
- 1024 sequence length
- 110,119,680 parameters
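The parameter count can be cross-checked from the dimensions listed above. The sketch below is a back-of-the-envelope tally, not code from the training scaffold; it assumes full multi-head attention (no grouped-query attention), one RMSNorm weight vector per norm, and the 32,768-entry vocabulary from the Tokenizer section. The function name `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab, hidden, layers, ffn, tied=True):
    """Count weights in a bias-free Llama-style decoder.

    Assumes standard multi-head attention (q, k, v, o all hidden x hidden)
    and a SwiGLU MLP with gate, up, and down projections.
    """
    embed = vocab * hidden            # token embedding matrix
    attn = 4 * hidden * hidden        # q, k, v, o projections
    mlp = 3 * hidden * ffn            # gate, up, down projections
    norms = 2 * hidden                # two RMSNorm weights per layer
    total = embed + layers * (attn + mlp + norms) + hidden  # + final RMSNorm
    if not tied:
        total += embed                # separate output head when untied
    return total


print(f"{llama_param_count(32768, 768, 12, 2048):,}")  # 110,119,680
```

Under these assumptions the tally reproduces the reported figure exactly, which is consistent with the embeddings being tied: an untied output head would add another 25,165,824 weights.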
Tokenizer
Joint byte-level BPE, 32,768 vocab, trained on a balanced 50M-char sample from each of EN/NL/ZH. The same tokenizer is shared across all three languages (see the data card for why a joint tokenizer is required: ZH is 6.8% Latin script).
Training
- Data: BabyLM-community/babylm-eng + babylm-nld + babylm-zho (BabyBabelLM 2026 100M tier). Full corpora loaded in memory and shuffled (the Hub layout is category-clustered; streaming with reasonable buffers produces a biased sample).
- Mixture: byte-premium-uniform: equal share of reference tokens per language (1/3 each), achieved by deficit-driven selection, not uniform doc sampling (mean doc sizes differ across languages).
- Optimizer: AdamW (β₁=0.9, β₂=0.95, wd=0.1), lr 6e-4, cosine to 10%, 100-step warmup
- Compute: 4× NVIDIA A10G (23 GB), bf16, DDP, micro-batch 16 × grad-accum 2 (eff. batch 128 sequences = 131k tokens/step)
- Tokens consumed at this checkpoint: 100,000,000 byte-premium-adjusted reference tokens
- Per-language epochs at this checkpoint: ≈1.0 each (within the BabyLM ≤10-epoch cap)
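To illustrate what deficit-driven selection means for the mixture, here is a minimal sketch of one way it could work: at each step, draw the next document from whichever language is furthest below its target share. This is an assumption about the general idea, not the implementation in the companion repo; the function name `deficit_driven_mix`, the per-document costs, and the toy corpus are all hypothetical.

```python
from collections import deque


def deficit_driven_mix(docs_by_lang, budget):
    """Greedily pick documents so each language gets an equal share of `budget`.

    `docs_by_lang` maps a language code to a list of per-document costs,
    measured in reference tokens (hypothetical units for this sketch).
    """
    target = budget / len(docs_by_lang)
    queues = {lang: deque(costs) for lang, costs in docs_by_lang.items()}
    consumed = {lang: 0 for lang in docs_by_lang}
    order = []
    while sum(consumed.values()) < budget:
        available = [lang for lang in queues if queues[lang]]
        if not available:
            break  # corpus exhausted before the budget was reached
        # Choose the language with the largest remaining deficit.
        lang = max(available, key=lambda l: target - consumed[l])
        consumed[lang] += queues[lang].popleft()
        order.append(lang)
    return consumed, order


# Toy corpus: documents differ in mean size across languages.
toy = {"en": [10] * 20, "nl": [20] * 20, "zh": [5] * 40}
consumed, order = deficit_driven_mix(toy, budget=300)
print(consumed)  # {'en': 100, 'nl': 100, 'zh': 100}
print({lang: order.count(lang) for lang in toy})  # {'en': 10, 'nl': 5, 'zh': 20}
```

In the toy run, token shares come out equal even though document counts differ by a factor of four. Sampling documents uniformly instead would give the language with the longest documents the largest token share.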
Revisions
The chck_{N}M revisions match the BabyLM eval pipeline's fast-eval naming:
chck_1M, chck_2M, ..., chck_9M, chck_10M, chck_20M, ..., chck_90M, chck_100M
Use revision=chck_NM to load any milestone. The default (main) is chck_100M.
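As a sketch of how the milestones might be loaded with Transformers, the helpers below enumerate the revision names listed above and pass one to `from_pretrained` through its `revision` argument. The helper names `milestone_revisions` and `load_checkpoint` are hypothetical, and the sketch assumes every revision also carries the tokenizer files.

```python
REPO = "Shamima/babylm-2026-multilingual-uniform-100M"


def milestone_revisions():
    """Return the revision names: 1M..9M in steps of 1M, then 10M..100M in steps of 10M."""
    ones = [f"chck_{n}M" for n in range(1, 10)]
    tens = [f"chck_{n}M" for n in range(10, 101, 10)]
    return ones + tens


def load_checkpoint(revision="chck_100M"):
    """Download and load one milestone (requires `transformers` and network access)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: tokenizer files are present on each revision.
    tokenizer = AutoTokenizer.from_pretrained(REPO, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(REPO, revision=revision)
    return tokenizer, model


print(len(milestone_revisions()))  # 19
```

For example, `tokenizer, model = load_checkpoint("chck_10M")` would fetch the 10M-token milestone, which is convenient for plotting a learning curve across checkpoints.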
How to evaluate
git clone https://github.com/babylm-org/babylm-eval
cd babylm-eval/multilingual
bash scripts/zeroshot_model.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M
bash scripts/zeroshot_model_fast_all.sh --model_name Shamima/babylm-2026-multilingual-uniform-100M
Citation
@misc{babylm-2026-uniform,
title = {BabyLM 2026 MultiLingual baseline (byte-premium-uniform)},
author = {Hossain, Shamima},
year = {2026},
url = {https://huggingface.co/Shamima/babylm-2026-multilingual-uniform-100M}
}
Companion repo with audit, scaffold, and ablation configs: https://github.com/silvererudite/bb-lm-challenge-sub