PSALM ELC-PSALM-S — arm D (Paribhāṣā dose)

Small bidirectional ELC-BERT-style encoder trained from scratch under the BabyLM Strict-Small protocol. This is ablation arm D: the stage-one structural dose is Paribhāṣā, trimmed to the same token budget as every other arm over a shared English base, so differences between arms are attributable to dose content under a fixed budget rather than to data volume.

The model is trained jointly with masked and causal language-modeling objectives; minimal pairs are scored by Salazar-style pseudo-log-likelihood. The export registers both AutoModel (the base encoder, which returns last_hidden_state) and AutoModelForMaskedLM, so the official BabyLM (Super)GLUE fine-tuner can load it directly.

from transformers import AutoModelForMaskedLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("qbz506/psalm-arm-d", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("qbz506/psalm-arm-d", trust_remote_code=True)
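
The pseudo-log-likelihood scoring mentioned above can be illustrated with a minimal sketch. It is not the project's evaluation code: it masks each position in turn, reads the log-probability of the original token at that position, and sums the results. To stay self-contained it uses a hypothetical toy scorer (`toy_log_probs`) in place of the checkpoint; the names `pseudo_log_likelihood`, `LogProbsFn`, `VOCAB_SIZE`, and `MASK_ID` are assumptions made for illustration.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: given a sequence in which `position` is masked,
# return log-probabilities over the vocabulary at that position.
LogProbsFn = Callable[[Sequence[int], int], Sequence[float]]


def pseudo_log_likelihood(
    token_ids: Sequence[int], mask_id: int, log_probs_fn: LogProbsFn
) -> float:
    """Sum of log P(token_i | all other tokens), masking one position at a time."""
    total = 0.0
    for i, target in enumerate(token_ids):
        masked = list(token_ids)
        masked[i] = mask_id  # hide only position i
        total += log_probs_fn(masked, i)[target]
    return total


# Toy stand-in for a masked LM (an assumption, not the real model).
VOCAB_SIZE = 4  # tokens 0, 1, 2 plus the mask token
MASK_ID = 3


def toy_log_probs(masked_ids: Sequence[int], position: int) -> Sequence[float]:
    """Prefer the token that follows the left neighbour in the cycle 0 -> 1 -> 2 -> 0."""
    if position == 0:
        # No left context: uniform distribution over the vocabulary.
        return [math.log(1.0 / VOCAB_SIZE)] * VOCAB_SIZE
    preferred = (masked_ids[position - 1] + 1) % 3
    # 0.7 on the preferred token, 0.1 on each of the other three (sums to 1).
    return [math.log(0.7 if t == preferred else 0.1) for t in range(VOCAB_SIZE)]


# A minimal pair: the "good" sequence follows the toy pattern, the "bad" one does not.
good = [0, 1, 2]
bad = [0, 2, 2]
pll_good = pseudo_log_likelihood(good, MASK_ID, toy_log_probs)
pll_bad = pseudo_log_likelihood(bad, MASK_ID, toy_log_probs)
prefers_good = pll_good > pll_bad  # the pair counts as correct when this holds
print(pll_good, pll_bad, prefers_good)
```

With the actual checkpoint, `log_probs_fn` would presumably wrap a forward pass of the AutoModelForMaskedLM model and apply a log-softmax to the logits at the masked position, usually skipping special tokens. That the custom model exposes its logits in the usual way is an assumption here, not something this card states.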

See the project site and repository for the method, the seed-replicated results, and the scope statement. This checkpoint is part of a controlled scientific ablation; for the leaderboard-track model see qbz506/psalm-submission.

Safetensors: model size 0.1B params, tensor type F32