xelm-gemma-4b-dense

Dense continual pre-training (CPT) of Gemma-3-4B on a concatenated 25B-token mixture across Slavic, Germanic, Indic, Austronesian, and Romance language families. This is the no-regularization baseline.

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("sanchitahuja205/xelm-gemma-4b-dense")
tokenizer = AutoTokenizer.from_pretrained("sanchitahuja205/xelm-gemma-4b-dense")
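
Since this is a continually pre-trained base model, plain text completion is the natural smoke test after loading. The following is a minimal sketch rather than an official usage recipe: the helper name `complete`, the greedy decoding settings, and the choice to load in bfloat16 are assumptions made for illustration.

```python
MODEL_ID = "sanchitahuja205/xelm-gemma-4b-dense"


def complete(prompt: str, max_new_tokens: int = 40) -> str:
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so the module can be imported without the heavy deps.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: load in bfloat16 to halve memory relative to float32.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0, prompt_length:], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(complete("Die Hauptstadt von Deutschland ist"))
```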

Training recipe

The exact training recipe lives in configs/yaml/train_gemma_dense.yaml in the code repo. The resolved config used for this specific run is also included in this model repo as training_config.yaml; load it with pyrallis to reproduce the run bit-for-bit. The command below runs the recipe from the code repo:

python train.py --config_path configs/yaml/train_gemma_dense.yaml
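
To check what the resolved config contains before launching a run, it can be fetched from this model repo and printed with dotted keys, the same naming style the command-line overrides use (for example `revert.checkpoint_path`). This is a sketch under assumptions: it reads the file as plain YAML instead of going through the training code's pyrallis dataclasses, and the helper names `flatten` and `load_resolved_config` are hypothetical.

```python
from typing import Any, Dict

REPO_ID = "sanchitahuja205/xelm-gemma-4b-dense"
CONFIG_FILE = "training_config.yaml"


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config dict into dotted keys (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def load_resolved_config() -> Dict[str, Any]:
    """Download training_config.yaml from the model repo and parse it as YAML."""
    # Imported lazily; requires `huggingface_hub` and `pyyaml`.
    import yaml
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=CONFIG_FILE)
    with open(path, "r", encoding="utf-8") as handle:
        return yaml.safe_load(handle)


# Example call (needs network access):
# for key, value in sorted(flatten(load_resolved_config()).items()):
#     print(f"{key} = {value}")
```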

Reproducing the dense-reverted variant

python train.py --config_path configs/yaml/revert_gemma_checkpoint.yaml \
    --revert.checkpoint_path "$(huggingface-cli download sanchitahuja205/xelm-gemma-4b-dense)" \
    --revert.revert_output_path ./reverted

Reproducing the expert soup (uniform average of 5 experts)

python model_soup.py \
    --experts sanchitahuja205/xelm-gemma-4b-slavic-expert \
              sanchitahuja205/xelm-gemma-4b-germanic-expert \
              sanchitahuja205/xelm-gemma-4b-indic-expert \
              sanchitahuja205/xelm-gemma-4b-austronesian-expert \
              sanchitahuja205/xelm-gemma-4b-romance-expert \
    --output_dir ./xelm-gemma-4b-expert-soup \
    --alpha 1.0
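
For intuition, a uniform soup is the element-wise mean of the experts' parameters. The sketch below illustrates that idea only; it is not the code in model_soup.py, it does not model what `--alpha` controls there, and the function name `uniform_soup` is hypothetical. It relies on nothing but `+` and `/`, so it works on dicts of floats as well as on dicts of tensors such as the output of `model.state_dict()`.

```python
from typing import Any, Dict, Sequence


def uniform_soup(state_dicts: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Average parameters key by key with equal weight for every expert."""
    if not state_dicts:
        raise ValueError("need at least one state dict to average")

    keys = state_dicts[0].keys()
    for state_dict in state_dicts[1:]:
        # All experts must share the same architecture, hence the same keys.
        if state_dict.keys() != keys:
            raise ValueError("state dicts have mismatched parameter names")

    count = len(state_dicts)
    # sum() starts from 0, which adds cleanly to both floats and tensors.
    return {key: sum(sd[key] for sd in state_dicts) / count for key in keys}
```

With five experts each parameter receives weight 1/5, so the result stays in the same parameter space as any single expert and can be loaded into the same architecture.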

Citation

@misc{ahuja2026parameteralignmentmitigatescatastrophic,
      title={Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models},
      author={Sanchit Ahuja and Terra Blevins},
      year={2026},
      eprint={2606.00284},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.00284},
}

Model size: 4B params. Tensor type: BF16. Format: Safetensors.