Atomistic Language Models
A single Qwen3-8B backbone that understands, generates, and edits crystals by reading atoms as soft tokens from a machine-learning interatomic potential and steering a MatterGen diffusion decoder with classifier-free guidance. One repo, one subdir per model:
| subdir | model | what |
|---|---|---|
| `stage1-projector/` | structure-to-language projector | OrbV3 → Qwen3 soft tokens (~70 MB) |
| `alm-core/` | ALM Core | understanding: Qwen3-8B + LoRA (r128) + projector |
| `alm-gen/` | ALM Gen | de-novo generation: consumer-only bridge (r8) over `mattergen_base` |
| `alm-edit/` | ALM Edit | CSP + editing: producer-consumer bridge + full-FT Qwen3-8B (`llm_full_ft/`) + `csp_backbone/` decoder |
Headlines (paper, https://arxiv.org/abs/2606.21395). ALM Edit: CSP MR@20 83.2% / RMSE@1 0.021 Å (MP-20, SoTA), and SoTA across the ALM Bench editing tasks. ALM Gen: de-novo SUN 7.80% on the MP-20 hull (above the g=0 MatterGen base) and metastable MSUN 35.2% on LeMat-GenBench. See each subdir's card for full tables.
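The g=0 baseline mentioned above is the unguided case of classifier-free guidance. The snippet below is a minimal sketch of that idea, assuming the common convention in which the guidance scale g multiplies the difference between the conditional and unconditional predictions; the exact formulation used by ALM Gen is not stated here, and `cfg_combine` is a hypothetical helper, not part of this repo or of MatterGen.

```python
def cfg_combine(uncond, cond, g):
    """Blend unconditional and conditional predictions with guidance scale g.

    Assumed convention (not confirmed by this README):
        guided = uncond + g * (cond - uncond)
    so g = 0 returns the unconditional prediction and g = 1 the conditional one.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + g * (c - u) for u, c in zip(uncond, cond)]


# Toy per-coordinate predictions, for illustration only.
uncond = [1.0, 2.0]
cond = [3.0, 0.0]

unguided = cfg_combine(uncond, cond, 0.0)  # equals the unconditional prediction
guided = cfg_combine(uncond, cond, 2.0)    # extrapolates past the conditional one
```

Under this convention, values of g above 1 push samples further toward the condition than the conditional prediction alone would.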
Download into `./checkpoints/` with `hf download LearningMatter/AtomisticLanguageModels --local-dir ./checkpoints`.
The ALM Bench dataset lives in `LearningMatter/ALM-Bench`. `mattergen_base` (ALM Gen's backbone) is
fetched from `microsoft/mattergen`; `alm-edit/csp_backbone/` (the CSP decoder) ships here.
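To fetch only some of the models from Python, something like the following may work. It is a sketch under assumptions: it uses `snapshot_download` from `huggingface_hub` with `allow_patterns`, and assumes each model's files sit entirely under its subdir as listed in the table. The helper names `subdir_patterns` and `download` are illustrative, not part of this repo.

```python
REPO_ID = "LearningMatter/AtomisticLanguageModels"
# Subdir names taken from the table in this README.
SUBDIRS = ("stage1-projector", "alm-core", "alm-gen", "alm-edit")


def subdir_patterns(subdirs):
    """Turn subdir names into glob patterns, rejecting names not in SUBDIRS."""
    names = [s.strip("/") for s in subdirs]
    unknown = [n for n in names if n not in SUBDIRS]
    if unknown:
        raise ValueError(f"unknown subdir(s): {unknown}")
    return [f"{n}/*" for n in names]


def download(subdirs, local_dir="./checkpoints"):
    """Download only the requested subdirs into local_dir (hypothetical helper)."""
    # Imported lazily so the pattern helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=subdir_patterns(subdirs),
    )
```

For example, `download(["stage1-projector", "alm-core"])` would request only the projector and ALM Core files. `mattergen_base` is not covered by this call, since it comes from `microsoft/mattergen`.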
Links
Paper: arXiv · HuggingFace · Code: GitHub
License
Apache-2.0.
Citation
@article{edamadaka2026atomistic,
title = {Atomistic Language Models Understand and Generate Materials},
author = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
journal = {arXiv preprint arXiv:2606.21395},
year = {2026}
}