Atomistic Language Models

A single Qwen3-8B backbone that understands, generates, and edits crystals by reading atoms as soft tokens from a machine-learning interatomic potential and steering a MatterGen diffusion decoder with classifier-free guidance. One repo, one subdir per model:

| subdir | model | what |
|---|---|---|
| `stage1-projector/` | structure-to-language projector | OrbV3 → Qwen3 soft tokens (~70 MB) |
| `alm-core/` | ALM Core | understanding: Qwen3-8B + LoRA (r128) + projector |
| `alm-gen/` | ALM Gen | de-novo generation: consumer-only bridge (r8) over `mattergen_base` |
| `alm-edit/` | ALM Edit | CSP + editing: producer-consumer bridge + full-FT Qwen3-8B (`llm_full_ft/`) + `csp_backbone/` decoder |

Headlines (paper, https://arxiv.org/abs/2606.21395). ALM Edit: CSP MR@20 83.2% / RMSE@1 0.021 Å (MP-20, SoTA), and SoTA across the ALM Bench editing tasks. ALM Gen: de-novo SUN 7.80% on the MP-20 hull (above the g=0 MatterGen base) and metastable MSUN 35.2% on LeMat-GenBench. See each subdir's card for full tables.
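For readers unfamiliar with the guidance scale behind the "g=0" baseline, here is a minimal sketch of classifier-free guidance. It assumes the common convention in which the guided prediction is the unconditional prediction plus `g` times the conditional-minus-unconditional difference, so that `g = 0` recovers the unconditional base model. The function name `cfg_combine` and the plain-list inputs are illustrative only; the parameterization used in ALM Gen or MatterGen may differ.

```python
from typing import Sequence


def cfg_combine(uncond: Sequence[float], cond: Sequence[float], g: float) -> list[float]:
    """Blend unconditional and conditional denoiser outputs.

    Assumed convention (not taken from the ALM code):
        guided = uncond + g * (cond - uncond)
    so g = 0 gives the unconditional prediction and g = 1 the conditional one.
    """
    if len(uncond) != len(cond):
        raise ValueError("uncond and cond must have the same length")
    return [u + g * (c - u) for u, c in zip(uncond, cond)]


# Toy per-coordinate predictions standing in for a denoiser's output.
uncond = [1.0, 2.0]
cond = [3.0, 0.0]
base = cfg_combine(uncond, cond, 0.0)      # unconditional base
guided = cfg_combine(uncond, cond, 2.0)    # extrapolates past the conditional output
```

Values of `g` above 1 push the sample further in the direction the condition suggests, which is why a guided model can be compared against its own `g = 0` base.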

Download into `./checkpoints/` with `hf download LearningMatter/AtomisticLanguageModels --local-dir ./checkpoints`. The ALM Bench dataset lives in `LearningMatter/ALM-Bench`. `mattergen_base` (ALM Gen's backbone) is fetched from `microsoft/mattergen`; `alm-edit/csp_backbone/` (the CSP decoder) ships here.
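If only one model is needed, a single subdir can be fetched from Python instead of the whole repo. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`. The helper names `subdir_patterns` and `download_subdir` are hypothetical, and the subdir names are taken from the table above.

```python
REPO_ID = "LearningMatter/AtomisticLanguageModels"
SUBDIRS = ("stage1-projector", "alm-core", "alm-gen", "alm-edit")


def subdir_patterns(subdir: str) -> list[str]:
    """Return glob patterns that select every file under one model subdir."""
    name = subdir.strip("/")
    if name not in SUBDIRS:
        raise ValueError(f"unknown subdir {subdir!r}; expected one of {SUBDIRS}")
    return [f"{name}/*"]


def download_subdir(subdir: str, local_dir: str = "./checkpoints") -> str:
    """Download one subdir of the repo into local_dir and return the local path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=subdir_patterns(subdir),
        local_dir=local_dir,
    )
```

Calling `download_subdir("alm-gen")` would place the files under `./checkpoints/alm-gen/`. `mattergen_base` still has to be obtained separately from `microsoft/mattergen`.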

Links

Paper: arXiv · HuggingFace · Code: GitHub

License

Apache-2.0.

Citation

@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}
