Atomistic Language Models

A single Qwen3-8B backbone that understands, generates, and edits crystals. It reads atoms as soft tokens produced by a machine-learning interatomic potential (OrbV3) and steers a MatterGen diffusion decoder with classifier-free guidance. One repo, one subdir per model:

| subdir | model | what |
| --- | --- | --- |
| stage1-projector/ | structure-to-language projector | OrbV3 → Qwen3 soft tokens (~70 MB) |
| alm-core/ | ALM Core | understanding: Qwen3-8B + LoRA (r128) + projector |
| alm-gen/ | ALM Gen | de-novo generation: consumer-only bridge (r8) over mattergen_base |
| alm-edit/ | ALM Edit | CSP + editing: producer-consumer bridge + full-FT Qwen3-8B (llm_full_ft/) + csp_backbone/ decoder |

Headlines (paper, https://arxiv.org/abs/2606.21395). ALM Edit: CSP MR@20 83.2% / RMSE@1 0.021 Å (MP-20, SoTA), and SoTA across the ALM Bench editing tasks. ALM Gen: de-novo SUN 7.80% on the MP-20 hull (above the g=0 MatterGen base) and metastable MSUN 35.2% on LeMat-GenBench. See each subdir's card for full tables.
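For readers unfamiliar with the guidance scale g mentioned above, here is a minimal sketch of classifier-free guidance in its common form. This card does not state the exact formulation used by ALM or MatterGen, so the formula, the function name `cfg_combine`, and the toy vectors are all assumptions for illustration. The form is chosen so that g = 0 falls back to the unconditional prediction, in line with the "g=0 MatterGen base" wording.

```python
def cfg_combine(uncond, cond, g):
    """Blend unconditional and conditional denoiser outputs (assumed common CFG form).

    g = 0 returns the unconditional prediction, g = 1 the conditional one,
    and g > 1 extrapolates beyond the conditional prediction.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + g * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for a denoiser's output on one sample (hypothetical values).
uncond = [0.0, 1.0, -2.0]  # prediction without conditioning
cond = [1.0, 1.0, 0.0]     # prediction with conditioning

print(cfg_combine(uncond, cond, 0.0))  # [0.0, 1.0, -2.0]
print(cfg_combine(uncond, cond, 2.0))  # [2.0, 1.0, 2.0]
```

In a real sampler the two predictions would come from the diffusion decoder evaluated with and without the conditioning signal at each denoising step.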

Download into ./checkpoints/ with:

    hf download LearningMatter/AtomisticLanguageModels --local-dir ./checkpoints

The ALM Bench dataset lives in LearningMatter/ALM-Bench. mattergen_base (ALM Gen's backbone) is fetched from microsoft/mattergen; alm-edit/csp_backbone/ (the CSP decoder) ships here.
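To fetch only some of the subdirs rather than the whole repo, a sketch along these lines may help. It uses `snapshot_download` from `huggingface_hub` with its `allow_patterns` filter. The helper names `patterns_for` and `download` are hypothetical, and the subdir names are taken from the table on this card. The sketch assumes the repo layout matches that table.

```python
from pathlib import Path

REPO_ID = "LearningMatter/AtomisticLanguageModels"
# Subdir names as listed on this card; assumed to match the repo layout.
SUBDIRS = ("stage1-projector", "alm-core", "alm-gen", "alm-edit")


def patterns_for(subdirs):
    """Turn subdir names into glob patterns, rejecting names not on the card."""
    unknown = [s for s in subdirs if s.strip("/") not in SUBDIRS]
    if unknown:
        raise ValueError(f"unknown subdir(s): {unknown}")
    return [f"{s.strip('/')}/*" for s in subdirs]


def download(subdirs, local_dir="./checkpoints"):
    """Fetch only the requested subdirs into local_dir (needs network access)."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(
        snapshot_download(
            repo_id=REPO_ID,
            local_dir=local_dir,
            allow_patterns=patterns_for(subdirs),
        )
    )


if __name__ == "__main__":
    # Shows the filters only; call download([...]) to actually fetch files.
    print(patterns_for(["alm-core", "stage1-projector/"]))
```

For example, `download(["alm-gen"])` would place the ALM Gen files under ./checkpoints/alm-gen/. mattergen_base still has to be fetched separately from microsoft/mattergen, as noted above.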

Links

Paper: arXiv (https://arxiv.org/abs/2606.21395) · HuggingFace · Code: GitHub

License

Apache-2.0.

Citation

@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}