---
license: apache-2.0
tags:
- materials
- qwen3
- diffusion
- crystal-structure-prediction
- crystal-generation
---

# Atomistic Language Models

A single Qwen3-8B backbone that understands, generates, and edits crystals by reading atoms as **soft tokens** from a machine-learning interatomic potential and steering a MatterGen diffusion decoder with classifier-free guidance.

One repo, one subdir per model:

| subdir | model | what |
|---|---|---|
| `stage1-projector/` | structure-to-language projector | OrbV3 → Qwen3 soft tokens (~70 MB) |
| `alm-core/` | **ALM Core** | understanding: Qwen3-8B + LoRA (r128) + projector |
| `alm-gen/` | **ALM Gen** | de-novo generation: consumer-only bridge (r8) over `mattergen_base` |
| `alm-edit/` | **ALM Edit** | CSP + editing: producer-consumer bridge + full-FT Qwen3-8B (`llm_full_ft/`) + `csp_backbone/` decoder |

Headlines (paper, https://arxiv.org/abs/2606.21395). **ALM Edit**: CSP MR@20 **83.2%** / RMSE@1 **0.021 Å** (MP-20, SoTA), and SoTA across the **ALM Bench** editing tasks. **ALM Gen**: de-novo SUN **7.80%** on the MP-20 hull (above the g=0 MatterGen base) and metastable **MSUN 35.2%** on LeMat-GenBench. See each subdir's card for full tables.

Download into `./checkpoints/` with `hf download LearningMatter/AtomisticLanguageModels --local-dir ./checkpoints`. The **ALM Bench** dataset lives in `LearningMatter/ALM-Bench`. `mattergen_base` (ALM Gen's backbone) is fetched from `microsoft/mattergen`; `alm-edit/csp_backbone/` (the CSP decoder) ships here.

## Links

Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License

Apache-2.0.

## Citation

```bibtex
@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}
```
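
## Example: fetching a single subdir

The `hf download` command in this card pulls the whole repo. Because each model lives in its own subdir, it may be convenient to fetch only one of them. The following is a minimal sketch, assuming the `huggingface_hub` Python package is installed; it uses that package's `snapshot_download` with an `allow_patterns` filter. The helper names `allow_patterns_for` and `download_subdir` are hypothetical and are not part of the ALM codebase.

```python
# Sketch only: helper names are illustrative, not part of the ALM repo.
REPO_ID = "LearningMatter/AtomisticLanguageModels"

# Subdirs listed in this model card's table.
SUBDIRS = ("stage1-projector", "alm-core", "alm-gen", "alm-edit")


def allow_patterns_for(subdir: str) -> list[str]:
    """Build the glob filter that selects every file under one subdir."""
    if subdir not in SUBDIRS:
        raise ValueError(f"unknown subdir {subdir!r}; expected one of {SUBDIRS}")
    # Hub filters use fnmatch-style globs, where '*' also matches '/',
    # so nested folders such as alm-edit/llm_full_ft/ are included.
    return [f"{subdir}/*"]


def download_subdir(subdir: str, local_dir: str = "./checkpoints") -> str:
    """Download a single model subdir and return the local snapshot path."""
    # Imported lazily so the helper above works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(subdir),
        local_dir=local_dir,
    )
```

Calling `download_subdir("alm-core")` would place the files under `./checkpoints/alm-core/`, the same layout the full download produces. ALM Gen additionally needs `mattergen_base`, which this card says is fetched from `microsoft/mattergen` and is therefore not covered by this sketch.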