---
license: apache-2.0
tags:
  - materials
  - qwen3
  - diffusion
  - crystal-structure-prediction
  - crystal-generation
---
# Atomistic Language Models

A single Qwen3-8B backbone that understands, generates, and edits crystals by reading
atoms as **soft tokens** from a machine-learning interatomic potential and steering a
MatterGen diffusion decoder with classifier-free guidance. One repo, one subdir per model:

| subdir | model | what |
|---|---|---|
| `stage1-projector/` | structure-to-language projector | OrbV3 → Qwen3 soft tokens (~70 MB) |
| `alm-core/` | **ALM Core** | understanding: Qwen3-8B + LoRA (r128) + projector |
| `alm-gen/` | **ALM Gen** | de-novo generation: consumer-only bridge (r8) over `mattergen_base` |
| `alm-edit/` | **ALM Edit** | CSP + editing: producer-consumer bridge + full-FT Qwen3-8B (`llm_full_ft/`) + `csp_backbone/` decoder |

Headline results from the paper (https://arxiv.org/abs/2606.21395). **ALM Edit** reaches CSP MR@20 **83.2%** / RMSE@1 **0.021 Å**
on MP-20 (SoTA) and is SoTA across the **ALM Bench** editing tasks. **ALM Gen** reaches de-novo SUN
**7.80%** on the MP-20 hull (above the g=0 MatterGen base) and metastable **MSUN 35.2%** on
LeMat-GenBench. See each subdir's card for full tables.

Download into `./checkpoints/` with `hf download LearningMatter/AtomisticLanguageModels --local-dir ./checkpoints`.
The **ALM Bench** dataset lives in `LearningMatter/ALM-Bench`. `mattergen_base` (ALM Gen's backbone) is
fetched from `microsoft/mattergen`; `alm-edit/csp_backbone/` (the CSP decoder) ships here.
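
To fetch only some of the subdirs rather than the whole repo, the same download can be scripted from Python. The following is a minimal sketch, assuming `huggingface_hub` is installed; the helper names `allow_patterns` and `download` are illustrative and not part of this repo's code.

```python
# Sketch: download selected subdirs of this repo into ./checkpoints.
# Helper names are hypothetical; only snapshot_download is a real
# huggingface_hub call.

REPO_ID = "LearningMatter/AtomisticLanguageModels"
SUBDIRS = ("stage1-projector", "alm-core", "alm-gen", "alm-edit")


def allow_patterns(subdirs):
    """Turn subdir names from the table above into glob patterns."""
    unknown = [s for s in subdirs if s not in SUBDIRS]
    if unknown:
        raise ValueError(f"unknown subdir(s): {unknown}")
    return [f"{s}/*" for s in subdirs]


def download(subdirs, local_dir="./checkpoints"):
    """Download only the requested subdirs; returns the local folder path."""
    # Imported lazily so the pattern helper works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=allow_patterns(subdirs),
    )
```

For example, `download(["alm-gen"])` would fetch just the ALM Gen files. `mattergen_base` is still fetched separately from `microsoft/mattergen`, as noted above.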

## Links
Paper: [arXiv](https://arxiv.org/abs/2606.21395) · [HuggingFace](https://huggingface.co/papers/2606.21395) · Code: [GitHub](https://github.com/learningmatter-mit/alm)

## License
Apache-2.0.

## Citation
```bibtex
@article{edamadaka2026atomistic,
  title   = {Atomistic Language Models Understand and Generate Materials},
  author  = {Edamadaka, Sathya and Ramesh, Krithik and Li, Ju and G\'omez-Bombarelli, Rafael},
  journal = {arXiv preprint arXiv:2606.21395},
  year    = {2026}
}
```