Commit c1c3d42 (verified) by penganyang
1 Parent(s): 2717b00

Update README.md

Files changed (1)
  1. README.md +205 -3
README.md CHANGED
@@ -1,3 +1,205 @@
1
- ---
2
- license: cc-by-4.0
3
- ---
1
+ ---
2
+ license: cc-by-4.0
3
+ ---
4
+
5
+ ## Model Summary
6
+
7
+ This model card describes a DPA3 model [1] trained on the OMol25 dataset [2]. One model is provided:
8
+
9
+ - `DPA3-Omol-Large` – 12 layers, large-scale model for broad molecular chemistry
10
+
11
+ The model is trained with **charge** and **spin** as input frame parameters, following the OMol25 dataset convention. Here **spin** refers to the **spin multiplicity** (2S+1), not the spin quantum number S. Users can specify `charge` and `spin` when running simulations; if not specified, defaults of `charge=0` and `spin=1` (singlet) are used.
12
+
13
+ The model is compatible with **DeePMD-kit v3.1.3**. To install it off-line, download the v3.1.3 package from the [Releases page](https://github.com/deepmodeling/deepmd-kit/releases/tag/v3.1.3) and follow the off-line installation instructions in the [official documentation](https://docs.deepmodeling.com/projects/deepmd/en/v3.1.3/install/easy-install.html).
14
+
15
+ ## Usage
16
+
17
+ ### Model Evaluation
18
+
19
+ Evaluate the model from the command line with `dp test`:
20
+
21
+ ```bash
22
+ dp --pt test -m DPA3-Omol-Large.pt -s path_to_your_system
23
+ ```
24
+
25
+ ### ASE Calculator
26
+
27
+ The following Python code runs prediction or structure optimization through the standard ASE calculator interface.
28
+
29
+ **Charge and spin** can be set explicitly through the `fparam` key in `atoms.info`, in the order charge, spin. Note that `spin` here means **spin multiplicity** (2S+1). If it is not set, the default values `charge=0` and `spin=1` (singlet) are used.
30
+
31
+ ```python
32
+ ## Compute potential energy
33
+ from ase import Atoms
34
+ from deepmd.calculator import DP as DPCalculator
35
+
36
+ dp = DPCalculator("DPA3-Omol-Large.pt")
37
+
38
+ # Example: ethanol molecule
39
+ ethanol = Atoms(
40
+ "C2OH6",  # symbol order must match the positions below: C, C, O, then six H
41
+ positions=[
42
+ (-0.7472, -0.0575, 0.0000),
43
+ ( 0.7209, 0.0178, 0.0000),
44
+ ( 1.1431, 1.4297, 0.0000),
45
+ (-1.1576, -1.0720, 0.0000),
46
+ (-1.1267, 0.4548, -0.8932),
47
+ (-1.1267, 0.4548, 0.8932),
48
+ ( 1.0797, -0.5050, -0.8946),
49
+ ( 1.0797, -0.5050, 0.8946),
50
+ ( 2.1108, 1.4520, 0.0000),
51
+ ],
52
+ cell=[100, 100, 100],
53
+ )
54
+
55
+ # Specify charge and spin multiplicity (optional)
56
+ # If not set, defaults are charge=0, spin=1 (singlet)
57
+ ethanol.info.update(
58
+ {"fparam": [0.0, 1.0]} # charge=0, spin multiplicity=1 (singlet)
59
+ )
60
+
61
+ ethanol.calc = dp
62
+ print(ethanol.get_potential_energy())
63
+ print(ethanol.get_forces())
64
+
65
+ ## Run BFGS structure optimization
66
+ from ase.optimize import BFGS
67
+
68
+ dyn = BFGS(ethanol)
69
+ dyn.run(fmax=1e-6)
70
+ print(ethanol.get_positions())
71
+ ```
72
+
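Because `spin` is the multiplicity 2S+1 rather than S, an open-shell species with n spin-parallel unpaired electrons has multiplicity n+1. The sketch below builds the `[charge, multiplicity]` list in the order used above. `make_fparam` is a hypothetical helper name, not part of DeePMD-kit or ASE, and it assumes all unpaired electrons are spin-parallel.

```python
def make_fparam(charge=0, unpaired_electrons=0):
    """Return [charge, spin multiplicity] as floats.

    Hypothetical helper (not a DeePMD-kit API). Assumes all unpaired
    electrons are spin-parallel, so S = n/2 and 2S + 1 = n + 1.
    """
    if unpaired_electrons < 0:
        raise ValueError("unpaired_electrons must be non-negative")
    multiplicity = unpaired_electrons + 1
    return [float(charge), float(multiplicity)]


# Closed-shell neutral molecule: matches the documented default.
print(make_fparam())      # [0.0, 1.0]
# Singly charged radical cation with one unpaired electron (doublet).
print(make_fparam(1, 1))  # [1.0, 2.0]
```

The result can be stored with `atoms.info["fparam"] = make_fparam(...)` before the calculator is attached.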
73
+ ### LAMMPS
74
+
75
+ To run molecular dynamics in LAMMPS with the DPA3 model, first freeze the `*.pt` model into a `*.pth` model:
76
+
77
+ ```bash
78
+ dp --pt freeze -c DPA3-Omol-Large.pt -o DPA3-Omol-Large.pth
79
+ ```
80
+
81
+ Then add the following lines to the LAMMPS script to call the DeePMD-kit interface (see also `potential.md`).
82
+
83
+ **Charge and spin** are provided via the `fparam` keyword in the order of **charge, spin** (spin = spin multiplicity, 2S+1). If `fparam` is not specified, the default values `0.0 1.0` (charge=0, spin multiplicity=1, i.e. singlet) will be used.
84
+
85
+ ```bash
86
+ # With explicit charge and spin multiplicity (e.g., charge=2, multiplicity=1)
87
+ pair_style deepmd DPA3-Omol-Large.pth fparam 2.0 1.0
88
+ pair_coeff * * C H O
89
+ ```
90
+
91
+ ```bash
92
+ # Without fparam: defaults to charge=0, spin multiplicity=1
93
+ pair_style deepmd DPA3-Omol-Large.pth
94
+ pair_coeff * * C H O
95
+ ```
96
+
97
+ For more details on the `fparam` keyword, see the [DeePMD-kit LAMMPS documentation](https://docs.deepmodeling.com/projects/deepmd/en/stable/third-party/lammps-command.html#pair-style-deepmd).
98
+
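When many systems with different charge states are simulated, the `pair_style` line can be generated rather than edited by hand. The following is a minimal sketch that only reproduces the two forms shown above. The function name `deepmd_pair_style` is hypothetical, and requiring charge and multiplicity to be given together is a design choice of this sketch, not a DeePMD-kit rule.

```python
def deepmd_pair_style(model_file, charge=None, multiplicity=None):
    """Build a `pair_style deepmd` line, optionally with `fparam charge spin`.

    Hypothetical helper. With neither value given, no fparam is written
    and the model defaults (charge=0, multiplicity=1) apply.
    """
    line = f"pair_style deepmd {model_file}"
    if charge is None and multiplicity is None:
        return line
    if charge is None or multiplicity is None:
        raise ValueError("give both charge and multiplicity, or neither")
    if multiplicity < 1:
        raise ValueError("spin multiplicity (2S+1) must be at least 1")
    # Order matters: charge first, then spin multiplicity.
    return f"{line} fparam {float(charge)} {float(multiplicity)}"


print(deepmd_pair_style("DPA3-Omol-Large.pth", charge=2, multiplicity=1))
print(deepmd_pair_style("DPA3-Omol-Large.pth"))
```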
99
+ ## Training Dataset
100
+
101
+ The model is trained on the **Open Molecules 2025 (OMol25)** dataset[2], a large-scale resource for molecular chemistry ML models introduced by Meta FAIR. OMol25 comprises over **100 million** DFT single-point calculations at the **ωB97M-V/def2-TZVPD** level of theory.
102
+
103
+ Key characteristics of OMol25:
104
+
105
+ - **83 elements** across the periodic table
106
+ - **~83M unique molecular systems**, including small molecules, biomolecules, metal complexes, and electrolytes
107
+ - System sizes up to **350 atoms** (50 on average)
108
+ - Diverse charge states (−10 to +10) and spin multiplicities (1 to 11)
109
+ - Explicit solvation, conformers, and reactive structures
110
+
111
+ The dataset is organized into four major domains:
112
+
113
+ - **Biomolecules:** protein–ligand, protein–protein, and nucleic acid interactions extracted from BioLiP2 and other structural databases, sampled via classical MD
114
+ - **Metal Complexes:** diverse monometallic transition metal, main group metal, and lanthanide systems with varied ligands and spin states, generated using the Architector package
115
+ - **Electrolytes:** aqueous and non-aqueous solutions, ionic liquids, and molten salts, sampled via MD (including Ring Polymer MD for nuclear quantum effects) and electrolyte reactivity networks
116
+ - **Community:** recomputed existing datasets (ANI-2X, Transition-1X, SPICE2, GEOM, etc.) at consistent ωB97M-V/def2-TZVPD level of theory, plus interpolated reactivity datasets
117
+
118
+ Compositional splitting ensures that validation and test sets contain out-of-distribution molecular formulas relative to training data.
119
+
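One way to picture a compositional split is to assign structures to a split by molecular formula alone, so that every conformer of a given formula lands on the same side. The sketch below is an illustration only and is not the OMol25 procedure: the function names, the hash-based assignment, and the 10% held-out fraction are all assumptions.

```python
import hashlib
from collections import Counter


def formula_of(symbols):
    """Canonical formula string with elements in alphabetical order."""
    counts = Counter(symbols)
    return "".join(
        f"{element}{count if count > 1 else ''}"
        for element, count in sorted(counts.items())
    )


def split_of(symbols, held_out_fraction=0.1):
    """Return 'held_out' or 'train' based only on the molecular formula."""
    digest = hashlib.sha256(formula_of(symbols).encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "held_out" if bucket < held_out_fraction else "train"


# Two orderings of the same atoms share a formula, hence a split.
water_a = ["O", "H", "H"]
water_b = ["H", "O", "H"]
print(formula_of(water_a), split_of(water_a) == split_of(water_b))
```

Because the assignment depends only on the formula, no formula can appear in both the training and the held-out set.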
120
+ ## Training Details
121
+
122
+ We train the **DPA3** model in its large (12-layer) configuration, truncated within LiGS order 2.
123
+
124
+ ### Model configuration
125
+
126
+
127
+ | Parameter | Value |
128
+ | --------- | ----- |
129
+ | `n_dim` | 256 |
130
+ | `e_dim` | 256 |
131
+ | `a_dim` | 256 |
132
+ | `nlayers` | 12 |
133
+
134
+ ### Training setup
135
+
136
+ - **Engine:** DeePMD-kit (`v3.1.0` required)
137
+ - **Batch size:** `auto:2048` (DeePMD-kit automatic batch size)
138
+ - **Hardware:** 32 × NVIDIA A800 GPUs
139
+ - **Training steps:** 2 million steps
140
+ - **Learning rate schedule:** Cosine annealing
141
+ - **Cutoff radii and neighbor selections:**
142
+ - `e_rcut = 6.0`, `e_rcut_smth = 5.3`, `e_sel = 30`
143
+ - `a_rcut = 4.5`, `a_rcut_smth = 4.0`, `a_sel = 15`
144
+
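For reference, the settings listed above can be collected in one place and sanity-checked. This is a sketch, not a DeePMD-kit input file: the flat dictionary layout and the `check_cutoffs` helper are assumptions. The check assumes the usual DeePMD-kit meaning of the cutoffs, where smoothing starts at `rcut_smth` and the interaction vanishes at `rcut`.

```python
# Values copied from the table and list above. The flat dict layout is an
# illustrative assumption, not a complete DeePMD-kit input file.
dpa3_omol_large = {
    "n_dim": 256,
    "e_dim": 256,
    "a_dim": 256,
    "nlayers": 12,
    "e_rcut": 6.0,
    "e_rcut_smth": 5.3,
    "e_sel": 30,
    "a_rcut": 4.5,
    "a_rcut_smth": 4.0,
    "a_sel": 15,
}


def check_cutoffs(cfg):
    """Return True if 0 < rcut_smth < rcut and sel > 0 for both prefixes."""
    for prefix in ("e", "a"):
        rcut = cfg[f"{prefix}_rcut"]
        smth = cfg[f"{prefix}_rcut_smth"]
        sel = cfg[f"{prefix}_sel"]
        if not 0 < smth < rcut:
            raise ValueError(f"{prefix}_rcut_smth must lie in (0, {prefix}_rcut)")
        if sel <= 0:
            raise ValueError(f"{prefix}_sel must be positive")
    return True


print(check_cutoffs(dpa3_omol_large))
```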
145
+ Other hyperparameters and training details can be found in the DPA3 paper[1].
146
+
147
+ ## Performance
148
+
149
+ ### Accuracy on OMol25 Validation Set
150
+
151
+ We report energy and force errors on the OMol25 validation set. All values are in **meV** (energy per atom) and **meV/Å** (force).
152
+
153
+
154
+ | Model | Energy MAE/atom (meV) | Energy RMSE/atom (meV) | Force MAE (meV/Å) | Force RMSE (meV/Å) |
155
+ | ------------------- | :-------------------: | :--------------------: | :----------------: | :-----------------: |
156
+ | MACE-OMol-L | 1.917 | 11.727 | 10.690 | 63.754 |
157
+ | **DPA3-Omol-Large** | 1.328 | 11.347 | 12.362 | 62.934 |
158
+
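The four error columns can be computed as in the sketch below. It assumes energies are in eV and forces in eV/Å, and that force errors are taken over individual Cartesian components. The function names are hypothetical, and the exact evaluation protocol behind the table is not specified here.

```python
import math


def mae_rmse(errors):
    """Mean absolute error and root-mean-square error of a list of errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse


def energy_errors_mev_per_atom(e_pred, e_ref, n_atoms):
    """Per-atom energy MAE and RMSE in meV; energies given in eV."""
    errors = [
        (pred - ref) / n * 1000.0  # eV per atom -> meV per atom
        for pred, ref, n in zip(e_pred, e_ref, n_atoms)
    ]
    return mae_rmse(errors)


def force_errors_mev_per_angstrom(f_pred, f_ref):
    """Force MAE and RMSE in meV/Å over flat lists of components in eV/Å."""
    errors = [(pred - ref) * 1000.0 for pred, ref in zip(f_pred, f_ref)]
    return mae_rmse(errors)


# Two frames with 10 and 5 atoms; only the first has an energy error.
mae, rmse = energy_errors_mev_per_atom([-1.00, -2.00], [-1.01, -2.00], [10, 5])
print(mae, rmse)
```

For any set of errors the MAE cannot exceed the RMSE, which is a quick consistency check on reported numbers.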
159
+ ### Accuracy on LAMBench
160
+
161
+ We evaluate DPA3-Omol-Large on the molecular property tasks from **LAMBench**[3]. The following tables compare **DPA3-Omol-Large (ours)** with DPA-3.2-5M and MACE-OMol-L.
162
+
163
+ #### Ligand Binding
164
+
165
+
166
+ | Model | RMSE (kcal/mol) | MAE (kcal/mol) |
167
+ | ------------------- | :-------------: | :------------: |
168
+ | DPA-3.2-5M | 3.40 | 6.90 |
169
+ | MACE-OMol-L | 1.75 | 0.84 |
170
+ | **DPA3-Omol-Large** | 1.29 | 0.65 |
171
+
172
+ #### TorsionNet500
173
+
174
+
175
+ | Model | MAEB (kcal/mol) | MAE (kcal/mol) | RMSE (kcal/mol) | NAHB_h |
176
+ | ------------------- | :-------------: | :------------: | :-------------: | :----: |
177
+ | DPA-3.2-5M | 0.47 | 0.29 | 0.43 | 50 |
178
+ | MACE-OMol-L | 0.23 | 0.14 | 0.23 | 9 |
179
+ | **DPA3-Omol-Large** | 0.24 | 0.16 | 0.25 | 13 |
180
+
181
+ #### Wiggle150
182
+
183
+
184
+ | Model | RMSE (kcal/mol) | MAE (kcal/mol) |
185
+ | ------------------- | :-------------: | :------------: |
186
+ | DPA-3.2-5M | 1.54 | 1.19 |
187
+ | MACE-OMol-L | 1.18 | 0.89 |
188
+ | **DPA3-Omol-Large** | 1.25 | 0.94 |
189
+
190
+ #### Reaction Barrier
191
+
192
+
193
+ | Model | RMSE (kcal/mol) | MAE (kcal/mol) |
194
+ | ------------------- | :-------------: | :------------: |
195
+ | DPA-3.2-5M | 12.37 | 6.30 |
196
+ | MACE-OMol-L | 3.53 | 2.12 |
197
+ | **DPA3-Omol-Large** | 12.42 | 3.36 |
198
+
199
+ ## References
200
+
201
+ [1] Duo Zhang, Anyang Peng, Chun Cai, Wentao Li, Yuanchang Zhou, Jinzhe Zeng, Mingyu Guo et al. "A Graph Neural Network for the Era of Large Atomistic Models." *arXiv preprint arXiv:2506.01686* (2025).
202
+
203
+ [2] Daniel S. Levine, Muhammed Shuaibi, Evan Walter Clark Spotte-Smith, Michael G. Taylor, Muhammad R. Hasyim, Kyle Michel, Ilyes Batatia et al. "The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models." *arXiv preprint arXiv:2505.08762* (2025).
204
+
205
+ [3] Anyang Peng, Chun Cai, Mingyu Guo, Duo Zhang, Chengqian Zhang, Wanrun Jiang, Yinan Wang et al. "LAMBench: a benchmark for large atomistic models." *npj Computational Materials* (2026).