ViSNet
Reference
Yusong Wang, Tong Wang, Shaoning Li, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao, and Tie-Yan Liu.
Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing.
Nature Communications, 15(1), January 2024. ISSN: 2041-1723.
URL: https://dx.doi.org/10.1038/s41467-023-43720-2.
Hyperparameters, model configurations and training strategies
Model architecture
| Parameter | Value | Description |
| --- | --- | --- |
| num_layers | 4 | Number of ViSNet layers. |
| num_channels | 128 | Number of channels. |
| l_max | 2 | Highest harmonic order included in the spherical harmonics series. |
| num_heads | 8 | Number of heads in the attention block. |
| num_rbf | 32 | Number of radial basis functions in the embedding block. |
| trainable_rbf | False | Whether to add learnable weights to the radial embedding basis functions. |
| activation | silu | Activation function for the output block. |
| attn_activation | silu | Activation function for the attention block. |
| vecnorm_type | None | Type of the vector norm. |
| atomic_energies | average | Treatment of the atomic energies. |
| avg_num_neighbors | None | Mean number of neighbors. |
Training
| Parameter | Value | Description |
| --- | --- | --- |
| num_epochs | 220 | Number of epochs to run. |
| ema_decay | 0.99 | The EMA decay rate. |
| eval_num_graphs | None | Number of validation set graphs to evaluate on. |
| use_ema_params_for_eval | True | Whether to use the EMA parameters for evaluation. |
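
To illustrate what ema_decay controls, here is a minimal sketch of an exponential moving average (EMA) over parameters, using only the Python standard library. The function name `ema_update` and the dictionary-of-floats parameter layout are hypothetical and chosen for illustration; the training code may additionally apply bias correction, which is not shown here.

```python
def ema_update(ema_params, params, decay=0.99):
    """Return the EMA parameters after one update (hypothetical helper).

    Each averaged value keeps a fraction `decay` of its previous value and
    takes a fraction `1 - decay` of the current training parameter.
    """
    return {
        name: decay * ema_params[name] + (1.0 - decay) * params[name]
        for name in params
    }


# Toy example: a single scalar "parameter" that jumps from 0.0 to 1.0.
ema = {"w": 0.0}
for _ in range(3):
    ema = ema_update(ema, {"w": 1.0}, decay=0.99)
print(ema["w"])
```

With use_ema_params_for_eval set to True, evaluation would use the averaged values (`ema` in this sketch) rather than the raw training parameters.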
Optimizer
| Parameter | Value | Description |
| --- | --- | --- |
| init_learning_rate | 0.0001 | Initial learning rate. |
| peak_learning_rate | 0.0001 | Peak learning rate. |
| final_learning_rate | 0.0001 | Final learning rate. |
| weight_decay | 0 | Weight decay. |
| warmup_steps | 4000 | Number of optimizer warm-up steps. |
| transition_steps | 360000 | Number of optimizer transition steps. |
| grad_norm | 500 | Gradient norm used for gradient clipping. |
| num_gradient_accumulation_steps | 1 | Steps to accumulate before taking an optimizer step. |
| algorithm | optax.amsgrad | The AMSGrad optimizer. |
| b1 | 0.9 | Exponential decay rate to track first moment of past gradients. |
| b2 | 0.999 | Exponential decay rate to track second moment of past gradients. |
| eps | 1e-8 | Constant applied to denominator outside the square root. |
| eps_root | 0.0 | Constant applied to denominator inside the square root. |
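
The initial, peak and final learning rates above are all 0.0001, so a warm-up followed by a decay between these values leaves the learning rate constant. The sketch below shows this in plain Python. The schedule shape is an assumption (linear warm-up, then cosine decay, with transition_steps counted from step 0), because this card does not state it; the function name `learning_rate` is hypothetical.

```python
import math


def learning_rate(step, init_lr=1e-4, peak_lr=1e-4, final_lr=1e-4,
                  warmup_steps=4000, transition_steps=360000):
    """Assumed schedule: linear warm-up, then cosine decay (illustration only)."""
    if step < warmup_steps:
        # Linear interpolation from init_lr to peak_lr during warm-up.
        frac = step / warmup_steps
        return init_lr + frac * (peak_lr - init_lr)
    # Cosine interpolation from peak_lr to final_lr after warm-up.
    decay_steps = max(transition_steps - warmup_steps, 1)
    frac = min((step - warmup_steps) / decay_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * frac))
    return final_lr + cosine * (peak_lr - final_lr)


# With the values from the table, every step yields the same learning rate.
for step in (0, 2000, 4000, 180000, 360000):
    print(step, learning_rate(step))
```

Because the three values coincide, the conclusion holds for any interpolating schedule shape, not only the one assumed here.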
Huber Loss Energy weight schedule
| Parameter | Value | Description |
| --- | --- | --- |
| schedule | optax.piecewise_constant_schedule | Piecewise constant schedule with scaled jumps at specific boundaries. |
| init_value | 40 | Initial value. |
| boundaries_and_scale | {115: 25} | Dictionary of {step: scale} where scale is multiplied into the schedule value at the given step. |
Huber Loss Force weight schedule
| Parameter | Value | Description |
| --- | --- | --- |
| schedule | optax.piecewise_constant_schedule | Piecewise constant schedule with scaled jumps at specific boundaries. |
| init_value | 1000 | Initial value. |
| boundaries_and_scale | {115: 0.04} | Dictionary of {step: scale} where scale is multiplied into the schedule value at the given step. |
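
The two weight schedules can be read together: at boundary 115 the energy weight is multiplied by 25 and the force weight by 0.04, so the two weights swap (40 to 1000 and 1000 to 40). The following plain-Python sketch mimics a piecewise constant schedule to make this explicit. It is a stand-in written for illustration, not the optax implementation, and the helper name `piecewise_constant` is hypothetical. Whether the schedule counter is an optimizer step or an epoch is not stated in this card.

```python
def piecewise_constant(init_value, boundaries_and_scales):
    """Return a schedule that multiplies in each scale once its boundary is reached."""
    def schedule(count):
        value = init_value
        for boundary, scale in sorted(boundaries_and_scales.items()):
            if count >= boundary:
                value = value * scale
        return value
    return schedule


# Values taken from the two tables above.
energy_weight = piecewise_constant(40, {115: 25})
force_weight = piecewise_constant(1000, {115: 0.04})

for count in (0, 114, 115, 219):
    print(count, energy_weight(count), force_weight(count))
```

Under this reading, training starts with a strong emphasis on forces and shifts the emphasis to energies once the boundary is reached.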
Dataset
| Parameter | Value | Description |
| --- | --- | --- |
| graph_cutoff_angstrom | 5 | Graph cutoff distance (in Å). |
| max_n_node | 32 | Maximum number of nodes allowed in a batch. |
| max_n_edge | 288 | Maximum number of edges allowed in a batch. |
| batch_size | 16 | Number of graphs in a batch. |

This model was trained on the SPICE2_curated dataset.
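
To show what the graph cutoff distance means in practice, here is a minimal sketch that connects every pair of atoms closer than 5 Å with a directed edge. It uses only the Python standard library, ignores periodic boundary conditions, and treats the cutoff as a strict inequality; all of these are assumptions made for illustration, and the helper `build_edges` is hypothetical rather than part of the actual data pipeline.

```python
import math


def build_edges(positions, cutoff=5.0):
    """Return directed edges (i, j) for all atom pairs closer than `cutoff` Å."""
    edges = []
    for i, p_i in enumerate(positions):
        for j, p_j in enumerate(positions):
            if i != j and math.dist(p_i, p_j) < cutoff:
                edges.append((i, j))
    return edges


# Four toy atoms (coordinates in Å); the last one is far from the others.
positions = [
    (0.0, 0.0, 0.0),
    (1.5, 0.0, 0.0),
    (0.0, 4.0, 0.0),
    (9.0, 0.0, 0.0),
]
edges = build_edges(positions)
print(len(edges), edges)
```

A larger cutoff produces more edges per atom, which is why the cutoff and the edge limit in the table above have to be chosen together.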
How to Use
For complete usage instructions and more information, please refer to our documentation.
License summary
- The Licensed Models are only available under this License for Non-Commercial Purposes.
- You are permitted to reproduce, publish, share and adapt the Output generated by the Licensed Model only for Non-Commercial Purposes and in accordance with this License.
- You may not use the Licensed Models or any of its Outputs in connection with:
  - any Commercial Purposes, unless agreed by Us under a separate licence;
  - to train, improve or otherwise influence the functionality or performance of any other third-party derivative model that is commercial or intended for a Commercial Purpose and is similar to the Licensed Models;
  - to create models distilled or derived from the Outputs of the Licensed Models, unless such models are for Non-Commercial Purposes and open-sourced under the same license as the Licensed Models; or
  - in violation of any applicable laws and regulations.