eacortes committed (verified)
Commit beda960 · 1 Parent(s): 21e9591

Upload 8 files

README.md CHANGED
@@ -1,3 +1,111 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ base_model: Derify/ModChemBERT-IR-BASE
4
+ library_name: transformers
5
+ tags:
6
+ - modernbert
7
+ - ModChemBERT
8
+ - cheminformatics
9
+ - chemical-language-model
10
+ pipeline_tag: fill-mask
11
+ ---
12
+
13
+ # ModChemBERT: ModernBERT as a Chemical Language Model
14
+ ModChemBERT-IR-BASE is a ModernBERT-based chemical language model (CLM) pretrained on SMILES strings using masked language modeling (MLM). It serves as a base for training embedding, retrieval, and reranking models for molecular information retrieval tasks.
15
+
16
+ ## Usage
17
+ Install `transformers` v4.56.1 or later:
18
+
19
+ ```bash
20
+ pip install -U "transformers>=4.56.1"
21
+ ```
22
+
23
+ ### Load Model
24
+ ```python
25
+ from transformers import AutoModelForMaskedLM, AutoTokenizer
26
+
27
+ model_id = "Derify/ModChemBERT-IR-BASE"
28
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
29
+ model = AutoModelForMaskedLM.from_pretrained(
30
+ model_id,
31
+ trust_remote_code=True,
32
+ dtype="bfloat16",
33
+ device_map="auto",
34
+ )
35
+ ```
36
+
37
+ ### Fill-Mask Pipeline
38
+ ```python
39
+ from transformers import pipeline
40
+
41
+ fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)
42
+ print(fill("c1ccccc1[MASK]"))
43
+ ```
44
+
45
+ ## Architecture
46
+ - Backbone: ModernBERT [1]
47
+ - Hidden size: 1024
48
+ - Intermediate size: 1536
49
+ - Encoder Layers: 22
50
+ - Attention heads: 16
51
+ - Max sequence length: 512 tokens
52
+ - Tokenizer: BPE tokenizer using [MolFormer's vocab](https://github.com/emapco/ModChemBERT/blob/main/modchembert/tokenizers/molformer/vocab.json) (2362 tokens); see the tokenization sketch after this list
53
+
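+ A minimal tokenization sketch (the example SMILES is illustrative and not from the model card):
+ 
+ ```python
+ from transformers import AutoTokenizer
+ 
+ tokenizer = AutoTokenizer.from_pretrained("Derify/ModChemBERT-IR-BASE")
+ enc = tokenizer("CC(=O)Oc1ccccc1C(=O)O")  # aspirin SMILES
+ # Inspect how the BPE vocabulary segments the string ([CLS] and [SEP] are added automatically)
+ print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
+ ```
+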
54
+ ## Dataset
55
+ - Pretraining: [PubChem 110M dataset (canonical SMILES strings)](https://ibm.ent.box.com/v/MoLFormer-data)
56
+
57
+ ## Pooling (Classifier / Regressor Head)
58
+ Kallergis et al. [2] demonstrated that the CLM embedding method prior to the prediction head was the strongest contributor to downstream performance among evaluated hyperparameters.
59
+
60
+ Behrendt et al. [3] noted that the last few layers contain task-specific information and that pooling methods leveraging information from multiple layers can enhance model performance. Their results further demonstrated that the `max_seq_mha` pooling method was particularly effective in low-data regimes.
61
+
62
+ This base model includes configurable pooling strategies for downstream fine-tuning. When fine-tuned for embedding, retrieval, or reranking tasks (e.g., with Sentence Transformers), various pooling methods can be explored (see the configuration sketch at the end of this section):
63
+ - `cls`: Last layer [CLS]
64
+ - `mean`: Mean over last hidden layer
65
+ - `max_cls`: Max over last k layers of [CLS]
66
+ - `cls_mha`: MHA with [CLS] as query
67
+ - `max_seq_mha`: MHA with max pooled sequence as KV and max pooled [CLS] as query
68
+ - `sum_mean`: Sum over all layers then mean tokens
69
+ - `sum_sum`: Sum over all layers then sum tokens
70
+ - `mean_mean`: Mean over all layers then mean tokens
71
+ - `mean_sum`: Mean over all layers then sum tokens
72
+ - `max_seq_mean`: Max over last k layers then mean tokens
73
+
74
+ Note: ModChemBERT's `max_seq_mha` differs from MaxPoolBERT [3]. MaxPoolBERT uses PyTorch `nn.MultiheadAttention`, whereas ModChemBERT's `ModChemBertPoolingAttention` adapts ModernBERT's `ModernBertAttention`.
75
+ On ChemBERTa-3 benchmarks this variant produced stronger validation metrics and avoided the training instabilities (sporadic zero / NaN losses and gradient norms) seen with `nn.MultiheadAttention`. Training instability with ModernBERT has been reported in the past ([discussion 1](https://huggingface.co/answerdotai/ModernBERT-base/discussions/59) and [discussion 2](https://huggingface.co/answerdotai/ModernBERT-base/discussions/63)).
76
+
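+ A minimal sketch of selecting a pooling strategy when loading the checkpoint for a downstream sequence-level task; the extra keyword arguments are the `ModChemBertConfig` fields described above, and the values shown are illustrative:
+ 
+ ```python
+ from transformers import AutoModelForSequenceClassification
+ 
+ # Unused kwargs are forwarded to the config, overriding the stored pooling settings.
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "Derify/ModChemBERT-IR-BASE",
+     trust_remote_code=True,
+     num_labels=2,
+     classifier_pooling="max_seq_mha",          # any option from the list above
+     classifier_pooling_last_k=5,
+     classifier_pooling_num_attention_heads=4,
+ )
+ ```
+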
77
+ ## Intended Use
78
+ * Primary: Base model for training embedding, retrieval, and reranking models for chemical information retrieval tasks using frameworks such as Sentence Transformers (see the sketch after this list).
79
+ * Appropriate for: Fine-tuning for semantic search of chemical compounds, molecular similarity tasks, chemical information retrieval systems, and as a foundation for building chemical embedding models.
80
+ * Not intended for: Direct molecular property prediction without fine-tuning, generating novel molecules, or production use without domain-specific validation.
81
+
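+ A minimal Sentence Transformers sketch (loading a plain `transformers` checkpoint builds a Transformer + mean-pooling module stack by default; contrastive fine-tuning on retrieval data would follow and is not shown):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Requires a recent sentence-transformers release that supports trust_remote_code.
+ st_model = SentenceTransformer("Derify/ModChemBERT-IR-BASE", trust_remote_code=True)
+ 
+ # Illustrative SMILES; embeddings from the untuned base model are a starting point, not final quality.
+ embeddings = st_model.encode(["CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1O"])
+ print(embeddings.shape)
+ ```
+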
82
+ ## Limitations
83
+ - This is a base model pretrained only on masked language modeling; it requires fine-tuning for specific information retrieval tasks.
84
+ - Performance on out-of-domain chemical spaces may vary: very long SMILES (>512 tokens), inorganic/organometallic compounds, polymers, or charged/enumerated tautomers may not be well represented in the training corpus.
85
+ - The model reflects the chemical space distribution of PubChem and may not generalize equally well to all chemical domains.
86
+
87
+ ## Ethical Considerations & Responsible Use
88
+ - This base model is intended for research and development purposes in chemical information retrieval.
89
+ - When fine-tuned for downstream applications, users should validate performance on their specific domain and use case.
90
+ - Do not deploy in clinical, regulatory, or safety-critical settings without rigorous domain-specific validation and appropriate oversight.
91
+
92
+ ## Hardware
93
+ Training was performed on two NVIDIA RTX 3090 GPUs using `accelerate` for distributed (DDP) training.
94
+
95
+ ## Citation
96
+ If you use ModChemBERT-IR-BASE in your research, please cite the checkpoint and the following:
97
+ ```
98
+ @software{cortes-2025-modchembert,
99
+ author = {Emmanuel Cortes},
100
+ title = {ModChemBERT: ModernBERT as a Chemical Language Model},
101
+ year = {2025},
102
+ publisher = {GitHub},
103
+ howpublished = {GitHub repository},
104
+ url = {https://github.com/emapco/ModChemBERT}
105
+ }
106
+ ```
107
+
108
+ ## References
109
+ 1. Warner, Benjamin, et al. "Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference." arXiv preprint arXiv:2412.13663 (2024).
110
+ 2. Kallergis, G., et al. "Domain adaptable language modeling of chemical compounds identifies potent pathoblockers for Pseudomonas aeruginosa." Communications Chemistry 8, 114 (2025). https://doi.org/10.1038/s42004-025-01484-4
111
+ 3. Behrendt, Maike, Stefan Sylvius Wagner, and Stefan Harmeling. "MaxPoolBERT: Enhancing BERT Classification via Layer-and Token-Wise Aggregation." arXiv preprint arXiv:2505.15696 (2025).
config.json ADDED
@@ -0,0 +1,53 @@
1
+ {
2
+ "architectures": [
3
+ "ModChemBertForMaskedLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.1,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_modchembert.ModChemBertConfig",
9
+ "AutoModel": "modeling_modchembert.ModChemBertModel",
10
+ "AutoModelForMaskedLM": "modeling_modchembert.ModChemBertForMaskedLM",
11
+ "AutoModelForSequenceClassification": "modeling_modchembert.ModChemBertForSequenceClassification"
12
+ },
13
+ "bos_token_id": 0,
14
+ "classifier_activation": "gelu",
15
+ "classifier_bias": false,
16
+ "classifier_dropout": 0.0,
17
+ "classifier_pooling": "max_seq_mha",
18
+ "classifier_pooling_attention_dropout": 0.1,
19
+ "classifier_pooling_last_k": 5,
20
+ "classifier_pooling_num_attention_heads": 4,
21
+ "cls_token_id": 0,
22
+ "decoder_bias": true,
23
+ "deterministic_flash_attn": false,
24
+ "dtype": "bfloat16",
25
+ "embedding_dropout": 0.1,
26
+ "eos_token_id": 1,
27
+ "global_attn_every_n_layers": 3,
28
+ "global_rope_theta": 160000.0,
29
+ "hidden_activation": "gelu",
30
+ "hidden_size": 1024,
31
+ "initializer_cutoff_factor": 2.0,
32
+ "initializer_range": 0.02,
33
+ "intermediate_size": 1536,
34
+ "layer_norm_eps": 1e-05,
35
+ "local_attention": 8,
36
+ "local_rope_theta": 10000.0,
37
+ "max_position_embeddings": 512,
38
+ "mlp_bias": false,
39
+ "mlp_dropout": 0.1,
40
+ "model_type": "modchembert",
41
+ "norm_bias": false,
42
+ "norm_eps": 1e-05,
43
+ "num_attention_heads": 16,
44
+ "num_hidden_layers": 22,
45
+ "pad_token_id": 2,
46
+ "position_embedding_type": "absolute",
47
+ "repad_logits_with_grad": false,
48
+ "sep_token_id": 1,
49
+ "sparse_pred_ignore_index": -100,
50
+ "sparse_prediction": false,
51
+ "transformers_version": "4.56.1",
52
+ "vocab_size": 2362
53
+ }
configuration_modchembert.py ADDED
@@ -0,0 +1,90 @@
1
+ # Copyright 2025 Emmanuel Cortes, All Rights Reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ from typing import Literal
16
+
17
+ from transformers.models.modernbert.configuration_modernbert import ModernBertConfig
18
+
19
+
20
+ class ModChemBertConfig(ModernBertConfig):
21
+ """
22
+ Configuration class for ModChemBert models.
23
+
24
+ This configuration class extends ModernBertConfig with additional parameters specific to
25
+ chemical molecule modeling and custom pooling strategies for classification/regression tasks.
26
+ It accepts all arguments and keyword arguments from ModernBertConfig.
27
+
28
+ Args:
29
+ classifier_pooling (str, optional): Pooling strategy for sequence classification.
30
+ Available options:
31
+ - "cls": Use CLS token representation
32
+ - "mean": Attention-weighted average pooling
33
+ - "sum_mean": Sum all hidden states across layers, then mean pool over sequence (ChemLM approach)
34
+ - "sum_sum": Sum all hidden states across layers, then sum pool over sequence
35
+ - "mean_mean": Mean all hidden states across layers, then mean pool over sequence
36
+ - "mean_sum": Mean all hidden states across layers, then sum pool over sequence
37
+ - "max_cls": Element-wise max pooling over last k hidden states, then take CLS token
38
+ - "cls_mha": Multi-head attention with CLS token as query and full sequence as keys/values
39
+ - "max_seq_mha": Max pooling over last k states + multi-head attention with CLS as query
40
+ - "max_seq_mean": Max pooling over last k hidden states, then mean pooling over sequence
41
+ Defaults to "max_seq_mha".
42
+ classifier_pooling_num_attention_heads (int, optional): Number of attention heads for multi-head attention
43
+ pooling strategies (cls_mha, max_seq_mha). Defaults to 4.
44
+ classifier_pooling_attention_dropout (float, optional): Dropout probability for multi-head attention
45
+ pooling strategies (cls_mha, max_seq_mha). Defaults to 0.0.
46
+ classifier_pooling_last_k (int, optional): Number of last hidden layers to use for max pooling
47
+ strategies (max_cls, max_seq_mha, max_seq_mean). Defaults to 8.
48
+ *args: Variable length argument list passed to ModernBertConfig.
49
+ **kwargs: Arbitrary keyword arguments passed to ModernBertConfig.
50
+
51
+ Note:
52
+ This class inherits all configuration parameters from ModernBertConfig including
53
+ hidden_size, num_hidden_layers, num_attention_heads, intermediate_size, etc.
54
+ """
55
+
56
+ model_type = "modchembert"
57
+
58
+ def __init__(
59
+ self,
60
+ *args,
61
+ classifier_pooling: Literal[
62
+ "cls",
63
+ "mean",
64
+ "sum_mean",
65
+ "sum_sum",
66
+ "mean_mean",
67
+ "mean_sum",
68
+ "max_cls",
69
+ "cls_mha",
70
+ "max_seq_mha",
71
+ "max_seq_mean",
72
+ ] = "max_seq_mha",
73
+ classifier_pooling_num_attention_heads: int = 4,
74
+ classifier_pooling_attention_dropout: float = 0.0,
75
+ classifier_pooling_last_k: int = 8,
76
+ **kwargs,
77
+ ):
78
+ # Pass classifier_pooling="cls" to circumvent ValueError in ModernBertConfig init
79
+ super().__init__(*args, classifier_pooling="cls", **kwargs)
80
+ # Override with custom value
81
+ self.classifier_pooling = classifier_pooling
82
+ self.classifier_pooling_num_attention_heads = classifier_pooling_num_attention_heads
83
+ self.classifier_pooling_attention_dropout = classifier_pooling_attention_dropout
84
+ self.classifier_pooling_last_k = classifier_pooling_last_k
85
+ self.auto_map = {
86
+ "AutoConfig": "configuration_modchembert.ModChemBertConfig",
87
+ "AutoModel": "modeling_modchembert.ModChemBertModel",
88
+ "AutoModelForMaskedLM": "modeling_modchembert.ModChemBertForMaskedLM",
89
+ "AutoModelForSequenceClassification": "modeling_modchembert.ModChemBertForSequenceClassification",
90
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d213c4cf107cb25234351be1c000c35e5353cd9faced047d57f2bb3d9680fc4
3
+ size 399215212
modeling_modchembert.py ADDED
@@ -0,0 +1,734 @@
1
+ # Copyright 2025 Emmanuel Cortes, All Rights Reserved.
2
+ #
3
+ # Copyright 2024 Answer.AI, LightOn, and contributors, and the HuggingFace Inc. team. All rights reserved.
4
+ #
5
+ #
6
+ # Licensed under the Apache License, Version 2.0 (the "License");
7
+ # you may not use this file except in compliance with the License.
8
+ # You may obtain a copy of the License at
9
+ #
10
+ # http://www.apache.org/licenses/LICENSE-2.0
11
+ #
12
+ # Unless required by applicable law or agreed to in writing, software
13
+ # distributed under the License is distributed on an "AS IS" BASIS,
14
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15
+ # See the License for the specific language governing permissions and
16
+ # limitations under the License.
17
+
18
+ # This file is adapted from the transformers library.
19
+ # Modifications include:
20
+ # - Additional classifier_pooling options for ModChemBertForSequenceClassification
21
+ # - sum_mean, sum_sum, mean_sum, mean_mean: from ChemLM (utilizes all hidden states)
22
+ # - max_cls, cls_mha, max_seq_mha: from MaxPoolBERT (utilizes last k hidden states)
23
+ # - max_seq_mean: a merge between sum_mean and max_cls (utilizes last k hidden states)
24
+ # - Addition of ModChemBertPoolingAttention for cls_mha and max_seq_mha pooling options
25
+
26
+ import copy
27
+ import math
28
+ import typing
29
+ from contextlib import nullcontext
30
+
31
+ import torch
32
+ import torch.nn as nn
33
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
34
+ from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask
35
+ from transformers.modeling_outputs import BaseModelOutput, MaskedLMOutput, SequenceClassifierOutput
36
+ from transformers.models.modernbert.modeling_modernbert import (
37
+ MODERNBERT_ATTENTION_FUNCTION,
38
+ ModernBertEmbeddings,
39
+ ModernBertEncoderLayer,
40
+ ModernBertModel,
41
+ ModernBertPredictionHead,
42
+ ModernBertPreTrainedModel,
43
+ ModernBertRotaryEmbedding,
44
+ _pad_modernbert_output,
45
+ _unpad_modernbert_input,
46
+ )
47
+ from transformers.utils import logging
48
+
49
+ from .configuration_modchembert import ModChemBertConfig
50
+
51
+ logger = logging.get_logger(__name__)
52
+
53
+
54
+ class InitWeightsMixin:
55
+ def _init_weights(self, module: nn.Module):
56
+ super()._init_weights(module) # type: ignore
57
+
58
+ cutoff_factor = self.config.initializer_cutoff_factor # type: ignore
59
+ if cutoff_factor is None:
60
+ cutoff_factor = 3
61
+
62
+ def init_weight(module: nn.Module, std: float):
63
+ if isinstance(module, nn.Linear):
64
+ nn.init.trunc_normal_(
65
+ module.weight,
66
+ mean=0.0,
67
+ std=std,
68
+ a=-cutoff_factor * std,
69
+ b=cutoff_factor * std,
70
+ )
71
+ if module.bias is not None:
72
+ nn.init.zeros_(module.bias)
73
+
74
+ stds = {
75
+ "in": self.config.initializer_range, # type: ignore
76
+ "out": self.config.initializer_range / math.sqrt(2.0 * self.config.num_hidden_layers), # type: ignore
77
+ "final_out": self.config.hidden_size**-0.5, # type: ignore
78
+ }
79
+
80
+ if isinstance(module, ModChemBertForMaskedLM):
81
+ init_weight(module.decoder, stds["out"])
82
+ elif isinstance(module, ModChemBertForSequenceClassification):
83
+ init_weight(module.classifier, stds["final_out"])
84
+ elif isinstance(module, ModChemBertPoolingAttention):
85
+ init_weight(module.Wq, stds["in"])
86
+ init_weight(module.Wk, stds["in"])
87
+ init_weight(module.Wv, stds["in"])
88
+ init_weight(module.Wo, stds["out"])
89
+
90
+
91
+ class ModChemBertPoolingAttention(nn.Module):
92
+ """Performs multi-headed self attention on a batch of sequences."""
93
+
94
+ def __init__(self, config: ModChemBertConfig):
95
+ super().__init__()
96
+ self.config = copy.deepcopy(config)
97
+ # Override num_attention_heads to use classifier_pooling_num_attention_heads
98
+ self.config.num_attention_heads = config.classifier_pooling_num_attention_heads
99
+ # Override attention_dropout to use classifier_pooling_attention_dropout
100
+ self.config.attention_dropout = config.classifier_pooling_attention_dropout
101
+
102
+ if config.hidden_size % config.num_attention_heads != 0:
103
+ raise ValueError(
104
+ f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention heads "
105
+ f"({config.num_attention_heads})"
106
+ )
107
+
108
+ self.attention_dropout = config.attention_dropout
109
+ self.num_heads = config.num_attention_heads
110
+ self.head_dim = config.hidden_size // config.num_attention_heads
111
+ self.all_head_size = self.head_dim * self.num_heads
112
+ self.Wq = nn.Linear(config.hidden_size, self.all_head_size, bias=config.attention_bias)
113
+ self.Wk = nn.Linear(config.hidden_size, self.all_head_size, bias=config.attention_bias)
114
+ self.Wv = nn.Linear(config.hidden_size, self.all_head_size, bias=config.attention_bias)
115
+
116
+ # Use global attention
117
+ self.local_attention = (-1, -1)
118
+ rope_theta = config.global_rope_theta
119
+ # sdpa path from original ModernBert implementation
120
+ config_copy = copy.deepcopy(config)
121
+ config_copy.rope_theta = rope_theta
122
+ self.rotary_emb = ModernBertRotaryEmbedding(config=config_copy)
123
+
124
+ self.Wo = nn.Linear(config.hidden_size, config.hidden_size, bias=config.attention_bias)
125
+ self.out_drop = nn.Dropout(config.attention_dropout) if config.attention_dropout > 0.0 else nn.Identity()
126
+ self.pruned_heads = set()
127
+
128
+ def forward(
129
+ self,
130
+ q: torch.Tensor,
131
+ kv: torch.Tensor,
132
+ attention_mask: torch.Tensor | None = None,
133
+ **kwargs,
134
+ ) -> torch.Tensor:
135
+ bs, seq_len = kv.shape[:2]
136
+ q_proj: torch.Tensor = self.Wq(q)
137
+ k_proj: torch.Tensor = self.Wk(kv)
138
+ v_proj: torch.Tensor = self.Wv(kv)
139
+ qkv = torch.stack(
140
+ (
141
+ q_proj.reshape(bs, seq_len, self.num_heads, self.head_dim),
142
+ k_proj.reshape(bs, seq_len, self.num_heads, self.head_dim),
143
+ v_proj.reshape(bs, seq_len, self.num_heads, self.head_dim),
144
+ ),
145
+ dim=2,
146
+ ) # (bs, seq_len, 3, num_heads, head_dim)
147
+
148
+ device = kv.device
149
+ if attention_mask is None:
150
+ attention_mask = torch.ones((bs, seq_len), device=device, dtype=torch.bool)
151
+ position_ids = torch.arange(seq_len, device=device).unsqueeze(0).long()
152
+
153
+ attn_outputs = MODERNBERT_ATTENTION_FUNCTION["sdpa"](
154
+ self,
155
+ qkv=qkv,
156
+ attention_mask=_prepare_4d_attention_mask(attention_mask, kv.dtype),
157
+ sliding_window_mask=None, # not needed when using global attention
158
+ position_ids=position_ids,
159
+ local_attention=self.local_attention,
160
+ bs=bs,
161
+ dim=self.all_head_size,
162
+ **kwargs,
163
+ )
164
+ hidden_states = attn_outputs[0]
165
+ hidden_states = self.out_drop(self.Wo(hidden_states))
166
+
167
+ return hidden_states
168
+
169
+
170
+ class ModChemBertModel(ModernBertPreTrainedModel):
171
+ config_class = ModChemBertConfig
172
+
173
+ def __init__(self, config: ModChemBertConfig):
174
+ super().__init__(config)
175
+ self.config = config
176
+ self.embeddings = ModernBertEmbeddings(config)
177
+ self.layers = nn.ModuleList(
178
+ [ModernBertEncoderLayer(config, layer_id) for layer_id in range(config.num_hidden_layers)]
179
+ )
180
+ self.final_norm = nn.LayerNorm(config.hidden_size, eps=config.norm_eps, bias=config.norm_bias)
181
+ self.gradient_checkpointing = False
182
+ self.post_init()
183
+
184
+ def get_input_embeddings(self):
185
+ return self.embeddings.tok_embeddings
186
+
187
+ def set_input_embeddings(self, value):
188
+ self.embeddings.tok_embeddings = value # type: ignore
189
+
190
+ def forward(
191
+ self,
192
+ input_ids: torch.LongTensor | None = None,
193
+ attention_mask: torch.Tensor | None = None,
194
+ sliding_window_mask: torch.Tensor | None = None,
195
+ position_ids: torch.LongTensor | None = None,
196
+ inputs_embeds: torch.Tensor | None = None,
197
+ indices: torch.Tensor | None = None,
198
+ cu_seqlens: torch.Tensor | None = None,
199
+ max_seqlen: int | None = None,
200
+ batch_size: int | None = None,
201
+ seq_len: int | None = None,
202
+ output_attentions: bool | None = None,
203
+ output_hidden_states: bool | None = None,
204
+ return_dict: bool | None = None,
205
+ ) -> tuple[torch.Tensor, ...] | BaseModelOutput:
206
+ r"""
207
+ sliding_window_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
208
+ Mask to avoid performing attention on padding or far-away tokens. In ModernBert, only every few layers
209
+ perform global attention, while the rest perform local attention. This mask is used to avoid attending to
210
+ far-away tokens in the local attention layers when not using Flash Attention.
211
+ indices (`torch.Tensor` of shape `(total_unpadded_tokens,)`, *optional*):
212
+ Indices of the non-padding tokens in the input sequence. Used for unpadding the output.
213
+ cu_seqlens (`torch.Tensor` of shape `(batch + 1,)`, *optional*):
214
+ Cumulative sequence lengths of the input sequences. Used to index the unpadded tensors.
215
+ max_seqlen (`int`, *optional*):
216
+ Maximum sequence length in the batch excluding padding tokens. Used to unpad input_ids and pad output tensors.
217
+ batch_size (`int`, *optional*):
218
+ Batch size of the input sequences. Used to pad the output tensors.
219
+ seq_len (`int`, *optional*):
220
+ Sequence length of the input sequences including padding tokens. Used to pad the output tensors.
221
+ """ # noqa: E501
222
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
223
+ output_hidden_states = (
224
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
225
+ )
226
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
227
+
228
+ if (input_ids is None) ^ (inputs_embeds is not None):
229
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
230
+
231
+ all_hidden_states = () if output_hidden_states else None
232
+ all_self_attentions = () if output_attentions else None
233
+
234
+ self._maybe_set_compile()
235
+
236
+ if input_ids is not None:
237
+ self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
238
+
239
+ if batch_size is None and seq_len is None:
240
+ if inputs_embeds is not None:
241
+ batch_size, seq_len = inputs_embeds.shape[:2]
242
+ else:
243
+ batch_size, seq_len = input_ids.shape[:2] # type: ignore
244
+ device = input_ids.device if input_ids is not None else inputs_embeds.device # type: ignore
245
+
246
+ if attention_mask is None:
247
+ attention_mask = torch.ones((batch_size, seq_len), device=device, dtype=torch.bool) # type: ignore
248
+
249
+ repad = False
250
+ if self.config._attn_implementation == "flash_attention_2":
251
+ if indices is None and cu_seqlens is None and max_seqlen is None:
252
+ repad = True
253
+ if inputs_embeds is None:
254
+ with torch.no_grad():
255
+ input_ids, indices, cu_seqlens, max_seqlen, *_ = _unpad_modernbert_input(
256
+ inputs=input_ids, # type: ignore
257
+ attention_mask=attention_mask, # type: ignore
258
+ )
259
+ else:
260
+ inputs_embeds, indices, cu_seqlens, max_seqlen, *_ = _unpad_modernbert_input(
261
+ inputs=inputs_embeds,
262
+ attention_mask=attention_mask, # type: ignore
263
+ )
264
+ else:
265
+ if position_ids is None:
266
+ position_ids = torch.arange(seq_len, device=device).unsqueeze(0) # type: ignore
267
+
268
+ attention_mask, sliding_window_mask = self._update_attention_mask(
269
+ attention_mask, # type: ignore
270
+ output_attentions=output_attentions, # type: ignore
271
+ )
272
+
273
+ hidden_states = self.embeddings(input_ids=input_ids, inputs_embeds=inputs_embeds)
274
+
275
+ for encoder_layer in self.layers:
276
+ if output_hidden_states:
277
+ all_hidden_states = all_hidden_states + (hidden_states,) # type: ignore
278
+
279
+ layer_outputs = encoder_layer(
280
+ hidden_states,
281
+ attention_mask=attention_mask,
282
+ sliding_window_mask=sliding_window_mask,
283
+ position_ids=position_ids,
284
+ cu_seqlens=cu_seqlens,
285
+ max_seqlen=max_seqlen,
286
+ output_attentions=output_attentions,
287
+ )
288
+ hidden_states = layer_outputs[0]
289
+ if output_attentions and len(layer_outputs) > 1:
290
+ all_self_attentions = all_self_attentions + (layer_outputs[1],) # type: ignore
291
+
292
+ if output_hidden_states:
293
+ all_hidden_states = all_hidden_states + (hidden_states,) # type: ignore
294
+
295
+ hidden_states = self.final_norm(hidden_states)
296
+
297
+ if repad:
298
+ hidden_states = _pad_modernbert_output(
299
+ inputs=hidden_states,
300
+ indices=indices, # type: ignore
301
+ batch=batch_size, # type: ignore
302
+ seqlen=seq_len, # type: ignore
303
+ )
304
+ if all_hidden_states is not None:
305
+ all_hidden_states = tuple(
306
+ _pad_modernbert_output(inputs=hs, indices=indices, batch=batch_size, seqlen=seq_len) # type: ignore
307
+ for hs in all_hidden_states
308
+ )
309
+
310
+ if not return_dict:
311
+ return tuple(v for v in [hidden_states, all_hidden_states, all_self_attentions] if v is not None)
312
+ return BaseModelOutput(
313
+ last_hidden_state=hidden_states, # type: ignore
314
+ hidden_states=all_hidden_states, # type: ignore
315
+ attentions=all_self_attentions,
316
+ )
317
+
318
+ def _update_attention_mask(self, attention_mask: torch.Tensor, output_attentions: bool) -> tuple[torch.Tensor, torch.Tensor]:
319
+ if output_attentions:
320
+ if self.config._attn_implementation == "sdpa":
321
+ logger.warning_once( # type: ignore
322
+ "Outputting attentions is only supported with the 'eager' attention implementation, "
323
+ 'not with "sdpa". Falling back to `attn_implementation="eager"`.'
324
+ )
325
+ self.config._attn_implementation = "eager"
326
+ elif self.config._attn_implementation != "eager":
327
+ logger.warning_once( # type: ignore
328
+ "Outputting attentions is only supported with the eager attention implementation, "
329
+ f'not with {self.config._attn_implementation}. Consider setting `attn_implementation="eager"`.'
330
+ " Setting `output_attentions=False`."
331
+ )
332
+
333
+ global_attention_mask = _prepare_4d_attention_mask(attention_mask, self.dtype)
334
+
335
+ # Create position indices
336
+ rows = torch.arange(global_attention_mask.shape[2]).unsqueeze(0)
337
+ # Calculate distance between positions
338
+ distance = torch.abs(rows - rows.T)
339
+
340
+ # Create sliding window mask (1 for positions within window, 0 outside)
341
+ window_mask = (distance <= self.config.local_attention // 2).unsqueeze(0).unsqueeze(0).to(attention_mask.device)
342
+ # Combine with existing mask
343
+ sliding_window_mask = global_attention_mask.masked_fill(window_mask.logical_not(), torch.finfo(self.dtype).min)
344
+
345
+ return global_attention_mask, sliding_window_mask # type: ignore
346
+
347
+
348
+ class ModChemBertForMaskedLM(InitWeightsMixin, ModernBertPreTrainedModel):
349
+ config_class = ModChemBertConfig
350
+ _tied_weights_keys = ["decoder.weight"]
351
+
352
+ def __init__(self, config: ModChemBertConfig):
353
+ super().__init__(config)
354
+ self.config = config
355
+ self.model = ModChemBertModel(config)
356
+ self.head = ModernBertPredictionHead(config)
357
+ self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=config.decoder_bias)
358
+
359
+ self.sparse_prediction = self.config.sparse_prediction
360
+ self.sparse_pred_ignore_index = self.config.sparse_pred_ignore_index
361
+
362
+ # Initialize weights and apply final processing
363
+ self.post_init()
364
+
365
+ def get_output_embeddings(self):
366
+ return self.decoder
367
+
368
+ def set_output_embeddings(self, new_embeddings: nn.Linear):
369
+ self.decoder = new_embeddings
370
+
371
+ @torch.compile(dynamic=True)
372
+ def compiled_head(self, output: torch.Tensor) -> torch.Tensor:
373
+ return self.decoder(self.head(output))
374
+
375
+ def forward(
376
+ self,
377
+ input_ids: torch.LongTensor | None = None,
378
+ attention_mask: torch.Tensor | None = None,
379
+ sliding_window_mask: torch.Tensor | None = None,
380
+ position_ids: torch.Tensor | None = None,
381
+ inputs_embeds: torch.Tensor | None = None,
382
+ labels: torch.Tensor | None = None,
383
+ indices: torch.Tensor | None = None,
384
+ cu_seqlens: torch.Tensor | None = None,
385
+ max_seqlen: int | None = None,
386
+ batch_size: int | None = None,
387
+ seq_len: int | None = None,
388
+ output_attentions: bool | None = None,
389
+ output_hidden_states: bool | None = None,
390
+ return_dict: bool | None = None,
391
+ **kwargs,
392
+ ) -> tuple[torch.Tensor] | tuple[torch.Tensor, typing.Any] | MaskedLMOutput:
393
+ r"""
394
+ sliding_window_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
395
+ Mask to avoid performing attention on padding or far-away tokens. In ModernBert, only every few layers
396
+ perform global attention, while the rest perform local attention. This mask is used to avoid attending to
397
+ far-away tokens in the local attention layers when not using Flash Attention.
398
+ indices (`torch.Tensor` of shape `(total_unpadded_tokens,)`, *optional*):
399
+ Indices of the non-padding tokens in the input sequence. Used for unpadding the output.
400
+ cu_seqlens (`torch.Tensor` of shape `(batch + 1,)`, *optional*):
401
+ Cumulative sequence lengths of the input sequences. Used to index the unpadded tensors.
402
+ max_seqlen (`int`, *optional*):
403
+ Maximum sequence length in the batch excluding padding tokens. Used to unpad input_ids & pad output tensors.
404
+ batch_size (`int`, *optional*):
405
+ Batch size of the input sequences. Used to pad the output tensors.
406
+ seq_len (`int`, *optional*):
407
+ Sequence length of the input sequences including padding tokens. Used to pad the output tensors.
408
+ """
409
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
410
+ self._maybe_set_compile()
411
+
412
+ if self.config._attn_implementation == "flash_attention_2": # noqa: SIM102
413
+ if indices is None and cu_seqlens is None and max_seqlen is None:
414
+ if batch_size is None and seq_len is None:
415
+ if inputs_embeds is not None:
416
+ batch_size, seq_len = inputs_embeds.shape[:2]
417
+ else:
418
+ batch_size, seq_len = input_ids.shape[:2] # type: ignore
419
+ device = input_ids.device if input_ids is not None else inputs_embeds.device # type: ignore
420
+
421
+ if attention_mask is None:
422
+ attention_mask = torch.ones((batch_size, seq_len), device=device, dtype=torch.bool) # type: ignore
423
+
424
+ if inputs_embeds is None:
425
+ with torch.no_grad():
426
+ input_ids, indices, cu_seqlens, max_seqlen, position_ids, labels = _unpad_modernbert_input(
427
+ inputs=input_ids, # type: ignore
428
+ attention_mask=attention_mask, # type: ignore
429
+ position_ids=position_ids,
430
+ labels=labels,
431
+ )
432
+ else:
433
+ inputs_embeds, indices, cu_seqlens, max_seqlen, position_ids, labels = _unpad_modernbert_input(
434
+ inputs=inputs_embeds,
435
+ attention_mask=attention_mask, # type: ignore
436
+ position_ids=position_ids,
437
+ labels=labels,
438
+ )
439
+
440
+ outputs = self.model(
441
+ input_ids=input_ids,
442
+ attention_mask=attention_mask,
443
+ sliding_window_mask=sliding_window_mask,
444
+ position_ids=position_ids,
445
+ inputs_embeds=inputs_embeds,
446
+ indices=indices,
447
+ cu_seqlens=cu_seqlens,
448
+ max_seqlen=max_seqlen,
449
+ batch_size=batch_size,
450
+ seq_len=seq_len,
451
+ output_attentions=output_attentions,
452
+ output_hidden_states=output_hidden_states,
453
+ return_dict=return_dict,
454
+ )
455
+ last_hidden_state = outputs[0]
456
+
457
+ if self.sparse_prediction and labels is not None:
458
+ # flatten labels and output first
459
+ labels = labels.view(-1)
460
+ last_hidden_state = last_hidden_state.view(labels.shape[0], -1)
461
+
462
+ # then filter out the non-masked tokens
463
+ mask_tokens = labels != self.sparse_pred_ignore_index
464
+ last_hidden_state = last_hidden_state[mask_tokens]
465
+ labels = labels[mask_tokens]
466
+
467
+ logits = (
468
+ self.compiled_head(last_hidden_state)
469
+ if self.config.reference_compile
470
+ else self.decoder(self.head(last_hidden_state))
471
+ )
472
+
473
+ loss = None
474
+ if labels is not None:
475
+ loss = self.loss_function(logits, labels, vocab_size=self.config.vocab_size, **kwargs)
476
+
477
+ if self.config._attn_implementation == "flash_attention_2":
478
+ with nullcontext() if self.config.repad_logits_with_grad or labels is None else torch.no_grad():
479
+ logits = _pad_modernbert_output(inputs=logits, indices=indices, batch=batch_size, seqlen=seq_len) # type: ignore
480
+
481
+ if not return_dict:
482
+ output = (logits,)
483
+ return ((loss,) + output) if loss is not None else output
484
+
485
+ return MaskedLMOutput(
486
+ loss=loss,
487
+ logits=typing.cast(torch.FloatTensor, logits),
488
+ hidden_states=outputs.hidden_states,
489
+ attentions=outputs.attentions,
490
+ )
491
+
492
+
493
+ class ModChemBertForSequenceClassification(InitWeightsMixin, ModernBertPreTrainedModel):
494
+ config_class = ModChemBertConfig
495
+
496
+ def __init__(self, config: ModChemBertConfig):
497
+ super().__init__(config)
498
+ self.num_labels = config.num_labels
499
+ self.config = config
500
+
501
+ self.model = ModernBertModel(config)
502
+ if self.config.classifier_pooling in {"cls_mha", "max_seq_mha"}:
503
+ self.pooling_attn = ModChemBertPoolingAttention(config=self.config)
504
+ else:
505
+ self.pooling_attn = None
506
+ self.head = ModernBertPredictionHead(config)
507
+ self.drop = torch.nn.Dropout(config.classifier_dropout)
508
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
509
+
510
+ # Initialize weights and apply final processing
511
+ self.post_init()
512
+
513
+ def forward(
514
+ self,
515
+ input_ids: torch.LongTensor | None = None,
516
+ attention_mask: torch.Tensor | None = None,
517
+ sliding_window_mask: torch.Tensor | None = None,
518
+ position_ids: torch.Tensor | None = None,
519
+ inputs_embeds: torch.Tensor | None = None,
520
+ labels: torch.Tensor | None = None,
521
+ indices: torch.Tensor | None = None,
522
+ cu_seqlens: torch.Tensor | None = None,
523
+ max_seqlen: int | None = None,
524
+ batch_size: int | None = None,
525
+ seq_len: int | None = None,
526
+ output_attentions: bool | None = None,
527
+ output_hidden_states: bool | None = None,
528
+ return_dict: bool | None = None,
529
+ **kwargs,
530
+ ) -> tuple[torch.Tensor] | tuple[torch.Tensor, typing.Any] | SequenceClassifierOutput:
531
+ r"""
532
+ sliding_window_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
533
+ Mask to avoid performing attention on padding or far-away tokens. In ModernBert, only every few layers
534
+ perform global attention, while the rest perform local attention. This mask is used to avoid attending to
535
+ far-away tokens in the local attention layers when not using Flash Attention.
536
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
537
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
538
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
539
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
540
+ indices (`torch.Tensor` of shape `(total_unpadded_tokens,)`, *optional*):
541
+ Indices of the non-padding tokens in the input sequence. Used for unpadding the output.
542
+ cu_seqlens (`torch.Tensor` of shape `(batch + 1,)`, *optional*):
543
+ Cumulative sequence lengths of the input sequences. Used to index the unpadded tensors.
544
+ max_seqlen (`int`, *optional*):
545
+ Maximum sequence length in the batch excluding padding tokens. Used to unpad input_ids & pad output tensors.
546
+ batch_size (`int`, *optional*):
547
+ Batch size of the input sequences. Used to pad the output tensors.
548
+ seq_len (`int`, *optional*):
549
+ Sequence length of the input sequences including padding tokens. Used to pad the output tensors.
550
+ """
551
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
552
+ self._maybe_set_compile()
553
+
554
+ if input_ids is not None:
555
+ self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
556
+
557
+ if batch_size is None and seq_len is None:
558
+ if inputs_embeds is not None:
559
+ batch_size, seq_len = inputs_embeds.shape[:2]
560
+ else:
561
+ batch_size, seq_len = input_ids.shape[:2] # type: ignore
562
+ device = input_ids.device if input_ids is not None else inputs_embeds.device # type: ignore
563
+
564
+ if attention_mask is None:
565
+ attention_mask = torch.ones((batch_size, seq_len), device=device, dtype=torch.bool) # type: ignore
566
+
567
+ # Ensure output_hidden_states is True in case pooling mode requires all hidden states
568
+ output_hidden_states = True
569
+
570
+ outputs = self.model(
571
+ input_ids=input_ids,
572
+ attention_mask=attention_mask,
573
+ sliding_window_mask=sliding_window_mask,
574
+ position_ids=position_ids,
575
+ inputs_embeds=inputs_embeds,
576
+ indices=indices,
577
+ cu_seqlens=cu_seqlens,
578
+ max_seqlen=max_seqlen,
579
+ batch_size=batch_size,
580
+ seq_len=seq_len,
581
+ output_attentions=output_attentions,
582
+ output_hidden_states=output_hidden_states,
583
+ return_dict=return_dict,
584
+ )
585
+ last_hidden_state = outputs[0]
586
+ hidden_states = outputs[1]
587
+
588
+ last_hidden_state = _pool_modchembert_output(
589
+ self,
590
+ last_hidden_state,
591
+ hidden_states,
592
+ typing.cast(torch.Tensor, attention_mask),
593
+ )
594
+ pooled_output = self.head(last_hidden_state)
595
+ pooled_output = self.drop(pooled_output)
596
+ logits = self.classifier(pooled_output)
597
+
598
+ loss = None
599
+ if labels is not None:
600
+ if self.config.problem_type is None:
601
+ if self.num_labels == 1:
602
+ self.config.problem_type = "regression"
603
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
604
+ self.config.problem_type = "single_label_classification"
605
+ else:
606
+ self.config.problem_type = "multi_label_classification"
607
+
608
+ if self.config.problem_type == "regression":
609
+ loss_fct = MSELoss()
610
+ if self.num_labels == 1:
611
+ loss = loss_fct(logits.squeeze(), labels.squeeze())
612
+ else:
613
+ loss = loss_fct(logits, labels)
614
+ elif self.config.problem_type == "single_label_classification":
615
+ loss_fct = CrossEntropyLoss()
616
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
617
+ elif self.config.problem_type == "multi_label_classification":
618
+ loss_fct = BCEWithLogitsLoss()
619
+ loss = loss_fct(logits, labels)
620
+
621
+ if not return_dict:
622
+ output = (logits,)
623
+ return ((loss,) + output) if loss is not None else output
624
+
625
+ return SequenceClassifierOutput(
626
+ loss=loss,
627
+ logits=logits,
628
+ hidden_states=outputs.hidden_states,
629
+ attentions=outputs.attentions,
630
+ )
631
+
632
+
633
+ def _pool_modchembert_output(
634
+ module: ModChemBertForSequenceClassification,
635
+ last_hidden_state: torch.Tensor,
636
+ hidden_states: list[torch.Tensor],
637
+ attention_mask: torch.Tensor,
638
+ ):
639
+ """
640
+ Apply pooling strategy to hidden states for sequence-level classification/regression tasks.
641
+
642
+ This function implements various pooling strategies to aggregate sequence representations
643
+ into a single vector for downstream classification or regression tasks. The pooling method
644
+ is determined by the `classifier_pooling` configuration parameter.
645
+
646
+ Available pooling strategies:
647
+ - cls: Use the CLS token ([CLS]) representation from the last hidden state
648
+ - mean: Average pooling over all tokens in the sequence (attention-weighted)
649
+ - max_cls: Element-wise max pooling over the last k hidden states, then take CLS token
650
+ - cls_mha: Multi-head attention with CLS token as query and full sequence as keys/values
651
+ - max_seq_mha: Max pooling over last k states + multi-head attention with CLS as query
652
+ - max_seq_mean: Max pooling over last k hidden states, then mean pooling over sequence
653
+ - sum_mean: Sum all hidden states across layers, then mean pool over sequence
654
+ - sum_sum: Sum all hidden states across layers, then sum pool over sequence
655
+ - mean_sum: Mean all hidden states across layers, then sum pool over sequence
656
+ - mean_mean: Mean all hidden states across layers, then mean pool over sequence
657
+
658
+ Args:
659
+ module: The model instance containing configuration and pooling attention if needed
660
+ last_hidden_state: Final layer hidden states of shape (batch_size, seq_len, hidden_size)
661
+ hidden_states: List of hidden states from all layers, each of shape (batch_size, seq_len, hidden_size)
662
+ attention_mask: Attention mask of shape (batch_size, seq_len) indicating valid tokens
663
+
664
+ Returns:
665
+ torch.Tensor: Pooled representation of shape (batch_size, hidden_size)
666
+
667
+ Note:
668
+ Some pooling strategies (cls_mha, max_seq_mha) require the module to have a pooling_attn
669
+ attribute containing a ModChemBertPoolingAttention instance.
670
+ """
671
+ config = typing.cast(ModChemBertConfig, module.config)
672
+ if config.classifier_pooling == "cls":
673
+ last_hidden_state = last_hidden_state[:, 0]
674
+ elif config.classifier_pooling == "mean":
675
+ last_hidden_state = (last_hidden_state * attention_mask.unsqueeze(-1)).sum(dim=1) / attention_mask.sum(
676
+ dim=1, keepdim=True
677
+ )
678
+ elif config.classifier_pooling == "max_cls":
679
+ k_hidden_states = hidden_states[-config.classifier_pooling_last_k :]
680
+ theta = torch.stack(k_hidden_states, dim=1) # (batch, k, seq_len, hidden)
681
+ pooled_seq = torch.max(theta, dim=1).values # Element-wise max over k -> (batch, seq_len, hidden)
682
+ last_hidden_state = pooled_seq[:, 0, :] # (batch, hidden)
683
+ elif config.classifier_pooling == "cls_mha":
684
+ # Similar to max_seq_mha but without the max pooling step
685
+ # Query is CLS token (position 0); Keys/Values are full sequence
686
+ q = last_hidden_state[:, 0, :].unsqueeze(1) # (batch, 1, hidden)
687
+ q = q.expand(-1, last_hidden_state.shape[1], -1) # (batch, seq_len, hidden)
688
+ attn_out: torch.Tensor = module.pooling_attn( # type: ignore
689
+ q=q, kv=last_hidden_state, attention_mask=attention_mask
690
+ ) # (batch, seq_len, hidden)
691
+ last_hidden_state = torch.mean(attn_out, dim=1)
692
+ elif config.classifier_pooling == "max_seq_mha":
693
+ k_hidden_states = hidden_states[-config.classifier_pooling_last_k :]
694
+ theta = torch.stack(k_hidden_states, dim=1) # (batch, k, seq_len, hidden)
695
+ pooled_seq = torch.max(theta, dim=1).values # Element-wise max over k -> (batch, seq_len, hidden)
696
+ # Query is pooled CLS token (position 0); Keys/Values are pooled sequence
697
+ q = pooled_seq[:, 0, :].unsqueeze(1) # (batch, 1, hidden)
698
+ q = q.expand(-1, pooled_seq.shape[1], -1) # (batch, seq_len, hidden)
699
+ attn_out: torch.Tensor = module.pooling_attn( # type: ignore
700
+ q=q, kv=pooled_seq, attention_mask=attention_mask
701
+ ) # (batch, seq_len, hidden)
702
+ last_hidden_state = torch.mean(attn_out, dim=1)
703
+ elif config.classifier_pooling == "max_seq_mean":
704
+ k_hidden_states = hidden_states[-config.classifier_pooling_last_k :]
705
+ theta = torch.stack(k_hidden_states, dim=1) # (batch, k, seq_len, hidden)
706
+ pooled_seq = torch.max(theta, dim=1).values # Element-wise max over k -> (batch, seq_len, hidden)
707
+ last_hidden_state = torch.mean(pooled_seq, dim=1) # Mean over sequence length
708
+ elif config.classifier_pooling == "sum_mean":
709
+ # ChemLM uses the mean of all hidden states
710
+ # which outperforms using just the last layer mean or the cls embedding
711
+ # https://doi.org/10.1038/s42004-025-01484-4
712
+ # https://static-content.springer.com/esm/art%3A10.1038%2Fs42004-025-01484-4/MediaObjects/42004_2025_1484_MOESM2_ESM.pdf
713
+ all_hidden_states = torch.stack(hidden_states)
714
+ w = torch.sum(all_hidden_states, dim=0)
715
+ last_hidden_state = torch.mean(w, dim=1)
716
+ elif config.classifier_pooling == "sum_sum":
717
+ all_hidden_states = torch.stack(hidden_states)
718
+ w = torch.sum(all_hidden_states, dim=0)
719
+ last_hidden_state = torch.sum(w, dim=1)
720
+ elif config.classifier_pooling == "mean_sum":
721
+ all_hidden_states = torch.stack(hidden_states)
722
+ w = torch.mean(all_hidden_states, dim=0)
723
+ last_hidden_state = torch.sum(w, dim=1)
724
+ elif config.classifier_pooling == "mean_mean":
725
+ all_hidden_states = torch.stack(hidden_states)
726
+ w = torch.mean(all_hidden_states, dim=0)
727
+ last_hidden_state = torch.mean(w, dim=1)
728
+ return last_hidden_state
729
+
730
+
731
+ __all__ = [
732
+ "ModChemBertForMaskedLM",
733
+ "ModChemBertForSequenceClassification",
734
+ ]
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": true,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
@@ -0,0 +1,2542 @@
1
+ {
2
+ "version": "1.0",
3
+ "truncation": null,
4
+ "padding": null,
5
+ "added_tokens": [
6
+ {
7
+ "id": 0,
8
+ "content": "[CLS]",
9
+ "single_word": false,
10
+ "lstrip": false,
11
+ "rstrip": false,
12
+ "normalized": false,
13
+ "special": true
14
+ },
15
+ {
16
+ "id": 1,
17
+ "content": "[SEP]",
18
+ "single_word": false,
19
+ "lstrip": false,
20
+ "rstrip": false,
21
+ "normalized": false,
22
+ "special": true
23
+ },
24
+ {
25
+ "id": 2,
26
+ "content": "[PAD]",
27
+ "single_word": false,
28
+ "lstrip": false,
29
+ "rstrip": false,
30
+ "normalized": false,
31
+ "special": true
32
+ },
33
+ {
34
+ "id": 3,
35
+ "content": "[MASK]",
36
+ "single_word": false,
37
+ "lstrip": true,
38
+ "rstrip": false,
39
+ "normalized": false,
40
+ "special": true
41
+ },
42
+ {
43
+ "id": 2361,
44
+ "content": "[UNK]",
45
+ "single_word": false,
46
+ "lstrip": false,
47
+ "rstrip": false,
48
+ "normalized": false,
49
+ "special": true
50
+ }
51
+ ],
52
+ "normalizer": null,
53
+ "pre_tokenizer": {
54
+ "type": "ByteLevel",
55
+ "add_prefix_space": false,
56
+ "trim_offsets": true,
57
+ "use_regex": true
58
+ },
59
+ "post_processor": {
60
+ "type": "TemplateProcessing",
61
+ "single": [
62
+ {
63
+ "SpecialToken": {
64
+ "id": "[CLS]",
65
+ "type_id": 0
66
+ }
67
+ },
68
+ {
69
+ "Sequence": {
70
+ "id": "A",
71
+ "type_id": 0
72
+ }
73
+ },
74
+ {
75
+ "SpecialToken": {
76
+ "id": "[SEP]",
77
+ "type_id": 0
78
+ }
79
+ }
80
+ ],
81
+ "pair": [
82
+ {
83
+ "SpecialToken": {
84
+ "id": "[CLS]",
85
+ "type_id": 0
86
+ }
87
+ },
88
+ {
89
+ "Sequence": {
90
+ "id": "A",
91
+ "type_id": 0
92
+ }
93
+ },
94
+ {
95
+ "SpecialToken": {
96
+ "id": "[SEP]",
97
+ "type_id": 0
98
+ }
99
+ },
100
+ {
101
+ "Sequence": {
102
+ "id": "B",
103
+ "type_id": 0
104
+ }
105
+ },
106
+ {
107
+ "SpecialToken": {
108
+ "id": "[SEP]",
109
+ "type_id": 0
110
+ }
111
+ }
112
+ ],
113
+ "special_tokens": {
114
+ "[CLS]": {
115
+ "id": "[CLS]",
116
+ "ids": [
117
+ 0
118
+ ],
119
+ "tokens": [
120
+ "[CLS]"
121
+ ]
122
+ },
123
+ "[MASK]": {
124
+ "id": "[MASK]",
125
+ "ids": [
126
+ 3
127
+ ],
128
+ "tokens": [
129
+ "[MASK]"
130
+ ]
131
+ },
132
+ "[PAD]": {
133
+ "id": "[PAD]",
134
+ "ids": [
135
+ 2
136
+ ],
137
+ "tokens": [
138
+ "[PAD]"
139
+ ]
140
+ },
141
+ "[SEP]": {
142
+ "id": "[SEP]",
143
+ "ids": [
144
+ 1
145
+ ],
146
+ "tokens": [
147
+ "[SEP]"
148
+ ]
149
+ },
150
+ "[UNK]": {
151
+ "id": "[UNK]",
152
+ "ids": [
153
+ 2361
154
+ ],
155
+ "tokens": [
156
+ "[UNK]"
157
+ ]
158
+ }
159
+ }
160
+ },
161
+ "decoder": {
162
+ "type": "ByteLevel",
163
+ "add_prefix_space": false,
164
+ "trim_offsets": true,
165
+ "use_regex": true
166
+ },
167
+ "model": {
168
+ "type": "BPE",
169
+ "dropout": null,
170
+ "unk_token": "[UNK]",
171
+ "continuing_subword_prefix": null,
172
+ "end_of_word_suffix": null,
173
+ "fuse_unk": false,
174
+ "byte_fallback": false,
175
+ "ignore_merges": false,
176
+ "vocab": {
177
+ "[CLS]": 0,
178
+ "[SEP]": 1,
179
+ "[PAD]": 2,
180
+ "[MASK]": 3,
181
+ "C": 4,
182
+ "c": 5,
183
+ "(": 6,
184
+ ")": 7,
185
+ "1": 8,
186
+ "O": 9,
187
+ "N": 10,
188
+ "2": 11,
189
+ "=": 12,
190
+ "n": 13,
191
+ "3": 14,
192
+ "[C@H]": 15,
193
+ "[C@@H]": 16,
194
+ "F": 17,
195
+ "S": 18,
196
+ "4": 19,
197
+ "Cl": 20,
198
+ "-": 21,
199
+ "o": 22,
200
+ "s": 23,
201
+ "[nH]": 24,
202
+ "#": 25,
203
+ "/": 26,
204
+ "Br": 27,
205
+ "[C@]": 28,
206
+ "[C@@]": 29,
207
+ "[N+]": 30,
208
+ "[O-]": 31,
209
+ "5": 32,
210
+ "\\": 33,
211
+ ".": 34,
212
+ "I": 35,
213
+ "6": 36,
214
+ "[S@]": 37,
215
+ "[S@@]": 38,
216
+ "P": 39,
217
+ "[N-]": 40,
218
+ "[Si]": 41,
219
+ "7": 42,
220
+ "[n+]": 43,
221
+ "[2H]": 44,
222
+ "8": 45,
223
+ "[NH+]": 46,
224
+ "B": 47,
225
+ "9": 48,
226
+ "[C-]": 49,
227
+ "[Na+]": 50,
228
+ "[Cl-]": 51,
229
+ "[c-]": 52,
230
+ "[CH]": 53,
231
+ "%10": 54,
232
+ "[NH2+]": 55,
233
+ "[P+]": 56,
234
+ "[B]": 57,
235
+ "[I-]": 58,
236
+ "%11": 59,
237
+ "[CH2-]": 60,
238
+ "[O+]": 61,
239
+ "[NH3+]": 62,
240
+ "[C]": 63,
241
+ "[Br-]": 64,
242
+ "[IH2]": 65,
243
+ "[S-]": 66,
244
+ "[cH-]": 67,
245
+ "%12": 68,
246
+ "[nH+]": 69,
247
+ "[B-]": 70,
248
+ "[K+]": 71,
249
+ "[Sn]": 72,
250
+ "[Se]": 73,
251
+ "[CH-]": 74,
252
+ "[HH]": 75,
253
+ "[Y]": 76,
254
+ "[n-]": 77,
255
+ "[CH3-]": 78,
256
+ "[SiH]": 79,
257
+ "[S+]": 80,
258
+ "%13": 81,
259
+ "[SiH2]": 82,
260
+ "[Li+]": 83,
261
+ "[NH-]": 84,
262
+ "%14": 85,
263
+ "[Na]": 86,
264
+ "[CH2]": 87,
265
+ "[O-2]": 88,
266
+ "[U+2]": 89,
267
+ "[W]": 90,
268
+ "[Al]": 91,
269
+ "[P@]": 92,
270
+ "[Fe+2]": 93,
271
+ "[PH+]": 94,
272
+ "%15": 95,
273
+ "[Cl+3]": 96,
274
+ "[Zn+2]": 97,
275
+ "[Ir]": 98,
276
+ "[Mg+2]": 99,
277
+ "[Pt+2]": 100,
278
+ "[OH2+]": 101,
279
+ "[As]": 102,
280
+ "[Fe]": 103,
281
+ "[OH+]": 104,
282
+ "[Zr+2]": 105,
283
+ "[3H]": 106,
284
+ "[Ge]": 107,
285
+ "[SiH3]": 108,
286
+ "[OH-]": 109,
287
+ "[NH4+]": 110,
288
+ "[Cu+2]": 111,
289
+ "[P@@]": 112,
290
+ "p": 113,
291
+ "[Pt]": 114,
292
+ "%16": 115,
293
+ "[Ca+2]": 116,
294
+ "[Zr]": 117,
295
+ "[F-]": 118,
296
+ "[C+]": 119,
297
+ "[Ti]": 120,
298
+ "[P-]": 121,
299
+ "[V]": 122,
300
+ "[se]": 123,
301
+ "[U]": 124,
302
+ "[O]": 125,
303
+ "[Ni+2]": 126,
304
+ "[Zn]": 127,
305
+ "[Co]": 128,
306
+ "[Ni]": 129,
307
+ "[Pd+2]": 130,
308
+ "[Cu]": 131,
309
+ "%17": 132,
310
+ "[Cu+]": 133,
311
+ "[Te]": 134,
312
+ "[H+]": 135,
313
+ "[CH+]": 136,
314
+ "[Li]": 137,
315
+ "[Pd]": 138,
316
+ "[Mo]": 139,
317
+ "[Ru+2]": 140,
318
+ "[o+]": 141,
319
+ "[Re]": 142,
320
+ "[SH+]": 143,
321
+ "%18": 144,
322
+ "[Ac]": 145,
323
+ "[Cr]": 146,
324
+ "[NH2-]": 147,
325
+ "[K]": 148,
326
+ "[13CH2]": 149,
327
+ "[c]": 150,
328
+ "[Zr+4]": 151,
329
+ "[Tl]": 152,
330
+ "[13C]": 153,
331
+ "[Mn]": 154,
332
+ "[N@+]": 155,
333
+ "[Hg]": 156,
334
+ "[Rh]": 157,
335
+ "[Ti+4]": 158,
336
+ "[Sb]": 159,
337
+ "[Co+2]": 160,
338
+ "[Ag+]": 161,
339
+ "[Ru]": 162,
340
+ "%19": 163,
341
+ "[N@@+]": 164,
342
+ "[Ti+2]": 165,
343
+ "[Al+3]": 166,
344
+ "[Pb]": 167,
345
+ "[I+]": 168,
346
+ "[18F]": 169,
347
+ "[s+]": 170,
348
+ "[Rb+]": 171,
349
+ "[Ba+2]": 172,
350
+ "[H-]": 173,
351
+ "[Fe+3]": 174,
352
+ "[Ir+3]": 175,
353
+ "[13cH]": 176,
354
+ "%20": 177,
355
+ "[AlH2]": 178,
356
+ "[Au+]": 179,
357
+ "[13c]": 180,
358
+ "[SH2+]": 181,
359
+ "[Sn+2]": 182,
360
+ "[Mn+2]": 183,
361
+ "[Si-]": 184,
362
+ "[Ag]": 185,
363
+ "[N]": 186,
364
+ "[Bi]": 187,
365
+ "%21": 188,
366
+ "[In]": 189,
367
+ "[CH2+]": 190,
368
+ "[Y+3]": 191,
369
+ "[Ga]": 192,
370
+ "%22": 193,
371
+ "[Co+3]": 194,
372
+ "[Au]": 195,
373
+ "[13CH3]": 196,
374
+ "[Mg]": 197,
375
+ "[Cs+]": 198,
376
+ "[W+2]": 199,
377
+ "[Hf]": 200,
378
+ "[Zn+]": 201,
379
+ "[Se-]": 202,
380
+ "[S-2]": 203,
381
+ "[Ca]": 204,
382
+ "[pH]": 205,
383
+ "[ClH+]": 206,
384
+ "[Ti+3]": 207,
385
+ "%23": 208,
386
+ "[Ru+]": 209,
387
+ "[SH-]": 210,
388
+ "[13CH]": 211,
389
+ "[IH+]": 212,
390
+ "[Hf+4]": 213,
391
+ "[Rf]": 214,
392
+ "[OH3+]": 215,
393
+ "%24": 216,
394
+ "[Pt+4]": 217,
395
+ "[Zr+3]": 218,
396
+ "[PH3+]": 219,
397
+ "[Sr+2]": 220,
398
+ "[Cd+2]": 221,
399
+ "[Cd]": 222,
400
+ "%25": 223,
401
+ "[Os]": 224,
402
+ "[BH-]": 225,
403
+ "[Sn+4]": 226,
404
+ "[Cr+3]": 227,
405
+ "[Ru+3]": 228,
406
+ "[PH2+]": 229,
407
+ "[Rh+2]": 230,
408
+ "[V+2]": 231,
409
+ "%26": 232,
410
+ "[Gd+3]": 233,
411
+ "[Pb+2]": 234,
412
+ "[PH]": 235,
413
+ "[Hg+]": 236,
414
+ "[Mo+2]": 237,
415
+ "[AlH]": 238,
416
+ "[Sn+]": 239,
417
+ "%27": 240,
418
+ "[Pd+]": 241,
419
+ "b": 242,
420
+ "[Rh+3]": 243,
421
+ "[Hg+2]": 244,
422
+ "[15NH]": 245,
423
+ "[14C]": 246,
424
+ "%28": 247,
425
+ "[Mn+3]": 248,
426
+ "[Si+]": 249,
427
+ "[SeH]": 250,
428
+ "[13C@H]": 251,
429
+ "[NH]": 252,
430
+ "[Ga+3]": 253,
431
+ "[SiH-]": 254,
432
+ "[13C@@H]": 255,
433
+ "[Ce]": 256,
434
+ "[Au+3]": 257,
435
+ "[Bi+3]": 258,
436
+ "[15N]": 259,
437
+ "%29": 260,
438
+ "[BH3-]": 261,
439
+ "[14cH]": 262,
440
+ "[Ti+]": 263,
441
+ "[Gd]": 264,
442
+ "[cH+]": 265,
443
+ "[Cr+2]": 266,
444
+ "[Sb-]": 267,
445
+ "%30": 268,
446
+ "[Be+2]": 269,
447
+ "[Al+]": 270,
448
+ "[te]": 271,
449
+ "[11CH3]": 272,
450
+ "[Sm]": 273,
451
+ "[Pr]": 274,
452
+ "[La]": 275,
453
+ "%31": 276,
454
+ "[Al-]": 277,
455
+ "[Ta]": 278,
456
+ "[125I]": 279,
457
+ "[BH2-]": 280,
458
+ "[Nb]": 281,
459
+ "[Si@]": 282,
460
+ "%32": 283,
461
+ "[14c]": 284,
462
+ "[Sb+3]": 285,
463
+ "[Ba]": 286,
464
+ "%33": 287,
465
+ "[Os+2]": 288,
466
+ "[Si@@]": 289,
467
+ "[La+3]": 290,
468
+ "[15n]": 291,
469
+ "[15NH2]": 292,
470
+ "[Nd+3]": 293,
471
+ "%34": 294,
472
+ "[14CH2]": 295,
473
+ "[18O]": 296,
474
+ "[Nd]": 297,
475
+ "[GeH]": 298,
476
+ "[Ni+3]": 299,
477
+ "[Eu]": 300,
478
+ "[Dy+3]": 301,
479
+ "[Sc]": 302,
480
+ "%36": 303,
481
+ "[Se-2]": 304,
482
+ "[As+]": 305,
483
+ "%35": 306,
484
+ "[AsH]": 307,
485
+ "[Tb]": 308,
486
+ "[Sb+5]": 309,
487
+ "[Se+]": 310,
488
+ "[Ce+3]": 311,
489
+ "[c+]": 312,
490
+ "[In+3]": 313,
491
+ "[SnH]": 314,
492
+ "[Mo+4]": 315,
493
+ "%37": 316,
494
+ "[V+4]": 317,
495
+ "[Eu+3]": 318,
496
+ "[Hf+2]": 319,
497
+ "%38": 320,
498
+ "[Pt+]": 321,
499
+ "[p+]": 322,
500
+ "[123I]": 323,
501
+ "[Tl+]": 324,
502
+ "[Sm+3]": 325,
503
+ "%39": 326,
504
+ "[Yb+3]": 327,
505
+ "%40": 328,
506
+ "[Yb]": 329,
507
+ "[Os+]": 330,
508
+ "%41": 331,
509
+ "[10B]": 332,
510
+ "[Sc+3]": 333,
511
+ "[Al+2]": 334,
512
+ "%42": 335,
513
+ "[Sr]": 336,
514
+ "[Tb+3]": 337,
515
+ "[Po]": 338,
516
+ "[Tc]": 339,
517
+ "[PH-]": 340,
518
+ "[AlH3]": 341,
519
+ "[Ar]": 342,
520
+ "[U+4]": 343,
521
+ "[SnH2]": 344,
522
+ "[Cl+2]": 345,
523
+ "[si]": 346,
524
+ "[Fe+]": 347,
525
+ "[14CH3]": 348,
526
+ "[U+3]": 349,
527
+ "[Cl+]": 350,
528
+ "%43": 351,
529
+ "[GeH2]": 352,
530
+ "%44": 353,
531
+ "[Er+3]": 354,
532
+ "[Mo+3]": 355,
533
+ "[I+2]": 356,
534
+ "[Fe+4]": 357,
535
+ "[99Tc]": 358,
536
+ "%45": 359,
537
+ "[11C]": 360,
538
+ "%46": 361,
539
+ "[SnH3]": 362,
540
+ "[S]": 363,
541
+ "[Te+]": 364,
542
+ "[Er]": 365,
543
+ "[Lu+3]": 366,
544
+ "[11B]": 367,
545
+ "%47": 368,
546
+ "%48": 369,
547
+ "[P]": 370,
548
+ "[Tm]": 371,
549
+ "[Th]": 372,
550
+ "[Dy]": 373,
551
+ "[Pr+3]": 374,
552
+ "[Ta+5]": 375,
553
+ "[Nb+5]": 376,
554
+ "[Rb]": 377,
555
+ "[GeH3]": 378,
556
+ "[Br+2]": 379,
557
+ "%49": 380,
558
+ "[131I]": 381,
559
+ "[Fm]": 382,
560
+ "[Cs]": 383,
561
+ "[BH4-]": 384,
562
+ "[Lu]": 385,
563
+ "[15nH]": 386,
564
+ "%50": 387,
565
+ "[Ru+6]": 388,
566
+ "[b-]": 389,
567
+ "[Ho]": 390,
568
+ "[Th+4]": 391,
569
+ "[Ru+4]": 392,
570
+ "%52": 393,
571
+ "[14CH]": 394,
572
+ "%51": 395,
573
+ "[Cr+6]": 396,
574
+ "[18OH]": 397,
575
+ "[Ho+3]": 398,
576
+ "[Ce+4]": 399,
577
+ "[Bi+2]": 400,
578
+ "[Co+]": 401,
579
+ "%53": 402,
580
+ "[Yb+2]": 403,
581
+ "[Fe+6]": 404,
582
+ "[Be]": 405,
583
+ "%54": 406,
584
+ "[SH3+]": 407,
585
+ "[Np]": 408,
586
+ "[As-]": 409,
587
+ "%55": 410,
588
+ "[14C@@H]": 411,
589
+ "[Ir+2]": 412,
590
+ "[GaH3]": 413,
591
+ "[p-]": 414,
592
+ "[GeH4]": 415,
593
+ "[Sn+3]": 416,
594
+ "[Os+4]": 417,
595
+ "%56": 418,
596
+ "[14C@H]": 419,
597
+ "[sH+]": 420,
598
+ "[19F]": 421,
599
+ "[Eu+2]": 422,
600
+ "[TlH]": 423,
601
+ "%57": 424,
602
+ "[Cr+4]": 425,
603
+ "%58": 426,
604
+ "[B@@-]": 427,
605
+ "[SiH+]": 428,
606
+ "[At]": 429,
607
+ "[Am]": 430,
608
+ "[Fe+5]": 431,
609
+ "[AsH2]": 432,
610
+ "[Si+4]": 433,
611
+ "[B@-]": 434,
612
+ "[Pu]": 435,
613
+ "[SbH]": 436,
614
+ "[P-2]": 437,
615
+ "[Tm+3]": 438,
616
+ "*": 439,
617
+ "%59": 440,
618
+ "[se+]": 441,
619
+ "%60": 442,
620
+ "[oH+]": 443,
621
+ "[1H]": 444,
622
+ "[15N+]": 445,
623
+ "[124I]": 446,
624
+ "[S@@+]": 447,
625
+ "[P-3]": 448,
626
+ "[H]": 449,
627
+ "[IH2+]": 450,
628
+ "[TeH]": 451,
629
+ "[Xe]": 452,
630
+ "[PH4+]": 453,
631
+ "[Cr+]": 454,
632
+ "[Cm]": 455,
633
+ "[I+3]": 456,
634
+ "%61": 457,
635
+ "[Nb+2]": 458,
636
+ "[Ru+5]": 459,
637
+ "%62": 460,
638
+ "[Ta+2]": 461,
639
+ "[Tc+4]": 462,
640
+ "[CH3+]": 463,
641
+ "[Pm]": 464,
642
+ "[Si@H]": 465,
643
+ "[No]": 466,
644
+ "%63": 467,
645
+ "[Cr+5]": 468,
646
+ "[Th+2]": 469,
647
+ "[Zn-2]": 470,
648
+ "[13C@]": 471,
649
+ "[Lr]": 472,
650
+ "%64": 473,
651
+ "[99Tc+3]": 474,
652
+ "%65": 475,
653
+ "[13C@@]": 476,
654
+ "%66": 477,
655
+ "[Fe-]": 478,
656
+ "[17O]": 479,
657
+ "[siH]": 480,
658
+ "[Sb+]": 481,
659
+ "[OH]": 482,
660
+ "[IH]": 483,
661
+ "[11CH2]": 484,
662
+ "[Cf]": 485,
663
+ "[SiH2+]": 486,
664
+ "[Gd+2]": 487,
665
+ "[In+]": 488,
666
+ "[Si@@H]": 489,
667
+ "[Mn+]": 490,
668
+ "[99Tc+4]": 491,
669
+ "[Ga-]": 492,
670
+ "%67": 493,
671
+ "[S@+]": 494,
672
+ "[Ge+4]": 495,
673
+ "[Tl+3]": 496,
674
+ "[16OH]": 497,
675
+ "%68": 498,
676
+ "[2H-]": 499,
677
+ "[Ra]": 500,
678
+ "[si-]": 501,
679
+ "[NiH2]": 502,
680
+ "[P@@H]": 503,
681
+ "[Rh+]": 504,
682
+ "[12C]": 505,
683
+ "[35S]": 506,
684
+ "[32P]": 507,
685
+ "[SiH2-]": 508,
686
+ "[AlH2+]": 509,
687
+ "[16O]": 510,
688
+ "%69": 511,
689
+ "[BiH]": 512,
690
+ "[BiH2]": 513,
691
+ "[Zn-]": 514,
692
+ "[BH]": 515,
693
+ "[Tc+3]": 516,
694
+ "[Ir+]": 517,
695
+ "[Ni+]": 518,
696
+ "%70": 519,
697
+ "[InH2]": 520,
698
+ "[InH]": 521,
699
+ "[Nb+3]": 522,
700
+ "[PbH]": 523,
701
+ "[Bi+]": 524,
702
+ "%71": 525,
703
+ "[As+3]": 526,
704
+ "%72": 527,
705
+ "[18O-]": 528,
706
+ "[68Ga+3]": 529,
707
+ "%73": 530,
708
+ "[Pa]": 531,
709
+ "[76Br]": 532,
710
+ "[Tc+5]": 533,
711
+ "[pH+]": 534,
712
+ "[64Cu+2]": 535,
713
+ "[Ru+8]": 536,
714
+ "%74": 537,
715
+ "[PH2-]": 538,
716
+ "[Si+2]": 539,
717
+ "[17OH]": 540,
718
+ "[RuH]": 541,
719
+ "[111In+3]": 542,
720
+ "[AlH+]": 543,
721
+ "%75": 544,
722
+ "%76": 545,
723
+ "[W+]": 546,
724
+ "[SbH2]": 547,
725
+ "[PoH]": 548,
726
+ "[Ru-]": 549,
727
+ "[XeH]": 550,
728
+ "[Tc+2]": 551,
729
+ "[13C-]": 552,
730
+ "[Br+]": 553,
731
+ "[Pt-2]": 554,
732
+ "[Es]": 555,
733
+ "[Cu-]": 556,
734
+ "[Mg+]": 557,
735
+ "[3HH]": 558,
736
+ "[P@H]": 559,
737
+ "[ClH2+]": 560,
738
+ "%77": 561,
739
+ "[SH]": 562,
740
+ "[Au-]": 563,
741
+ "[2HH]": 564,
742
+ "%78": 565,
743
+ "[Sn-]": 566,
744
+ "[11CH]": 567,
745
+ "[PdH2]": 568,
746
+ "0": 569,
747
+ "[Os+6]": 570,
748
+ "%79": 571,
749
+ "[Mo+]": 572,
750
+ "%80": 573,
751
+ "[al]": 574,
752
+ "[PbH2]": 575,
753
+ "[64Cu]": 576,
754
+ "[Cl]": 577,
755
+ "[12CH3]": 578,
756
+ "%81": 579,
757
+ "[Tc+7]": 580,
758
+ "[11c]": 581,
759
+ "%82": 582,
760
+ "[Li-]": 583,
761
+ "[99Tc+5]": 584,
762
+ "[He]": 585,
763
+ "[12c]": 586,
764
+ "[Kr]": 587,
765
+ "[RuH+2]": 588,
766
+ "[35Cl]": 589,
767
+ "[Pd-2]": 590,
768
+ "[GaH2]": 591,
769
+ "[4H]": 592,
770
+ "[Sg]": 593,
771
+ "[Cu-2]": 594,
772
+ "[Br+3]": 595,
773
+ "%83": 596,
774
+ "[37Cl]": 597,
775
+ "[211At]": 598,
776
+ "[IrH+2]": 599,
777
+ "[Mt]": 600,
778
+ "[Ir-2]": 601,
779
+ "[In-]": 602,
780
+ "[12cH]": 603,
781
+ "[12CH2]": 604,
782
+ "[RuH2]": 605,
783
+ "[99Tc+7]": 606,
784
+ "%84": 607,
785
+ "[15n+]": 608,
786
+ "[ClH2+2]": 609,
787
+ "[16N]": 610,
788
+ "[111In]": 611,
789
+ "[Tc+]": 612,
790
+ "[Ru-2]": 613,
791
+ "[12CH]": 614,
792
+ "[si+]": 615,
793
+ "[Tc+6]": 616,
794
+ "%85": 617,
795
+ "%86": 618,
796
+ "[90Y]": 619,
797
+ "[Pd-]": 620,
798
+ "[188Re]": 621,
799
+ "[RuH+]": 622,
800
+ "[NiH]": 623,
801
+ "[SiH3-]": 624,
802
+ "[14n]": 625,
803
+ "[CH3]": 626,
804
+ "[14N]": 627,
805
+ "[10BH2]": 628,
806
+ "%88": 629,
807
+ "%89": 630,
808
+ "%90": 631,
809
+ "[34S]": 632,
810
+ "[77Br]": 633,
811
+ "[GaH]": 634,
812
+ "[Br]": 635,
813
+ "[Ge@]": 636,
814
+ "[B@@H-]": 637,
815
+ "[CuH]": 638,
816
+ "[SiH4]": 639,
817
+ "[3H-]": 640,
818
+ "%87": 641,
819
+ "%91": 642,
820
+ "%92": 643,
821
+ "[67Cu]": 644,
822
+ "[I]": 645,
823
+ "[177Lu]": 646,
824
+ "[ReH]": 647,
825
+ "[67Ga+3]": 648,
826
+ "[Db]": 649,
827
+ "[177Lu+3]": 650,
828
+ "[AlH2-]": 651,
829
+ "[Si+3]": 652,
830
+ "[Ti-2]": 653,
831
+ "[RuH+3]": 654,
832
+ "[al+]": 655,
833
+ "[68Ga]": 656,
834
+ "[2H+]": 657,
835
+ "[B@H-]": 658,
836
+ "[WH2]": 659,
837
+ "[OsH]": 660,
838
+ "[Ir-3]": 661,
839
+ "[AlH-]": 662,
840
+ "[Bk]": 663,
841
+ "[75Se]": 664,
842
+ "[14C@]": 665,
843
+ "[Pt-]": 666,
844
+ "[N@@H+]": 667,
845
+ "[Nb-]": 668,
846
+ "[13NH2]": 669,
847
+ "%93": 670,
848
+ "[186Re]": 671,
849
+ "[Tb+4]": 672,
850
+ "[PtH]": 673,
851
+ "[IrH2]": 674,
852
+ "[Hg-2]": 675,
853
+ "[AlH3-]": 676,
854
+ "[PdH+]": 677,
855
+ "[Md]": 678,
856
+ "[RhH+2]": 679,
857
+ "[11cH]": 680,
858
+ "[Co-2]": 681,
859
+ "[15N-]": 682,
860
+ "[ZrH2]": 683,
861
+ "%94": 684,
862
+ "[Hg-]": 685,
863
+ "[127I]": 686,
864
+ "[AsH2+]": 687,
865
+ "[MoH2]": 688,
866
+ "[Te+4]": 689,
867
+ "[14C@@]": 690,
868
+ "[As+5]": 691,
869
+ "[SnH+3]": 692,
870
+ "[Ge@@]": 693,
871
+ "[6Li+]": 694,
872
+ "[WH]": 695,
873
+ "[Ne]": 696,
874
+ "[14NH2]": 697,
875
+ "[14NH]": 698,
876
+ "[12C@@H]": 699,
877
+ "[Os+7]": 700,
878
+ "[RhH]": 701,
879
+ "[Al-3]": 702,
880
+ "[SnH+]": 703,
881
+ "[15NH3+]": 704,
882
+ "[Zr+]": 705,
883
+ "[197Hg+]": 706,
884
+ "%95": 707,
885
+ "%96": 708,
886
+ "[90Y+3]": 709,
887
+ "[Os-2]": 710,
888
+ "[98Tc+5]": 711,
889
+ "[15NH3]": 712,
890
+ "[bH-]": 713,
891
+ "[33P]": 714,
892
+ "[Zr-2]": 715,
893
+ "[15O]": 716,
894
+ "[Rh-]": 717,
895
+ "[PbH3]": 718,
896
+ "[PH2]": 719,
897
+ "[Ni-]": 720,
898
+ "[CuH+]": 721,
899
+ "%97": 722,
900
+ "%98": 723,
901
+ "%99": 724,
902
+ "[Os+5]": 725,
903
+ "[PtH+]": 726,
904
+ "[ReH4]": 727,
905
+ "[16NH]": 728,
906
+ "[82Br]": 729,
907
+ "[W-]": 730,
908
+ "[18F-]": 731,
909
+ "[15NH4+]": 732,
910
+ "[Se+4]": 733,
911
+ "[SeH-]": 734,
912
+ "[67Cu+2]": 735,
913
+ "[12C@H]": 736,
914
+ "[AsH3]": 737,
915
+ "[HgH]": 738,
916
+ "[10B-]": 739,
917
+ "[99Tc+6]": 740,
918
+ "[117Sn+4]": 741,
919
+ "[Te@]": 742,
920
+ "[P@+]": 743,
921
+ "[35SH]": 744,
922
+ "[SeH+]": 745,
923
+ "[Ni-2]": 746,
924
+ "[Al-2]": 747,
925
+ "[TeH2]": 748,
926
+ "[Bh]": 749,
927
+ "[99Tc+2]": 750,
928
+ "[Os+8]": 751,
929
+ "[PH-2]": 752,
930
+ "[7Li+]": 753,
931
+ "[14nH]": 754,
932
+ "[AlH+2]": 755,
933
+ "[18FH]": 756,
934
+ "[SnH4]": 757,
935
+ "[18O-2]": 758,
936
+ "[IrH]": 759,
937
+ "[13N]": 760,
938
+ "[Te@@]": 761,
939
+ "[Rh-3]": 762,
940
+ "[15NH+]": 763,
941
+ "[AsH3+]": 764,
942
+ "[SeH2]": 765,
943
+ "[AsH+]": 766,
944
+ "[CoH2]": 767,
945
+ "[16NH2]": 768,
946
+ "[AsH-]": 769,
947
+ "[203Hg+]": 770,
948
+ "[P@@+]": 771,
949
+ "[166Ho+3]": 772,
950
+ "[60Co+3]": 773,
951
+ "[13CH2-]": 774,
952
+ "[SeH2+]": 775,
953
+ "[75Br]": 776,
954
+ "[TlH2]": 777,
955
+ "[80Br]": 778,
956
+ "[siH+]": 779,
957
+ "[Ca+]": 780,
958
+ "[153Sm+3]": 781,
959
+ "[PdH]": 782,
960
+ "[225Ac]": 783,
961
+ "[13CH3-]": 784,
962
+ "[AlH4-]": 785,
963
+ "[FeH]": 786,
964
+ "[13CH-]": 787,
965
+ "[14C-]": 788,
966
+ "[11C-]": 789,
967
+ "[153Sm]": 790,
968
+ "[Re-]": 791,
969
+ "[te+]": 792,
970
+ "[13CH4]": 793,
971
+ "[ClH+2]": 794,
972
+ "[8CH2]": 795,
973
+ "[99Mo]": 796,
974
+ "[ClH3+3]": 797,
975
+ "[SbH3]": 798,
976
+ "[25Mg+2]": 799,
977
+ "[16N+]": 800,
978
+ "[SnH2+]": 801,
979
+ "[11C@H]": 802,
980
+ "[122I]": 803,
981
+ "[Re-2]": 804,
982
+ "[RuH2+2]": 805,
983
+ "[ZrH]": 806,
984
+ "[Bi-]": 807,
985
+ "[Pr+]": 808,
986
+ "[Rn]": 809,
987
+ "[Fr]": 810,
988
+ "[36Cl]": 811,
989
+ "[18o]": 812,
990
+ "[YH]": 813,
991
+ "[79Br]": 814,
992
+ "[121I]": 815,
993
+ "[113In+3]": 816,
994
+ "[TaH]": 817,
995
+ "[RhH2]": 818,
996
+ "[Ta-]": 819,
997
+ "[67Ga]": 820,
998
+ "[ZnH+]": 821,
999
+ "[SnH2-]": 822,
1000
+ "[OsH2]": 823,
1001
+ "[16F]": 824,
1002
+ "[FeH2]": 825,
1003
+ "[14O]": 826,
1004
+ "[PbH2+2]": 827,
1005
+ "[BH2]": 828,
1006
+ "[6H]": 829,
1007
+ "[125Te]": 830,
1008
+ "[197Hg]": 831,
1009
+ "[TaH2]": 832,
1010
+ "[TaH3]": 833,
1011
+ "[76As]": 834,
1012
+ "[Nb-2]": 835,
1013
+ "[14N+]": 836,
1014
+ "[125I-]": 837,
1015
+ "[33S]": 838,
1016
+ "[IH2+2]": 839,
1017
+ "[NH2]": 840,
1018
+ "[PtH2]": 841,
1019
+ "[MnH]": 842,
1020
+ "[19C]": 843,
1021
+ "[17F]": 844,
1022
+ "[1H-]": 845,
1023
+ "[SnH4+2]": 846,
1024
+ "[Mn-2]": 847,
1025
+ "[15NH2+]": 848,
1026
+ "[TiH2]": 849,
1027
+ "[ReH7]": 850,
1028
+ "[Cd-2]": 851,
1029
+ "[Fe-3]": 852,
1030
+ "[SH2]": 853,
1031
+ "[17O-]": 854,
1032
+ "[siH-]": 855,
1033
+ "[CoH+]": 856,
1034
+ "[VH]": 857,
1035
+ "[10BH]": 858,
1036
+ "[Ru-3]": 859,
1037
+ "[13O]": 860,
1038
+ "[5H]": 861,
1039
+ "[15n-]": 862,
1040
+ "[153Gd]": 863,
1041
+ "[12C@]": 864,
1042
+ "[11CH3-]": 865,
1043
+ "[IrH3]": 866,
1044
+ "[RuH3]": 867,
1045
+ "[74Se]": 868,
1046
+ "[Se@]": 869,
1047
+ "[Hf+]": 870,
1048
+ "[77Se]": 871,
1049
+ "[166Ho]": 872,
1050
+ "[59Fe+2]": 873,
1051
+ "[203Hg]": 874,
1052
+ "[18OH-]": 875,
1053
+ "[8CH]": 876,
1054
+ "[12C@@]": 877,
1055
+ "[11CH4]": 878,
1056
+ "[15C]": 879,
1057
+ "[249Cf]": 880,
1058
+ "[PbH4]": 881,
1059
+ "[64Zn]": 882,
1060
+ "[99Tc+]": 883,
1061
+ "[14c-]": 884,
1062
+ "[149Pm]": 885,
1063
+ "[IrH4]": 886,
1064
+ "[Se@@]": 887,
1065
+ "[13OH]": 888,
1066
+ "[14CH3-]": 889,
1067
+ "[28Si]": 890,
1068
+ "[Rh-2]": 891,
1069
+ "[Fe-2]": 892,
1070
+ "[131I-]": 893,
1071
+ "[51Cr]": 894,
1072
+ "[62Cu+2]": 895,
1073
+ "[81Br]": 896,
1074
+ "[121Sb]": 897,
1075
+ "[7Li]": 898,
1076
+ "[89Zr+4]": 899,
1077
+ "[SbH3+]": 900,
1078
+ "[11C@@H]": 901,
1079
+ "[98Tc]": 902,
1080
+ "[59Fe+3]": 903,
1081
+ "[BiH2+]": 904,
1082
+ "[SbH+]": 905,
1083
+ "[TiH]": 906,
1084
+ "[14NH3]": 907,
1085
+ "[15OH]": 908,
1086
+ "[119Sn]": 909,
1087
+ "[201Hg]": 910,
1088
+ "[MnH+]": 911,
1089
+ "[201Tl]": 912,
1090
+ "[51Cr+3]": 913,
1091
+ "[123I-]": 914,
1092
+ "[MoH]": 915,
1093
+ "[AlH6-3]": 916,
1094
+ "[MnH2]": 917,
1095
+ "[WH3]": 918,
1096
+ "[213Bi+3]": 919,
1097
+ "[SnH2+2]": 920,
1098
+ "[123IH]": 921,
1099
+ "[13CH+]": 922,
1100
+ "[Zr-]": 923,
1101
+ "[74As]": 924,
1102
+ "[13C+]": 925,
1103
+ "[32P+]": 926,
1104
+ "[KrH]": 927,
1105
+ "[SiH+2]": 928,
1106
+ "[ClH3+2]": 929,
1107
+ "[13NH]": 930,
1108
+ "[9CH2]": 931,
1109
+ "[ZrH2+2]": 932,
1110
+ "[87Sr+2]": 933,
1111
+ "[35s]": 934,
1112
+ "[239Pu]": 935,
1113
+ "[198Au]": 936,
1114
+ "[241Am]": 937,
1115
+ "[203Hg+2]": 938,
1116
+ "[V+]": 939,
1117
+ "[YH2]": 940,
1118
+ "[195Pt]": 941,
1119
+ "[203Pb]": 942,
1120
+ "[RuH4]": 943,
1121
+ "[ThH2]": 944,
1122
+ "[AuH]": 945,
1123
+ "[66Ga+3]": 946,
1124
+ "[11B-]": 947,
1125
+ "[F]": 948,
1126
+ "[24Na+]": 949,
1127
+ "[85Sr+2]": 950,
1128
+ "[201Tl+]": 951,
1129
+ "[14CH4]": 952,
1130
+ "[32S]": 953,
1131
+ "[TeH2+]": 954,
1132
+ "[ClH2+3]": 955,
1133
+ "[AgH]": 956,
1134
+ "[Ge@H]": 957,
1135
+ "[44Ca+2]": 958,
1136
+ "[Os-]": 959,
1137
+ "[31P]": 960,
1138
+ "[15nH+]": 961,
1139
+ "[SbH4]": 962,
1140
+ "[TiH+]": 963,
1141
+ "[Ba+]": 964,
1142
+ "[57Co+2]": 965,
1143
+ "[Ta+]": 966,
1144
+ "[125IH]": 967,
1145
+ "[77As]": 968,
1146
+ "[129I]": 969,
1147
+ "[Fe-4]": 970,
1148
+ "[Ta-2]": 971,
1149
+ "[19O]": 972,
1150
+ "[12O]": 973,
1151
+ "[BiH3]": 974,
1152
+ "[237Np]": 975,
1153
+ "[252Cf]": 976,
1154
+ "[86Y]": 977,
1155
+ "[Cr-2]": 978,
1156
+ "[89Y]": 979,
1157
+ "[195Pt+2]": 980,
1158
+ "[si+2]": 981,
1159
+ "[58Fe+2]": 982,
1160
+ "[Hs]": 983,
1161
+ "[S@@H]": 984,
1162
+ "[8CH4]": 985,
1163
+ "[164Dy+3]": 986,
1164
+ "[47Ca+2]": 987,
1165
+ "[57Co]": 988,
1166
+ "[NbH2]": 989,
1167
+ "[ReH2]": 990,
1168
+ "[ZnH2]": 991,
1169
+ "[CrH2]": 992,
1170
+ "[17NH]": 993,
1171
+ "[ZrH3]": 994,
1172
+ "[RhH3]": 995,
1173
+ "[12C-]": 996,
1174
+ "[18O+]": 997,
1175
+ "[Bi-2]": 998,
1176
+ "[ClH4+3]": 999,
1177
+ "[Ni-3]": 1000,
1178
+ "[Ag-]": 1001,
1179
+ "[111In-]": 1002,
1180
+ "[Mo-2]": 1003,
1181
+ "[55Fe+3]": 1004,
1182
+ "[204Hg+]": 1005,
1183
+ "[35Cl-]": 1006,
1184
+ "[211Pb]": 1007,
1185
+ "[75Ge]": 1008,
1186
+ "[8B]": 1009,
1187
+ "[TeH3]": 1010,
1188
+ "[SnH3+]": 1011,
1189
+ "[Zr-3]": 1012,
1190
+ "[28F]": 1013,
1191
+ "[249Bk]": 1014,
1192
+ "[169Yb]": 1015,
1193
+ "[34SH]": 1016,
1194
+ "[6Li]": 1017,
1195
+ "[94Tc]": 1018,
1196
+ "[197Au]": 1019,
1197
+ "[195Pt+4]": 1020,
1198
+ "[169Yb+3]": 1021,
1199
+ "[32Cl]": 1022,
1200
+ "[82Se]": 1023,
1201
+ "[159Gd+3]": 1024,
1202
+ "[213Bi]": 1025,
1203
+ "[CoH+2]": 1026,
1204
+ "[36S]": 1027,
1205
+ "[35P]": 1028,
1206
+ "[Ru-4]": 1029,
1207
+ "[Cr-3]": 1030,
1208
+ "[60Co]": 1031,
1209
+ "[1H+]": 1032,
1210
+ "[18CH2]": 1033,
1211
+ "[Cd-]": 1034,
1212
+ "[152Sm+3]": 1035,
1213
+ "[106Ru]": 1036,
1214
+ "[238Pu]": 1037,
1215
+ "[220Rn]": 1038,
1216
+ "[45Ca+2]": 1039,
1217
+ "[89Sr+2]": 1040,
1218
+ "[239Np]": 1041,
1219
+ "[90Sr+2]": 1042,
1220
+ "[137Cs+]": 1043,
1221
+ "[165Dy]": 1044,
1222
+ "[68GaH3]": 1045,
1223
+ "[65Zn+2]": 1046,
1224
+ "[89Zr]": 1047,
1225
+ "[BiH2+2]": 1048,
1226
+ "[62Cu]": 1049,
1227
+ "[165Dy+3]": 1050,
1228
+ "[238U]": 1051,
1229
+ "[105Rh+3]": 1052,
1230
+ "[70Zn]": 1053,
1231
+ "[12B]": 1054,
1232
+ "[12OH]": 1055,
1233
+ "[18CH]": 1056,
1234
+ "[17CH]": 1057,
1235
+ "[42K]": 1058,
1236
+ "[76Br-]": 1059,
1237
+ "[71As]": 1060,
1238
+ "[NbH3]": 1061,
1239
+ "[ReH3]": 1062,
1240
+ "[OsH-]": 1063,
1241
+ "[WH4]": 1064,
1242
+ "[MoH3]": 1065,
1243
+ "[OsH4]": 1066,
1244
+ "[RuH6]": 1067,
1245
+ "[PtH3]": 1068,
1246
+ "[CuH2]": 1069,
1247
+ "[CoH3]": 1070,
1248
+ "[TiH4]": 1071,
1249
+ "[64Zn+2]": 1072,
1250
+ "[Si-2]": 1073,
1251
+ "[79BrH]": 1074,
1252
+ "[14CH2-]": 1075,
1253
+ "[PtH2+2]": 1076,
1254
+ "[Os-3]": 1077,
1255
+ "[29Si]": 1078,
1256
+ "[Ti-]": 1079,
1257
+ "[Se+6]": 1080,
1258
+ "[22Na+]": 1081,
1259
+ "[42K+]": 1082,
1260
+ "[131Cs+]": 1083,
1261
+ "[86Rb+]": 1084,
1262
+ "[134Cs+]": 1085,
1263
+ "[209Po]": 1086,
1264
+ "[208Po]": 1087,
1265
+ "[81Rb+]": 1088,
1266
+ "[203Tl+]": 1089,
1267
+ "[Zr-4]": 1090,
1268
+ "[148Sm]": 1091,
1269
+ "[147Sm]": 1092,
1270
+ "[37Cl-]": 1093,
1271
+ "[12CH4]": 1094,
1272
+ "[Ge@@H]": 1095,
1273
+ "[63Cu]": 1096,
1274
+ "[13CH2+]": 1097,
1275
+ "[AsH2-]": 1098,
1276
+ "[CeH]": 1099,
1277
+ "[SnH-]": 1100,
1278
+ "[UH]": 1101,
1279
+ "[9c]": 1102,
1280
+ "[21CH3]": 1103,
1281
+ "[TeH+]": 1104,
1282
+ "[57Co+3]": 1105,
1283
+ "[8BH2]": 1106,
1284
+ "[12BH2]": 1107,
1285
+ "[19BH2]": 1108,
1286
+ "[9BH2]": 1109,
1287
+ "[YbH2]": 1110,
1288
+ "[CrH+2]": 1111,
1289
+ "[208Bi]": 1112,
1290
+ "[152Gd]": 1113,
1291
+ "[61Cu]": 1114,
1292
+ "[115In]": 1115,
1293
+ "[60Co+2]": 1116,
1294
+ "[13NH2-]": 1117,
1295
+ "[120I]": 1118,
1296
+ "[18OH2]": 1119,
1297
+ "[75SeH]": 1120,
1298
+ "[SbH2+]": 1121,
1299
+ "[144Ce]": 1122,
1300
+ "[16n]": 1123,
1301
+ "[113In]": 1124,
1302
+ "[22nH]": 1125,
1303
+ "[129I-]": 1126,
1304
+ "[InH3]": 1127,
1305
+ "[32PH3]": 1128,
1306
+ "[234U]": 1129,
1307
+ "[235U]": 1130,
1308
+ "[59Fe]": 1131,
1309
+ "[82Rb+]": 1132,
1310
+ "[65Zn]": 1133,
1311
+ "[244Cm]": 1134,
1312
+ "[147Pm]": 1135,
1313
+ "[91Y]": 1136,
1314
+ "[237Pu]": 1137,
1315
+ "[231Pa]": 1138,
1316
+ "[253Cf]": 1139,
1317
+ "[127Te]": 1140,
1318
+ "[187Re]": 1141,
1319
+ "[236Np]": 1142,
1320
+ "[235Np]": 1143,
1321
+ "[72Zn]": 1144,
1322
+ "[253Es]": 1145,
1323
+ "[159Dy]": 1146,
1324
+ "[62Zn]": 1147,
1325
+ "[101Tc]": 1148,
1326
+ "[149Tb]": 1149,
1327
+ "[124I-]": 1150,
1328
+ "[SeH3+]": 1151,
1329
+ "[210Pb]": 1152,
1330
+ "[40K]": 1153,
1331
+ "[210Po]": 1154,
1332
+ "[214Pb]": 1155,
1333
+ "[218Po]": 1156,
1334
+ "[214Po]": 1157,
1335
+ "[7Be]": 1158,
1336
+ "[212Pb]": 1159,
1337
+ "[205Pb]": 1160,
1338
+ "[209Pb]": 1161,
1339
+ "[123Te]": 1162,
1340
+ "[202Pb]": 1163,
1341
+ "[72As]": 1164,
1342
+ "[201Pb]": 1165,
1343
+ "[70As]": 1166,
1344
+ "[73Ge]": 1167,
1345
+ "[200Pb]": 1168,
1346
+ "[198Pb]": 1169,
1347
+ "[66Ga]": 1170,
1348
+ "[73Se]": 1171,
1349
+ "[195Pb]": 1172,
1350
+ "[199Pb]": 1173,
1351
+ "[144Ce+3]": 1174,
1352
+ "[235U+2]": 1175,
1353
+ "[90Tc]": 1176,
1354
+ "[114In+3]": 1177,
1355
+ "[128I]": 1178,
1356
+ "[100Tc+]": 1179,
1357
+ "[82Br-]": 1180,
1358
+ "[191Pt+2]": 1181,
1359
+ "[191Pt+4]": 1182,
1360
+ "[193Pt+4]": 1183,
1361
+ "[31PH3]": 1184,
1362
+ "[125I+2]": 1185,
1363
+ "[131I+2]": 1186,
1364
+ "[125Te+4]": 1187,
1365
+ "[82Sr+2]": 1188,
1366
+ "[149Sm]": 1189,
1367
+ "[81BrH]": 1190,
1368
+ "[129Xe]": 1191,
1369
+ "[193Pt+2]": 1192,
1370
+ "[123I+2]": 1193,
1371
+ "[Cr-]": 1194,
1372
+ "[Co-]": 1195,
1373
+ "[227Th+4]": 1196,
1374
+ "[249Cf+3]": 1197,
1375
+ "[252Cf+3]": 1198,
1376
+ "[187Os]": 1199,
1377
+ "[16O-]": 1200,
1378
+ "[17O+]": 1201,
1379
+ "[16OH-]": 1202,
1380
+ "[98Tc+7]": 1203,
1381
+ "[58Co+2]": 1204,
1382
+ "[69Ga+3]": 1205,
1383
+ "[57Fe+2]": 1206,
1384
+ "[43K+]": 1207,
1385
+ "[16C]": 1208,
1386
+ "[52Fe+3]": 1209,
1387
+ "[SeH5]": 1210,
1388
+ "[194Pb]": 1211,
1389
+ "[196Pb]": 1212,
1390
+ "[197Pb]": 1213,
1391
+ "[213Pb]": 1214,
1392
+ "[9B]": 1215,
1393
+ "[19B]": 1216,
1394
+ "[11CH-]": 1217,
1395
+ "[9CH]": 1218,
1396
+ "[20OH]": 1219,
1397
+ "[25OH]": 1220,
1398
+ "[8cH]": 1221,
1399
+ "[TiH+3]": 1222,
1400
+ "[SnH6+3]": 1223,
1401
+ "[N@H+]": 1224,
1402
+ "[52Mn+2]": 1225,
1403
+ "[64Ga]": 1226,
1404
+ "[13B]": 1227,
1405
+ "[216Bi]": 1228,
1406
+ "[117Sn+2]": 1229,
1407
+ "[232Th]": 1230,
1408
+ "[SnH+2]": 1231,
1409
+ "[BiH5]": 1232,
1410
+ "[77Kr]": 1233,
1411
+ "[103Cd]": 1234,
1412
+ "[62Ni]": 1235,
1413
+ "[LaH3]": 1236,
1414
+ "[SmH3]": 1237,
1415
+ "[EuH3]": 1238,
1416
+ "[MoH5]": 1239,
1417
+ "[64Ni]": 1240,
1418
+ "[66Zn]": 1241,
1419
+ "[68Zn]": 1242,
1420
+ "[186W]": 1243,
1421
+ "[FeH4]": 1244,
1422
+ "[MoH4]": 1245,
1423
+ "[HgH2]": 1246,
1424
+ "[15NH2-]": 1247,
1425
+ "[UH2]": 1248,
1426
+ "[204Hg]": 1249,
1427
+ "[GaH4-]": 1250,
1428
+ "[ThH4]": 1251,
1429
+ "[WH6]": 1252,
1430
+ "[PtH4]": 1253,
1431
+ "[VH2]": 1254,
1432
+ "[UH3]": 1255,
1433
+ "[FeH3]": 1256,
1434
+ "[RuH5]": 1257,
1435
+ "[BiH4]": 1258,
1436
+ "[80Br-]": 1259,
1437
+ "[CeH3]": 1260,
1438
+ "[37ClH]": 1261,
1439
+ "[157Gd+3]": 1262,
1440
+ "[205Tl]": 1263,
1441
+ "[203Tl]": 1264,
1442
+ "[62Cu+]": 1265,
1443
+ "[64Cu+]": 1266,
1444
+ "[61Cu+]": 1267,
1445
+ "[37SH2]": 1268,
1446
+ "[30Si]": 1269,
1447
+ "[28Al]": 1270,
1448
+ "[19OH2]": 1271,
1449
+ "[8He]": 1272,
1450
+ "[6He]": 1273,
1451
+ "[153Pm]": 1274,
1452
+ "[209Bi]": 1275,
1453
+ "[66Zn+2]": 1276,
1454
+ "[10CH4]": 1277,
1455
+ "[191Ir]": 1278,
1456
+ "[66Cu]": 1279,
1457
+ "[16O+]": 1280,
1458
+ "[25O]": 1281,
1459
+ "[10c]": 1282,
1460
+ "[Co-3]": 1283,
1461
+ "[Sn@@]": 1284,
1462
+ "[17OH-]": 1285,
1463
+ "[206Po]": 1286,
1464
+ "[204Po]": 1287,
1465
+ "[202Po]": 1288,
1466
+ "[201Po]": 1289,
1467
+ "[200Po]": 1290,
1468
+ "[199Po]": 1291,
1469
+ "[198Po]": 1292,
1470
+ "[197Po]": 1293,
1471
+ "[196Po]": 1294,
1472
+ "[195Po]": 1295,
1473
+ "[194Po]": 1296,
1474
+ "[193Po]": 1297,
1475
+ "[192Po]": 1298,
1476
+ "[191Po]": 1299,
1477
+ "[190Po]": 1300,
1478
+ "[217Po]": 1301,
1479
+ "[BiH4-]": 1302,
1480
+ "[TeH4]": 1303,
1481
+ "[222Ra]": 1304,
1482
+ "[62Ga]": 1305,
1483
+ "[39Ar]": 1306,
1484
+ "[144Sm]": 1307,
1485
+ "[58Fe]": 1308,
1486
+ "[153Eu]": 1309,
1487
+ "[85Rb]": 1310,
1488
+ "[171Yb]": 1311,
1489
+ "[172Yb]": 1312,
1490
+ "[114Cd]": 1313,
1491
+ "[51Fe]": 1314,
1492
+ "[142Ce]": 1315,
1493
+ "[207Tl]": 1316,
1494
+ "[92Mo]": 1317,
1495
+ "[115Sn]": 1318,
1496
+ "[140Ce]": 1319,
1497
+ "[202Hg]": 1320,
1498
+ "[180W]": 1321,
1499
+ "[182W]": 1322,
1500
+ "[183W]": 1323,
1501
+ "[184W]": 1324,
1502
+ "[96Mo]": 1325,
1503
+ "[47Ti]": 1326,
1504
+ "[111Cd]": 1327,
1505
+ "[143Nd]": 1328,
1506
+ "[145Nd]": 1329,
1507
+ "[126Te]": 1330,
1508
+ "[128Te]": 1331,
1509
+ "[130Te]": 1332,
1510
+ "[185Re]": 1333,
1511
+ "[97Mo]": 1334,
1512
+ "[98Mo]": 1335,
1513
+ "[183Re]": 1336,
1514
+ "[52V]": 1337,
1515
+ "[80Se]": 1338,
1516
+ "[87Kr]": 1339,
1517
+ "[137Xe]": 1340,
1518
+ "[196Au]": 1341,
1519
+ "[146Ce]": 1342,
1520
+ "[88Kr]": 1343,
1521
+ "[51Ti]": 1344,
1522
+ "[138Xe]": 1345,
1523
+ "[112Cd]": 1346,
1524
+ "[116Sn]": 1347,
1525
+ "[120Sn]": 1348,
1526
+ "[28SiH3]": 1349,
1527
+ "[35S-]": 1350,
1528
+ "[15NH-]": 1351,
1529
+ "[13CH3+]": 1352,
1530
+ "[34S+]": 1353,
1531
+ "[34s]": 1354,
1532
+ "[SiH4-]": 1355,
1533
+ "[100Tc+5]": 1356,
1534
+ "[NiH2+2]": 1357,
1535
+ "[239Th]": 1358,
1536
+ "[186Lu]": 1359,
1537
+ "[AuH3]": 1360,
1538
+ "[I@@-]": 1361,
1539
+ "[XeH2]": 1362,
1540
+ "[B+]": 1363,
1541
+ "[16CH2]": 1364,
1542
+ "[8C]": 1365,
1543
+ "[TaH5]": 1366,
1544
+ "[FeH4-]": 1367,
1545
+ "[19C@H]": 1368,
1546
+ "[10NH]": 1369,
1547
+ "[FeH6-3]": 1370,
1548
+ "[22CH]": 1371,
1549
+ "[25N]": 1372,
1550
+ "[25N+]": 1373,
1551
+ "[25N-]": 1374,
1552
+ "[21CH2]": 1375,
1553
+ "[18cH]": 1376,
1554
+ "[113I]": 1377,
1555
+ "[ScH3]": 1378,
1556
+ "[30PH3]": 1379,
1557
+ "[43Ca+2]": 1380,
1558
+ "[41Ca+2]": 1381,
1559
+ "[106Cd]": 1382,
1560
+ "[122Sn]": 1383,
1561
+ "[18CH3]": 1384,
1562
+ "[58Co+3]": 1385,
1563
+ "[98Tc+4]": 1386,
1564
+ "[70Ge]": 1387,
1565
+ "[76Ge]": 1388,
1566
+ "[108Cd]": 1389,
1567
+ "[116Cd]": 1390,
1568
+ "[130Xe]": 1391,
1569
+ "[94Mo]": 1392,
1570
+ "[124Sn]": 1393,
1571
+ "[186Os]": 1394,
1572
+ "[188Os]": 1395,
1573
+ "[190Os]": 1396,
1574
+ "[192Os]": 1397,
1575
+ "[106Pd]": 1398,
1576
+ "[110Pd]": 1399,
1577
+ "[120Te]": 1400,
1578
+ "[132Ba]": 1401,
1579
+ "[134Ba]": 1402,
1580
+ "[136Ba]": 1403,
1581
+ "[136Ce]": 1404,
1582
+ "[138Ce]": 1405,
1583
+ "[156Dy]": 1406,
1584
+ "[158Dy]": 1407,
1585
+ "[160Dy]": 1408,
1586
+ "[163Dy]": 1409,
1587
+ "[162Er]": 1410,
1588
+ "[164Er]": 1411,
1589
+ "[167Er]": 1412,
1590
+ "[176Hf]": 1413,
1591
+ "[26Mg]": 1414,
1592
+ "[144Nd]": 1415,
1593
+ "[150Nd]": 1416,
1594
+ "[41K]": 1417,
1595
+ "[46Ti]": 1418,
1596
+ "[48Ti]": 1419,
1597
+ "[49Ti]": 1420,
1598
+ "[50Ti]": 1421,
1599
+ "[170Yb]": 1422,
1600
+ "[173Yb]": 1423,
1601
+ "[91Zr]": 1424,
1602
+ "[92Zr]": 1425,
1603
+ "[96Zr]": 1426,
1604
+ "[34S-]": 1427,
1605
+ "[CuH2-]": 1428,
1606
+ "[38Cl]": 1429,
1607
+ "[25Mg]": 1430,
1608
+ "[51V]": 1431,
1609
+ "[93Nb]": 1432,
1610
+ "[95Mo]": 1433,
1611
+ "[45Sc]": 1434,
1612
+ "[123Sb]": 1435,
1613
+ "[139La]": 1436,
1614
+ "[9Be]": 1437,
1615
+ "[99Y+3]": 1438,
1616
+ "[99Y]": 1439,
1617
+ "[156Ho]": 1440,
1618
+ "[67Zn]": 1441,
1619
+ "[144Ce+4]": 1442,
1620
+ "[210Tl]": 1443,
1621
+ "[42Ca]": 1444,
1622
+ "[54Fe]": 1445,
1623
+ "[193Ir]": 1446,
1624
+ "[92Nb]": 1447,
1625
+ "[141Cs]": 1448,
1626
+ "[52Cr]": 1449,
1627
+ "[35ClH]": 1450,
1628
+ "[46Ca]": 1451,
1629
+ "[139Cs]": 1452,
1630
+ "[65Cu]": 1453,
1631
+ "[71Ga]": 1454,
1632
+ "[60Ni]": 1455,
1633
+ "[16NH3]": 1456,
1634
+ "[148Nd]": 1457,
1635
+ "[72Ge]": 1458,
1636
+ "[161Dy]": 1459,
1637
+ "[49Ca]": 1460,
1638
+ "[43Ca]": 1461,
1639
+ "[8Be]": 1462,
1640
+ "[48Ca]": 1463,
1641
+ "[44Ca]": 1464,
1642
+ "[120Xe]": 1465,
1643
+ "[80Rb]": 1466,
1644
+ "[215At]": 1467,
1645
+ "[180Re]": 1468,
1646
+ "[146Sm]": 1469,
1647
+ "[19Ne]": 1470,
1648
+ "[74Kr]": 1471,
1649
+ "[134La]": 1472,
1650
+ "[76Kr]": 1473,
1651
+ "[219Fr]": 1474,
1652
+ "[121Xe]": 1475,
1653
+ "[220Fr]": 1476,
1654
+ "[216At]": 1477,
1655
+ "[223Ac]": 1478,
1656
+ "[218At]": 1479,
1657
+ "[37Ar]": 1480,
1658
+ "[135I]": 1481,
1659
+ "[110Cd]": 1482,
1660
+ "[94Tc+7]": 1483,
1661
+ "[86Y+3]": 1484,
1662
+ "[135I-]": 1485,
1663
+ "[15O-2]": 1486,
1664
+ "[151Eu+3]": 1487,
1665
+ "[161Tb+3]": 1488,
1666
+ "[197Hg+2]": 1489,
1667
+ "[109Cd+2]": 1490,
1668
+ "[191Os+4]": 1491,
1669
+ "[170Tm+3]": 1492,
1670
+ "[205Bi+3]": 1493,
1671
+ "[233U+4]": 1494,
1672
+ "[126Sb+3]": 1495,
1673
+ "[127Sb+3]": 1496,
1674
+ "[132Cs+]": 1497,
1675
+ "[136Eu+3]": 1498,
1676
+ "[136Eu]": 1499,
1677
+ "[125Sn+4]": 1500,
1678
+ "[175Yb+3]": 1501,
1679
+ "[100Mo]": 1502,
1680
+ "[22Ne]": 1503,
1681
+ "[13c-]": 1504,
1682
+ "[13NH4+]": 1505,
1683
+ "[17C]": 1506,
1684
+ "[9C]": 1507,
1685
+ "[31S]": 1508,
1686
+ "[31SH]": 1509,
1687
+ "[133I]": 1510,
1688
+ "[126I]": 1511,
1689
+ "[36SH]": 1512,
1690
+ "[30S]": 1513,
1691
+ "[32SH]": 1514,
1692
+ "[19CH2]": 1515,
1693
+ "[19c]": 1516,
1694
+ "[18c]": 1517,
1695
+ "[15F]": 1518,
1696
+ "[10C]": 1519,
1697
+ "[RuH-]": 1520,
1698
+ "[62Zn+2]": 1521,
1699
+ "[32ClH]": 1522,
1700
+ "[33ClH]": 1523,
1701
+ "[78BrH]": 1524,
1702
+ "[12Li+]": 1525,
1703
+ "[12Li]": 1526,
1704
+ "[233Ra]": 1527,
1705
+ "[68Ge+4]": 1528,
1706
+ "[44Sc+3]": 1529,
1707
+ "[91Y+3]": 1530,
1708
+ "[106Ru+3]": 1531,
1709
+ "[PoH2]": 1532,
1710
+ "[AtH]": 1533,
1711
+ "[55Fe]": 1534,
1712
+ "[233U]": 1535,
1713
+ "[210PoH2]": 1536,
1714
+ "[230Th]": 1537,
1715
+ "[228Th]": 1538,
1716
+ "[222Rn]": 1539,
1717
+ "[35SH2]": 1540,
1718
+ "[227Th]": 1541,
1719
+ "[192Ir]": 1542,
1720
+ "[133Xe]": 1543,
1721
+ "[81Kr]": 1544,
1722
+ "[95Zr]": 1545,
1723
+ "[240Pu]": 1546,
1724
+ "[54Mn]": 1547,
1725
+ "[103Ru]": 1548,
1726
+ "[95Nb]": 1549,
1727
+ "[109Cd]": 1550,
1728
+ "[141Ce]": 1551,
1729
+ "[85Kr]": 1552,
1730
+ "[110Ag]": 1553,
1731
+ "[58Co]": 1554,
1732
+ "[241Pu]": 1555,
1733
+ "[234Th]": 1556,
1734
+ "[140La]": 1557,
1735
+ "[63Ni]": 1558,
1736
+ "[152Eu]": 1559,
1737
+ "[132IH]": 1560,
1738
+ "[226Rn]": 1561,
1739
+ "[154Eu]": 1562,
1740
+ "[36ClH]": 1563,
1741
+ "[228Ac]": 1564,
1742
+ "[155Eu]": 1565,
1743
+ "[106Rh]": 1566,
1744
+ "[243Am]": 1567,
1745
+ "[227Ac]": 1568,
1746
+ "[243Cm]": 1569,
1747
+ "[236U]": 1570,
1748
+ "[144Pr]": 1571,
1749
+ "[232U]": 1572,
1750
+ "[32SH2]": 1573,
1751
+ "[88Y]": 1574,
1752
+ "[82BrH]": 1575,
1753
+ "[135IH]": 1576,
1754
+ "[242Cm]": 1577,
1755
+ "[115Cd]": 1578,
1756
+ "[242Pu]": 1579,
1757
+ "[46Sc]": 1580,
1758
+ "[56Mn]": 1581,
1759
+ "[234Pa]": 1582,
1760
+ "[41Ar]": 1583,
1761
+ "[147Nd]": 1584,
1762
+ "[187W]": 1585,
1763
+ "[151Sm]": 1586,
1764
+ "[59Ni]": 1587,
1765
+ "[233Pa]": 1588,
1766
+ "[52Mn]": 1589,
1767
+ "[94Nb]": 1590,
1768
+ "[219Rn]": 1591,
1769
+ "[236Pu]": 1592,
1770
+ "[13NH3]": 1593,
1771
+ "[93Zr]": 1594,
1772
+ "[51Cr+6]": 1595,
1773
+ "[TlH3]": 1596,
1774
+ "[123Xe]": 1597,
1775
+ "[160Tb]": 1598,
1776
+ "[170Tm]": 1599,
1777
+ "[182Ta]": 1600,
1778
+ "[175Yb]": 1601,
1779
+ "[93Mo]": 1602,
1780
+ "[143Ce]": 1603,
1781
+ "[191Os]": 1604,
1782
+ "[126IH]": 1605,
1783
+ "[48V]": 1606,
1784
+ "[113Cd]": 1607,
1785
+ "[47Sc]": 1608,
1786
+ "[181Hf]": 1609,
1787
+ "[185W]": 1610,
1788
+ "[143Pr]": 1611,
1789
+ "[191Pt]": 1612,
1790
+ "[181W]": 1613,
1791
+ "[33PH3]": 1614,
1792
+ "[97Ru]": 1615,
1793
+ "[97Tc]": 1616,
1794
+ "[111Ag]": 1617,
1795
+ "[169Er]": 1618,
1796
+ "[107Pd]": 1619,
1797
+ "[103Ru+2]": 1620,
1798
+ "[34SH2]": 1621,
1799
+ "[137Ce]": 1622,
1800
+ "[242Am]": 1623,
1801
+ "[117SnH2]": 1624,
1802
+ "[57Ni]": 1625,
1803
+ "[239U]": 1626,
1804
+ "[60Cu]": 1627,
1805
+ "[250Cf]": 1628,
1806
+ "[193Au]": 1629,
1807
+ "[69Zn]": 1630,
1808
+ "[55Co]": 1631,
1809
+ "[139Ce]": 1632,
1810
+ "[127Xe]": 1633,
1811
+ "[159Gd]": 1634,
1812
+ "[56Co]": 1635,
1813
+ "[177Hf]": 1636,
1814
+ "[244Pu]": 1637,
1815
+ "[38ClH]": 1638,
1816
+ "[142Pr]": 1639,
1817
+ "[199Hg]": 1640,
1818
+ "[179Hf]": 1641,
1819
+ "[178Hf]": 1642,
1820
+ "[237U]": 1643,
1821
+ "[156Eu]": 1644,
1822
+ "[157Eu]": 1645,
1823
+ "[105Ru]": 1646,
1824
+ "[171Tm]": 1647,
1825
+ "[199Au]": 1648,
1826
+ "[155Sm]": 1649,
1827
+ "[80BrH]": 1650,
1828
+ "[108Ag]": 1651,
1829
+ "[128IH]": 1652,
1830
+ "[48Sc]": 1653,
1831
+ "[45Ti]": 1654,
1832
+ "[176Lu]": 1655,
1833
+ "[121SnH2]": 1656,
1834
+ "[148Pm]": 1657,
1835
+ "[57Fe]": 1658,
1836
+ "[10BH3]": 1659,
1837
+ "[96Tc]": 1660,
1838
+ "[133IH]": 1661,
1839
+ "[143Pm]": 1662,
1840
+ "[105Rh]": 1663,
1841
+ "[130IH]": 1664,
1842
+ "[134IH]": 1665,
1843
+ "[131IH]": 1666,
1844
+ "[71Zn]": 1667,
1845
+ "[105Ag]": 1668,
1846
+ "[97Zr]": 1669,
1847
+ "[235Pu]": 1670,
1848
+ "[231Th]": 1671,
1849
+ "[109Pd]": 1672,
1850
+ "[93Y]": 1673,
1851
+ "[190Ir]": 1674,
1852
+ "[135Xe]": 1675,
1853
+ "[53Mn]": 1676,
1854
+ "[134Ce]": 1677,
1855
+ "[234Np]": 1678,
1856
+ "[240Am]": 1679,
1857
+ "[246Cf]": 1680,
1858
+ "[240Cm]": 1681,
1859
+ "[241Cm]": 1682,
1860
+ "[226Th]": 1683,
1861
+ "[39ClH]": 1684,
1862
+ "[229Th]": 1685,
1863
+ "[245Cm]": 1686,
1864
+ "[240U]": 1687,
1865
+ "[240Np]": 1688,
1866
+ "[249Cm]": 1689,
1867
+ "[243Pu]": 1690,
1868
+ "[145Pm]": 1691,
1869
+ "[199Pt]": 1692,
1870
+ "[246Bk]": 1693,
1871
+ "[193Pt]": 1694,
1872
+ "[230U]": 1695,
1873
+ "[250Cm]": 1696,
1874
+ "[44Ti]": 1697,
1875
+ "[175Hf]": 1698,
1876
+ "[254Fm]": 1699,
1877
+ "[255Fm]": 1700,
1878
+ "[257Fm]": 1701,
1879
+ "[92Y]": 1702,
1880
+ "[188Ir]": 1703,
1881
+ "[171Lu]": 1704,
1882
+ "[257Md]": 1705,
1883
+ "[247Bk]": 1706,
1884
+ "[121IH]": 1707,
1885
+ "[250Bk]": 1708,
1886
+ "[179Lu]": 1709,
1887
+ "[224Ac]": 1710,
1888
+ "[195Hg]": 1711,
1889
+ "[244Am]": 1712,
1890
+ "[246Pu]": 1713,
1891
+ "[194Au]": 1714,
1892
+ "[252Fm]": 1715,
1893
+ "[173Hf]": 1716,
1894
+ "[246Cm]": 1717,
1895
+ "[135Ce]": 1718,
1896
+ "[49Cr]": 1719,
1897
+ "[248Cf]": 1720,
1898
+ "[247Cm]": 1721,
1899
+ "[248Cm]": 1722,
1900
+ "[174Ta]": 1723,
1901
+ "[176Ta]": 1724,
1902
+ "[154Tb]": 1725,
1903
+ "[172Ta]": 1726,
1904
+ "[177Ta]": 1727,
1905
+ "[175Ta]": 1728,
1906
+ "[180Ta]": 1729,
1907
+ "[158Tb]": 1730,
1908
+ "[115Ag]": 1731,
1909
+ "[189Os]": 1732,
1910
+ "[251Cf]": 1733,
1911
+ "[145Pr]": 1734,
1912
+ "[147Pr]": 1735,
1913
+ "[76BrH]": 1736,
1914
+ "[102Rh]": 1737,
1915
+ "[238Np]": 1738,
1916
+ "[185Os]": 1739,
1917
+ "[246Am]": 1740,
1918
+ "[233Np]": 1741,
1919
+ "[166Dy]": 1742,
1920
+ "[254Es]": 1743,
1921
+ "[244Cf]": 1744,
1922
+ "[193Os]": 1745,
1923
+ "[245Am]": 1746,
1924
+ "[245Bk]": 1747,
1925
+ "[239Am]": 1748,
1926
+ "[238Am]": 1749,
1927
+ "[97Nb]": 1750,
1928
+ "[245Pu]": 1751,
1929
+ "[254Cf]": 1752,
1930
+ "[188W]": 1753,
1931
+ "[250Es]": 1754,
1932
+ "[251Es]": 1755,
1933
+ "[237Am]": 1756,
1934
+ "[182Hf]": 1757,
1935
+ "[258Md]": 1758,
1936
+ "[232Np]": 1759,
1937
+ "[238Cm]": 1760,
1938
+ "[60Fe]": 1761,
1939
+ "[109Pd+2]": 1762,
1940
+ "[234Pu]": 1763,
1941
+ "[141Ce+3]": 1764,
1942
+ "[136Nd]": 1765,
1943
+ "[136Pr]": 1766,
1944
+ "[173Ta]": 1767,
1945
+ "[110Ru]": 1768,
1946
+ "[147Tb]": 1769,
1947
+ "[253Fm]": 1770,
1948
+ "[139Nd]": 1771,
1949
+ "[178Re]": 1772,
1950
+ "[177Re]": 1773,
1951
+ "[200Au]": 1774,
1952
+ "[182Re]": 1775,
1953
+ "[156Tb]": 1776,
1954
+ "[155Tb]": 1777,
1955
+ "[157Tb]": 1778,
1956
+ "[161Tb]": 1779,
1957
+ "[161Ho]": 1780,
1958
+ "[167Tm]": 1781,
1959
+ "[173Lu]": 1782,
1960
+ "[179Ta]": 1783,
1961
+ "[171Er]": 1784,
1962
+ "[44Sc]": 1785,
1963
+ "[49Sc]": 1786,
1964
+ "[49V]": 1787,
1965
+ "[51Mn]": 1788,
1966
+ "[90Nb]": 1789,
1967
+ "[88Nb]": 1790,
1968
+ "[88Zr]": 1791,
1969
+ "[36SH2]": 1792,
1970
+ "[174Yb]": 1793,
1971
+ "[178Lu]": 1794,
1972
+ "[179W]": 1795,
1973
+ "[83BrH]": 1796,
1974
+ "[107Cd]": 1797,
1975
+ "[75BrH]": 1798,
1976
+ "[62Co]": 1799,
1977
+ "[48Cr]": 1800,
1978
+ "[63Zn]": 1801,
1979
+ "[102Ag]": 1802,
1980
+ "[154Sm]": 1803,
1981
+ "[168Er]": 1804,
1982
+ "[65Ni]": 1805,
1983
+ "[137La]": 1806,
1984
+ "[187Ir]": 1807,
1985
+ "[144Pm]": 1808,
1986
+ "[146Pm]": 1809,
1987
+ "[160Gd]": 1810,
1988
+ "[166Yb]": 1811,
1989
+ "[162Dy]": 1812,
1990
+ "[47V]": 1813,
1991
+ "[141Nd]": 1814,
1992
+ "[141Sm]": 1815,
1993
+ "[166Er]": 1816,
1994
+ "[150Sm]": 1817,
1995
+ "[146Eu]": 1818,
1996
+ "[149Eu]": 1819,
1997
+ "[174Lu]": 1820,
1998
+ "[17NH3]": 1821,
1999
+ "[102Ru]": 1822,
2000
+ "[170Hf]": 1823,
2001
+ "[188Pt]": 1824,
2002
+ "[61Ni]": 1825,
2003
+ "[56Ni]": 1826,
2004
+ "[149Gd]": 1827,
2005
+ "[151Gd]": 1828,
2006
+ "[141Pm]": 1829,
2007
+ "[147Gd]": 1830,
2008
+ "[146Gd]": 1831,
2009
+ "[161Er]": 1832,
2010
+ "[103Ag]": 1833,
2011
+ "[145Eu]": 1834,
2012
+ "[153Tb]": 1835,
2013
+ "[155Dy]": 1836,
2014
+ "[184Re]": 1837,
2015
+ "[180Os]": 1838,
2016
+ "[182Os]": 1839,
2017
+ "[186Pt]": 1840,
2018
+ "[181Os]": 1841,
2019
+ "[181Re]": 1842,
2020
+ "[151Tb]": 1843,
2021
+ "[178Ta]": 1844,
2022
+ "[178W]": 1845,
2023
+ "[189Pt]": 1846,
2024
+ "[194Hg]": 1847,
2025
+ "[145Sm]": 1848,
2026
+ "[150Tb]": 1849,
2027
+ "[132La]": 1850,
2028
+ "[158Gd]": 1851,
2029
+ "[104Ag]": 1852,
2030
+ "[193Hg]": 1853,
2031
+ "[94Ru]": 1854,
2032
+ "[137Pr]": 1855,
2033
+ "[155Ho]": 1856,
2034
+ "[117Cd]": 1857,
2035
+ "[99Ru]": 1858,
2036
+ "[146Nd]": 1859,
2037
+ "[218Rn]": 1860,
2038
+ "[95Y]": 1861,
2039
+ "[79Kr]": 1862,
2040
+ "[120IH]": 1863,
2041
+ "[138Pr]": 1864,
2042
+ "[100Pd]": 1865,
2043
+ "[166Tm]": 1866,
2044
+ "[90Mo]": 1867,
2045
+ "[151Nd]": 1868,
2046
+ "[231U]": 1869,
2047
+ "[138Nd]": 1870,
2048
+ "[89Nb]": 1871,
2049
+ "[98Nb]": 1872,
2050
+ "[162Ho]": 1873,
2051
+ "[142Sm]": 1874,
2052
+ "[186Ta]": 1875,
2053
+ "[104Tc]": 1876,
2054
+ "[184Ta]": 1877,
2055
+ "[185Ta]": 1878,
2056
+ "[170Er]": 1879,
2057
+ "[107Rh]": 1880,
2058
+ "[131La]": 1881,
2059
+ "[169Lu]": 1882,
2060
+ "[74BrH]": 1883,
2061
+ "[150Pm]": 1884,
2062
+ "[172Tm]": 1885,
2063
+ "[197Pt]": 1886,
2064
+ "[230Pu]": 1887,
2065
+ "[170Lu]": 1888,
2066
+ "[86Zr]": 1889,
2067
+ "[176W]": 1890,
2068
+ "[177W]": 1891,
2069
+ "[101Pd]": 1892,
2070
+ "[105Pd]": 1893,
2071
+ "[108Pd]": 1894,
2072
+ "[149Nd]": 1895,
2073
+ "[164Ho]": 1896,
2074
+ "[159Ho]": 1897,
2075
+ "[167Ho]": 1898,
2076
+ "[176Yb]": 1899,
2077
+ "[156Sm]": 1900,
2078
+ "[77BrH]": 1901,
2079
+ "[189Re]": 1902,
2080
+ "[99Rh]": 1903,
2081
+ "[100Rh]": 1904,
2082
+ "[151Pm]": 1905,
2083
+ "[232Pa]": 1906,
2084
+ "[228Pa]": 1907,
2085
+ "[230Pa]": 1908,
2086
+ "[66Ni]": 1909,
2087
+ "[194Os]": 1910,
2088
+ "[135La]": 1911,
2089
+ "[138La]": 1912,
2090
+ "[141La]": 1913,
2091
+ "[142La]": 1914,
2092
+ "[195Ir]": 1915,
2093
+ "[96Nb]": 1916,
2094
+ "[157Ho]": 1917,
2095
+ "[183Hf]": 1918,
2096
+ "[162Tm]": 1919,
2097
+ "[172Er]": 1920,
2098
+ "[148Eu]": 1921,
2099
+ "[150Eu]": 1922,
2100
+ "[15CH4]": 1923,
2101
+ "[89Kr]": 1924,
2102
+ "[143La]": 1925,
2103
+ "[58Ni]": 1926,
2104
+ "[61Co]": 1927,
2105
+ "[158Eu]": 1928,
2106
+ "[165Er]": 1929,
2107
+ "[167Yb]": 1930,
2108
+ "[173Tm]": 1931,
2109
+ "[175Tm]": 1932,
2110
+ "[172Hf]": 1933,
2111
+ "[172Lu]": 1934,
2112
+ "[93Tc]": 1935,
2113
+ "[177Yb]": 1936,
2114
+ "[124IH]": 1937,
2115
+ "[194Ir]": 1938,
2116
+ "[147Eu]": 1939,
2117
+ "[101Mo]": 1940,
2118
+ "[180Hf]": 1941,
2119
+ "[189Ir]": 1942,
2120
+ "[87Y]": 1943,
2121
+ "[43Sc]": 1944,
2122
+ "[195Au]": 1945,
2123
+ "[112Ag]": 1946,
2124
+ "[84BrH]": 1947,
2125
+ "[106Ag]": 1948,
2126
+ "[109Ag]": 1949,
2127
+ "[101Rh]": 1950,
2128
+ "[162Yb]": 1951,
2129
+ "[228Rn]": 1952,
2130
+ "[139Pr]": 1953,
2131
+ "[94Y]": 1954,
2132
+ "[201Au]": 1955,
2133
+ "[40PH3]": 1956,
2134
+ "[110Ag+]": 1957,
2135
+ "[104Cd]": 1958,
2136
+ "[133Ba+2]": 1959,
2137
+ "[226Ac]": 1960,
2138
+ "[145Gd]": 1961,
2139
+ "[186Ir]": 1962,
2140
+ "[184Ir]": 1963,
2141
+ "[224Rn]": 1964,
2142
+ "[185Ir]": 1965,
2143
+ "[182Ir]": 1966,
2144
+ "[184Hf]": 1967,
2145
+ "[200Pt]": 1968,
2146
+ "[227Pa]": 1969,
2147
+ "[178Yb]": 1970,
2148
+ "[72Br-]": 1971,
2149
+ "[72BrH]": 1972,
2150
+ "[248Am]": 1973,
2151
+ "[238Th]": 1974,
2152
+ "[161Gd]": 1975,
2153
+ "[35S-2]": 1976,
2154
+ "[107Ag]": 1977,
2155
+ "[FeH6-4]": 1978,
2156
+ "[89Sr]": 1979,
2157
+ "[SnH3-]": 1980,
2158
+ "[SeH3]": 1981,
2159
+ "[TeH3+]": 1982,
2160
+ "[SbH4+]": 1983,
2161
+ "[AsH4+]": 1984,
2162
+ "[4He]": 1985,
2163
+ "[AsH3-]": 1986,
2164
+ "[1HH]": 1987,
2165
+ "[3H+]": 1988,
2166
+ "[82Rb]": 1989,
2167
+ "[85Sr]": 1990,
2168
+ "[90Sr]": 1991,
2169
+ "[137Cs]": 1992,
2170
+ "[133Ba]": 1993,
2171
+ "[131Cs]": 1994,
2172
+ "[SbH5]": 1995,
2173
+ "[224Ra]": 1996,
2174
+ "[22Na]": 1997,
2175
+ "[210Bi]": 1998,
2176
+ "[214Bi]": 1999,
2177
+ "[228Ra]": 2000,
2178
+ "[127Sb]": 2001,
2179
+ "[136Cs]": 2002,
2180
+ "[125Sb]": 2003,
2181
+ "[134Cs]": 2004,
2182
+ "[140Ba]": 2005,
2183
+ "[45Ca]": 2006,
2184
+ "[206Pb]": 2007,
2185
+ "[207Pb]": 2008,
2186
+ "[24Na]": 2009,
2187
+ "[86Rb]": 2010,
2188
+ "[212Bi]": 2011,
2189
+ "[208Pb]": 2012,
2190
+ "[124Sb]": 2013,
2191
+ "[204Pb]": 2014,
2192
+ "[44K]": 2015,
2193
+ "[129Te]": 2016,
2194
+ "[113Sn]": 2017,
2195
+ "[204Tl]": 2018,
2196
+ "[87Sr]": 2019,
2197
+ "[208Tl]": 2020,
2198
+ "[87Rb]": 2021,
2199
+ "[47Ca]": 2022,
2200
+ "[135Cs]": 2023,
2201
+ "[216Po]": 2024,
2202
+ "[137Ba]": 2025,
2203
+ "[207Bi]": 2026,
2204
+ "[212Po]": 2027,
2205
+ "[79Se]": 2028,
2206
+ "[223Ra]": 2029,
2207
+ "[86Sr]": 2030,
2208
+ "[122Sb]": 2031,
2209
+ "[26Al]": 2032,
2210
+ "[32Si]": 2033,
2211
+ "[126Sn]": 2034,
2212
+ "[225Ra]": 2035,
2213
+ "[114In]": 2036,
2214
+ "[72Ga]": 2037,
2215
+ "[132Te]": 2038,
2216
+ "[10Be]": 2039,
2217
+ "[125Sn]": 2040,
2218
+ "[73As]": 2041,
2219
+ "[206Bi]": 2042,
2220
+ "[117Sn]": 2043,
2221
+ "[40Ca]": 2044,
2222
+ "[41Ca]": 2045,
2223
+ "[89Rb]": 2046,
2224
+ "[116In]": 2047,
2225
+ "[129Sb]": 2048,
2226
+ "[91Sr]": 2049,
2227
+ "[71Ge]": 2050,
2228
+ "[139Ba]": 2051,
2229
+ "[69Ga]": 2052,
2230
+ "[120Sb]": 2053,
2231
+ "[121Sn]": 2054,
2232
+ "[123Sn]": 2055,
2233
+ "[131Te]": 2056,
2234
+ "[77Ge]": 2057,
2235
+ "[135Ba]": 2058,
2236
+ "[82Sr]": 2059,
2237
+ "[43K]": 2060,
2238
+ "[131Ba]": 2061,
2239
+ "[92Sr]": 2062,
2240
+ "[88Rb]": 2063,
2241
+ "[129Cs]": 2064,
2242
+ "[144Cs]": 2065,
2243
+ "[127Cs]": 2066,
2244
+ "[200Tl]": 2067,
2245
+ "[202Tl]": 2068,
2246
+ "[141Ba]": 2069,
2247
+ "[117Sb]": 2070,
2248
+ "[116Sb]": 2071,
2249
+ "[78As]": 2072,
2250
+ "[131Sb]": 2073,
2251
+ "[126Sb]": 2074,
2252
+ "[128Sb]": 2075,
2253
+ "[130Sb]": 2076,
2254
+ "[67Ge]": 2077,
2255
+ "[68Ge]": 2078,
2256
+ "[78Ge]": 2079,
2257
+ "[66Ge]": 2080,
2258
+ "[223Fr]": 2081,
2259
+ "[132Cs]": 2082,
2260
+ "[125Cs]": 2083,
2261
+ "[138Cs]": 2084,
2262
+ "[133Te]": 2085,
2263
+ "[84Rb]": 2086,
2264
+ "[83Rb]": 2087,
2265
+ "[81Rb]": 2088,
2266
+ "[142Ba]": 2089,
2267
+ "[200Bi]": 2090,
2268
+ "[115Sb]": 2091,
2269
+ "[194Tl]": 2092,
2270
+ "[70Se]": 2093,
2271
+ "[112In]": 2094,
2272
+ "[118Sb]": 2095,
2273
+ "[70Ga]": 2096,
2274
+ "[27Mg]": 2097,
2275
+ "[202Bi]": 2098,
2276
+ "[83Se]": 2099,
2277
+ "[9Li]": 2100,
2278
+ "[69As]": 2101,
2279
+ "[79Rb]": 2102,
2280
+ "[81Sr]": 2103,
2281
+ "[83Sr]": 2104,
2282
+ "[78Se]": 2105,
2283
+ "[109In]": 2106,
2284
+ "[29Al]": 2107,
2285
+ "[118Sn]": 2108,
2286
+ "[117In]": 2109,
2287
+ "[119Sb]": 2110,
2288
+ "[114Sn]": 2111,
2289
+ "[138Ba]": 2112,
2290
+ "[69Ge]": 2113,
2291
+ "[73Ga]": 2114,
2292
+ "[74Ge]": 2115,
2293
+ "[206Tl]": 2116,
2294
+ "[199Tl]": 2117,
2295
+ "[130Cs]": 2118,
2296
+ "[28Mg]": 2119,
2297
+ "[116Te]": 2120,
2298
+ "[112Sn]": 2121,
2299
+ "[126Ba]": 2122,
2300
+ "[211Bi]": 2123,
2301
+ "[81Se]": 2124,
2302
+ "[127Sn]": 2125,
2303
+ "[143Cs]": 2126,
2304
+ "[134Te]": 2127,
2305
+ "[80Sr]": 2128,
2306
+ "[45K]": 2129,
2307
+ "[215Po]": 2130,
2308
+ "[207Po]": 2131,
2309
+ "[111Sn]": 2132,
2310
+ "[211Po]": 2133,
2311
+ "[128Ba]": 2134,
2312
+ "[198Tl]": 2135,
2313
+ "[227Ra]": 2136,
2314
+ "[213Po]": 2137,
2315
+ "[220Ra]": 2138,
2316
+ "[128Sn]": 2139,
2317
+ "[203Po]": 2140,
2318
+ "[205Po]": 2141,
2319
+ "[65Ga]": 2142,
2320
+ "[197Tl]": 2143,
2321
+ "[88Sr]": 2144,
2322
+ "[110In]": 2145,
2323
+ "[31Si]": 2146,
2324
+ "[201Bi]": 2147,
2325
+ "[121Te]": 2148,
2326
+ "[205Bi]": 2149,
2327
+ "[203Bi]": 2150,
2328
+ "[195Tl]": 2151,
2329
+ "[209Tl]": 2152,
2330
+ "[110Sn]": 2153,
2331
+ "[222Fr]": 2154,
2332
+ "[207At]": 2155,
2333
+ "[119In]": 2156,
2334
+ "[As@]": 2157,
2335
+ "[129IH]": 2158,
2336
+ "[157Dy]": 2159,
2337
+ "[111IH]": 2160,
2338
+ "[230Ra]": 2161,
2339
+ "[144Pr+3]": 2162,
2340
+ "[SiH3+]": 2163,
2341
+ "[3He]": 2164,
2342
+ "[AsH5]": 2165,
2343
+ "[72Se]": 2166,
2344
+ "[95Tc]": 2167,
2345
+ "[103Pd]": 2168,
2346
+ "[121Sn+2]": 2169,
2347
+ "[211Rn]": 2170,
2348
+ "[38SH2]": 2171,
2349
+ "[127IH]": 2172,
2350
+ "[74Br-]": 2173,
2351
+ "[133I-]": 2174,
2352
+ "[100Tc+4]": 2175,
2353
+ "[100Tc]": 2176,
2354
+ "[36Cl-]": 2177,
2355
+ "[89Y+3]": 2178,
2356
+ "[104Rh]": 2179,
2357
+ "[152Sm]": 2180,
2358
+ "[226Ra]": 2181,
2359
+ "[19FH]": 2182,
2360
+ "[104Pd]": 2183,
2361
+ "[148Gd]": 2184,
2362
+ "[157Lu]": 2185,
2363
+ "[33SH2]": 2186,
2364
+ "[121I-]": 2187,
2365
+ "[17FH]": 2188,
2366
+ "[71Se]": 2189,
2367
+ "[157Sm]": 2190,
2368
+ "[148Tb]": 2191,
2369
+ "[164Dy]": 2192,
2370
+ "[15OH2]": 2193,
2371
+ "[15O+]": 2194,
2372
+ "[39K]": 2195,
2373
+ "[40Ar]": 2196,
2374
+ "[50Cr+3]": 2197,
2375
+ "[50Cr]": 2198,
2376
+ "[52Ti]": 2199,
2377
+ "[103Pd+2]": 2200,
2378
+ "[130Ba]": 2201,
2379
+ "[142Pm]": 2202,
2380
+ "[153Gd+3]": 2203,
2381
+ "[151Eu]": 2204,
2382
+ "[103Rh]": 2205,
2383
+ "[124Xe]": 2206,
2384
+ "[152Tb]": 2207,
2385
+ "[17OH2]": 2208,
2386
+ "[20Ne]": 2209,
2387
+ "[52Fe]": 2210,
2388
+ "[94Zr+4]": 2211,
2389
+ "[94Zr]": 2212,
2390
+ "[149Pr]": 2213,
2391
+ "[16OH2]": 2214,
2392
+ "[53Cr+6]": 2215,
2393
+ "[53Cr]": 2216,
2394
+ "[81Br-]": 2217,
2395
+ "[112Pd]": 2218,
2396
+ "[125Xe]": 2219,
2397
+ "[155Gd]": 2220,
2398
+ "[157Gd]": 2221,
2399
+ "[168Yb]": 2222,
2400
+ "[184Os]": 2223,
2401
+ "[166Tb]": 2224,
2402
+ "[221Fr]": 2225,
2403
+ "[212Ra]": 2226,
2404
+ "[75Br-]": 2227,
2405
+ "[79Br-]": 2228,
2406
+ "[113Ag]": 2229,
2407
+ "[23Na]": 2230,
2408
+ "[34Cl-]": 2231,
2409
+ "[34ClH]": 2232,
2410
+ "[38Cl-]": 2233,
2411
+ "[56Fe]": 2234,
2412
+ "[68Cu]": 2235,
2413
+ "[77Br-]": 2236,
2414
+ "[90Zr+4]": 2237,
2415
+ "[90Zr]": 2238,
2416
+ "[102Pd]": 2239,
2417
+ "[154Eu+3]": 2240,
2418
+ "[57Mn]": 2241,
2419
+ "[165Tm]": 2242,
2420
+ "[152Dy]": 2243,
2421
+ "[217At]": 2244,
2422
+ "[77se]": 2245,
2423
+ "[13cH-]": 2246,
2424
+ "[122Te]": 2247,
2425
+ "[156Gd]": 2248,
2426
+ "[124Te]": 2249,
2427
+ "[53Ni]": 2250,
2428
+ "[131Xe]": 2251,
2429
+ "[174Hf+4]": 2252,
2430
+ "[174Hf]": 2253,
2431
+ "[76Se]": 2254,
2432
+ "[168Tm]": 2255,
2433
+ "[167Dy]": 2256,
2434
+ "[154Gd]": 2257,
2435
+ "[95Ru]": 2258,
2436
+ "[210At]": 2259,
2437
+ "[85Br]": 2260,
2438
+ "[59Co]": 2261,
2439
+ "[122Xe]": 2262,
2440
+ "[27Al]": 2263,
2441
+ "[54Cr]": 2264,
2442
+ "[198Hg]": 2265,
2443
+ "[85Rb+]": 2266,
2444
+ "[214Tl]": 2267,
2445
+ "[229Rn]": 2268,
2446
+ "[218Pb]": 2269,
2447
+ "[218Bi]": 2270,
2448
+ "[167Tm+3]": 2271,
2449
+ "[18o+]": 2272,
2450
+ "[P@@H+]": 2273,
2451
+ "[P@H+]": 2274,
2452
+ "[13N+]": 2275,
2453
+ "[212Pb+2]": 2276,
2454
+ "[217Bi]": 2277,
2455
+ "[249Cf+2]": 2278,
2456
+ "[18OH3+]": 2279,
2457
+ "[90Sr-]": 2280,
2458
+ "[Cf+3]": 2281,
2459
+ "[200Hg]": 2282,
2460
+ "[86Tc]": 2283,
2461
+ "[141Pr+3]": 2284,
2462
+ "[141Pr]": 2285,
2463
+ "[16nH]": 2286,
2464
+ "[14NH4+]": 2287,
2465
+ "[132Xe]": 2288,
2466
+ "[83Kr]": 2289,
2467
+ "[70Zn+2]": 2290,
2468
+ "[137Ba+2]": 2291,
2469
+ "[36Ar]": 2292,
2470
+ "[38Ar]": 2293,
2471
+ "[21Ne]": 2294,
2472
+ "[126Xe]": 2295,
2473
+ "[136Xe]": 2296,
2474
+ "[128Xe]": 2297,
2475
+ "[134Xe]": 2298,
2476
+ "[84Kr]": 2299,
2477
+ "[86Kr]": 2300,
2478
+ "[78Kr]": 2301,
2479
+ "[80Kr]": 2302,
2480
+ "[82Kr]": 2303,
2481
+ "[67Zn+2]": 2304,
2482
+ "[65Cu+2]": 2305,
2483
+ "[110Te]": 2306,
2484
+ "[58Fe+3]": 2307,
2485
+ "[142Nd]": 2308,
2486
+ "[38K]": 2309,
2487
+ "[198Au+3]": 2310,
2488
+ "[122IH]": 2311,
2489
+ "[38PH3]": 2312,
2490
+ "[130I-]": 2313,
2491
+ "[40K+]": 2314,
2492
+ "[38K+]": 2315,
2493
+ "[28Mg+2]": 2316,
2494
+ "[208Tl+]": 2317,
2495
+ "[13OH2]": 2318,
2496
+ "[198Bi]": 2319,
2497
+ "[192Bi]": 2320,
2498
+ "[194Bi]": 2321,
2499
+ "[196Bi]": 2322,
2500
+ "[132I-]": 2323,
2501
+ "[83Sr+2]": 2324,
2502
+ "[169Er+3]": 2325,
2503
+ "[122I-]": 2326,
2504
+ "[120I-]": 2327,
2505
+ "[92Sr+2]": 2328,
2506
+ "[126I-]": 2329,
2507
+ "[24Mg]": 2330,
2508
+ "[84Sr]": 2331,
2509
+ "[118Pd+2]": 2332,
2510
+ "[118Pd]": 2333,
2511
+ "[AsH4]": 2334,
2512
+ "[127I-]": 2335,
2513
+ "[9C-]": 2336,
2514
+ "[11CH3+]": 2337,
2515
+ "[17B]": 2338,
2516
+ "[7B]": 2339,
2517
+ "[4HH]": 2340,
2518
+ "[18C-]": 2341,
2519
+ "[22CH3-]": 2342,
2520
+ "[22CH4]": 2343,
2521
+ "[17C-]": 2344,
2522
+ "[15CH3]": 2345,
2523
+ "[16CH3]": 2346,
2524
+ "[11NH3]": 2347,
2525
+ "[21NH3]": 2348,
2526
+ "[11N-]": 2349,
2527
+ "[11NH]": 2350,
2528
+ "[16CH]": 2351,
2529
+ "[17CH2]": 2352,
2530
+ "[99Ru+2]": 2353,
2531
+ "[181Ta+2]": 2354,
2532
+ "[181Ta]": 2355,
2533
+ "[20CH]": 2356,
2534
+ "[32PH2]": 2357,
2535
+ "[55Fe+2]": 2358,
2536
+ "[SH3]": 2359,
2537
+ "[S@H]": 2360,
2538
+ "[UNK]": 2361
2539
+ },
2540
+ "merges": []
2541
+ }
2542
+ }
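
The tokenizer.json above defines a ByteLevel BPE model with an empty merges list, so SMILES strings are split into the atom-level symbols enumerated in the vocab, and a TemplateProcessing post-processor that wraps each sequence in [CLS] … [SEP]. Below is a minimal sketch of how this behaves with the standalone `tokenizers` library, assuming the file has been downloaded locally; the printed values are illustrative, read off the token IDs listed above.

```python
# Minimal sketch: inspect the uploaded tokenizer.json with the `tokenizers` library.
# Assumes tokenizer.json has been downloaded into the working directory.
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")

# BPE model with "merges": [] -> SMILES are split into the atom-level symbols
# defined in the vocab (e.g. "c" -> 5, "1" -> 8); symbols outside the vocab
# should fall back to [UNK] (id 2361).
enc = tok.encode("c1ccccc1")
print(enc.tokens)  # expected: ['[CLS]', 'c', '1', 'c', 'c', 'c', 'c', 'c', '1', '[SEP]']
print(enc.ids)     # expected: [0, 5, 8, 5, 5, 5, 5, 5, 8, 1]

# The TemplateProcessing post-processor wraps single sequences as [CLS] A [SEP]
# and pairs as [CLS] A [SEP] B [SEP], matching the templates declared above.
```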
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[MASK]",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2361": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "[UNK]"
+ }
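
tokenizer_config.json wires these files into `transformers` as a `PreTrainedTokenizerFast` with a 512-token `model_max_length` and the five special tokens declared in `added_tokens_decoder`. A minimal sketch of loading it from a local checkout follows; the directory path and the `padding`/`truncation` arguments are assumptions for illustration, and since tokenizer.json ships with `"truncation": null`, truncation has to be requested at call time.

```python
# Minimal sketch: load the fast tokenizer through transformers from a local
# directory containing tokenizer.json and tokenizer_config.json.
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("./")  # hypothetical local path

print(tokenizer.model_max_length)                     # 512 (from tokenizer_config.json)
print(tokenizer.mask_token, tokenizer.mask_token_id)  # [MASK] 3
print(tokenizer.pad_token, tokenizer.pad_token_id)    # [PAD] 2
print(tokenizer.unk_token_id)                         # 2361

# "truncation" is null in tokenizer.json, so clipping to the 512-token limit
# must be requested explicitly when encoding batches.
batch = tokenizer(["CCO", "c1ccccc1O"], padding=True, truncation=True)
print([len(ids) for ids in batch["input_ids"]])       # both padded to the longer length
```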