SagiPolaczek committed on
Commit 1937fef
1 Parent(s): 75e8c8b

Push model using huggingface_hub.
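
The `cell_attributes_tokenizer.json` file added in this commit stores its special tokens as an `added_tokens` list of objects with `id` and `content` fields. As a minimal sketch of how that list could be read (not the authors' own tooling), the snippet below builds a small stand-in document with the same field layout and derives a token-to-id map from it. The helper names `make_entry`, `added_token_ids` and `sentinel_offset` are hypothetical, and the sample contains only four of the entries visible in the diff.

```python
import json
import re
from typing import Dict

SENTINEL = re.compile(r"<SENTINEL_ID_(\d+)>")


def make_entry(token_id: int, content: str) -> dict:
    """Build one added_tokens entry with the fields seen in the diff."""
    return {
        "id": token_id,
        "content": content,
        "single_word": False,
        "lstrip": False,
        "rstrip": False,
        "normalized": False,
        "special": True,
    }


# Hypothetical excerpt: same layout as the committed file, far fewer entries.
SAMPLE_TOKENIZER_JSON = json.dumps(
    {
        "version": "1.0",
        "truncation": None,
        "padding": None,
        "added_tokens": [
            make_entry(0, "<UNK>"),
            make_entry(1, "<PAD>"),
            make_entry(57, "<SENTINEL_ID_0>"),
            make_entry(58, "<SENTINEL_ID_1>"),
        ],
    }
)


def added_token_ids(tokenizer_json: str) -> Dict[str, int]:
    """Map each added token's content string to its integer id."""
    config = json.loads(tokenizer_json)
    return {entry["content"]: entry["id"] for entry in config["added_tokens"]}


def sentinel_offset(ids: Dict[str, int]) -> int:
    """Return the constant k such that <SENTINEL_ID_n> has id n + k."""
    offsets = set()
    for content, token_id in ids.items():
        match = SENTINEL.fullmatch(content)
        if match:
            offsets.add(token_id - int(match.group(1)))
    if len(offsets) != 1:
        raise ValueError("sentinel ids are missing or not contiguous")
    return offsets.pop()


ids = added_token_ids(SAMPLE_TOKENIZER_JSON)
print(ids["<PAD>"], sentinel_offset(ids))  # prints "1 57"
```

In the portion of the diff shown here, the sentinel tokens follow the same pattern as the sample: `<SENTINEL_ID_0>` has id 57 and each subsequent sentinel id increases by one.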

tokenizer/bpe_tokenizer_trained_on_chembl_zinc_with_aug_4272372_samples_balanced_1_1.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/cell_attributes_tokenizer.json ADDED
@@ -0,0 +1,3988 @@
1
+ {
2
+ "version": "1.0",
3
+ "truncation": null,
4
+ "padding": null,
5
+ "added_tokens": [
6
+ {
7
+ "id": 0,
8
+ "content": "<UNK>",
9
+ "single_word": false,
10
+ "lstrip": false,
11
+ "rstrip": false,
12
+ "normalized": false,
13
+ "special": true
14
+ },
15
+ {
16
+ "id": 1,
17
+ "content": "<PAD>",
18
+ "single_word": false,
19
+ "lstrip": false,
20
+ "rstrip": false,
21
+ "normalized": false,
22
+ "special": true
23
+ },
24
+ {
25
+ "id": 2,
26
+ "content": "<CLS>",
27
+ "single_word": false,
28
+ "lstrip": false,
29
+ "rstrip": false,
30
+ "normalized": false,
31
+ "special": true
32
+ },
33
+ {
34
+ "id": 3,
35
+ "content": "<SEP>",
36
+ "single_word": false,
37
+ "lstrip": false,
38
+ "rstrip": false,
39
+ "normalized": false,
40
+ "special": true
41
+ },
42
+ {
43
+ "id": 4,
44
+ "content": "<MASK>",
45
+ "single_word": false,
46
+ "lstrip": false,
47
+ "rstrip": false,
48
+ "normalized": false,
49
+ "special": true
50
+ },
51
+ {
52
+ "id": 5,
53
+ "content": "<EOS>",
54
+ "single_word": false,
55
+ "lstrip": false,
56
+ "rstrip": false,
57
+ "normalized": false,
58
+ "special": true
59
+ },
60
+ {
61
+ "id": 6,
62
+ "content": "<MOLECULAR_ENTITY>",
63
+ "single_word": false,
64
+ "lstrip": false,
65
+ "rstrip": false,
66
+ "normalized": false,
67
+ "special": true
68
+ },
69
+ {
70
+ "id": 7,
71
+ "content": "<GLOBAL_INTERACTION_ATTRIBUTES>",
72
+ "single_word": false,
73
+ "lstrip": false,
74
+ "rstrip": false,
75
+ "normalized": false,
76
+ "special": true
77
+ },
78
+ {
79
+ "id": 8,
80
+ "content": "<MOLECULAR_ENTITY_ANTIGEN>",
81
+ "single_word": false,
82
+ "lstrip": false,
83
+ "rstrip": false,
84
+ "normalized": false,
85
+ "special": true
86
+ },
87
+ {
88
+ "id": 9,
89
+ "content": "<MOLECULAR_ENTITY_EPITOPE>",
90
+ "single_word": false,
91
+ "lstrip": false,
92
+ "rstrip": false,
93
+ "normalized": false,
94
+ "special": true
95
+ },
96
+ {
97
+ "id": 10,
98
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN>",
99
+ "single_word": false,
100
+ "lstrip": false,
101
+ "rstrip": false,
102
+ "normalized": false,
103
+ "special": true
104
+ },
105
+ {
106
+ "id": 11,
107
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN>",
108
+ "single_word": false,
109
+ "lstrip": false,
110
+ "rstrip": false,
111
+ "normalized": false,
112
+ "special": true
113
+ },
114
+ {
115
+ "id": 12,
116
+ "content": "<MOLECULAR_ENTITY_TCR_ALPHA_CHAIN>",
117
+ "single_word": false,
118
+ "lstrip": false,
119
+ "rstrip": false,
120
+ "normalized": false,
121
+ "special": true
122
+ },
123
+ {
124
+ "id": 13,
125
+ "content": "<MOLECULAR_ENTITY_TCR_BETA_VDJ>",
126
+ "single_word": false,
127
+ "lstrip": false,
128
+ "rstrip": false,
129
+ "normalized": false,
130
+ "special": true
131
+ },
132
+ {
133
+ "id": 14,
134
+ "content": "<MOLECULAR_ENTITY_TCR_BETA_CDR3>",
135
+ "single_word": false,
136
+ "lstrip": false,
137
+ "rstrip": false,
138
+ "normalized": false,
139
+ "special": true
140
+ },
141
+ {
142
+ "id": 15,
143
+ "content": "<BINDING_AFFINITY_CLASS>",
144
+ "single_word": false,
145
+ "lstrip": false,
146
+ "rstrip": false,
147
+ "normalized": false,
148
+ "special": true
149
+ },
150
+ {
151
+ "id": 16,
152
+ "content": "<DECODER_START>",
153
+ "single_word": false,
154
+ "lstrip": false,
155
+ "rstrip": false,
156
+ "normalized": false,
157
+ "special": true
158
+ },
159
+ {
160
+ "id": 17,
161
+ "content": "<BINDING>",
162
+ "single_word": false,
163
+ "lstrip": false,
164
+ "rstrip": false,
165
+ "normalized": false,
166
+ "special": true
167
+ },
168
+ {
169
+ "id": 18,
170
+ "content": "<FILLIN>",
171
+ "single_word": false,
172
+ "lstrip": false,
173
+ "rstrip": false,
174
+ "normalized": false,
175
+ "special": true
176
+ },
177
+ {
178
+ "id": 19,
179
+ "content": "<REORDER>",
180
+ "single_word": false,
181
+ "lstrip": false,
182
+ "rstrip": false,
183
+ "normalized": false,
184
+ "special": true
185
+ },
186
+ {
187
+ "id": 20,
188
+ "content": "<TOAA>",
189
+ "single_word": false,
190
+ "lstrip": false,
191
+ "rstrip": false,
192
+ "normalized": false,
193
+ "special": true
194
+ },
195
+ {
196
+ "id": 21,
197
+ "content": "<ACTIVE>",
198
+ "single_word": false,
199
+ "lstrip": false,
200
+ "rstrip": false,
201
+ "normalized": false,
202
+ "special": true
203
+ },
204
+ {
205
+ "id": 22,
206
+ "content": "<GENESEQ>",
207
+ "single_word": false,
208
+ "lstrip": false,
209
+ "rstrip": false,
210
+ "normalized": false,
211
+ "special": true
212
+ },
213
+ {
214
+ "id": 23,
215
+ "content": "<INCREASE>",
216
+ "single_word": false,
217
+ "lstrip": false,
218
+ "rstrip": false,
219
+ "normalized": false,
220
+ "special": true
221
+ },
222
+ {
223
+ "id": 24,
224
+ "content": "<DECREASE>",
225
+ "single_word": false,
226
+ "lstrip": false,
227
+ "rstrip": false,
228
+ "normalized": false,
229
+ "special": true
230
+ },
231
+ {
232
+ "id": 25,
233
+ "content": "<STRUCTURE>",
234
+ "single_word": false,
235
+ "lstrip": false,
236
+ "rstrip": false,
237
+ "normalized": false,
238
+ "special": true
239
+ },
240
+ {
241
+ "id": 26,
242
+ "content": "<DISTANCE>",
243
+ "single_word": false,
244
+ "lstrip": false,
245
+ "rstrip": false,
246
+ "normalized": false,
247
+ "special": true
248
+ },
249
+ {
250
+ "id": 27,
251
+ "content": "<SOLUBILITY>",
252
+ "single_word": false,
253
+ "lstrip": false,
254
+ "rstrip": false,
255
+ "normalized": false,
256
+ "special": true
257
+ },
258
+ {
259
+ "id": 28,
260
+ "content": "<TOXICITY>",
261
+ "single_word": false,
262
+ "lstrip": false,
263
+ "rstrip": false,
264
+ "normalized": false,
265
+ "special": true
266
+ },
267
+ {
268
+ "id": 29,
269
+ "content": "<AB>",
270
+ "single_word": false,
271
+ "lstrip": false,
272
+ "rstrip": false,
273
+ "normalized": false,
274
+ "special": true
275
+ },
276
+ {
277
+ "id": 30,
278
+ "content": "<ISACTIVE>",
279
+ "single_word": false,
280
+ "lstrip": false,
281
+ "rstrip": false,
282
+ "normalized": false,
283
+ "special": true
284
+ },
285
+ {
286
+ "id": 31,
287
+ "content": "<ISSYNTHETIC>",
288
+ "single_word": false,
289
+ "lstrip": false,
290
+ "rstrip": false,
291
+ "normalized": false,
292
+ "special": true
293
+ },
294
+ {
295
+ "id": 32,
296
+ "content": "<PENETR>",
297
+ "single_word": false,
298
+ "lstrip": false,
299
+ "rstrip": false,
300
+ "normalized": false,
301
+ "special": true
302
+ },
303
+ {
304
+ "id": 33,
305
+ "content": "<ABSORPTION>",
306
+ "single_word": false,
307
+ "lstrip": false,
308
+ "rstrip": false,
309
+ "normalized": false,
310
+ "special": true
311
+ },
312
+ {
313
+ "id": 34,
314
+ "content": "<DISTRIBUTION>",
315
+ "single_word": false,
316
+ "lstrip": false,
317
+ "rstrip": false,
318
+ "normalized": false,
319
+ "special": true
320
+ },
321
+ {
322
+ "id": 35,
323
+ "content": "<METABOLISM>",
324
+ "single_word": false,
325
+ "lstrip": false,
326
+ "rstrip": false,
327
+ "normalized": false,
328
+ "special": true
329
+ },
330
+ {
331
+ "id": 36,
332
+ "content": "<EXCRETION>",
333
+ "single_word": false,
334
+ "lstrip": false,
335
+ "rstrip": false,
336
+ "normalized": false,
337
+ "special": true
338
+ },
339
+ {
340
+ "id": 37,
341
+ "content": "<FLUORESCENCE>",
342
+ "single_word": false,
343
+ "lstrip": false,
344
+ "rstrip": false,
345
+ "normalized": false,
346
+ "special": true
347
+ },
348
+ {
349
+ "id": 38,
350
+ "content": "<STABILITY>",
351
+ "single_word": false,
352
+ "lstrip": false,
353
+ "rstrip": false,
354
+ "normalized": false,
355
+ "special": true
356
+ },
357
+ {
358
+ "id": 39,
359
+ "content": "<DISORDER>",
360
+ "single_word": false,
361
+ "lstrip": false,
362
+ "rstrip": false,
363
+ "normalized": false,
364
+ "special": true
365
+ },
366
+ {
367
+ "id": 40,
368
+ "content": "<DISEASE>",
369
+ "single_word": false,
370
+ "lstrip": false,
371
+ "rstrip": false,
372
+ "normalized": false,
373
+ "special": true
374
+ },
375
+ {
376
+ "id": 41,
377
+ "content": "<BINARY>",
378
+ "single_word": false,
379
+ "lstrip": false,
380
+ "rstrip": false,
381
+ "normalized": false,
382
+ "special": true
383
+ },
384
+ {
385
+ "id": 42,
386
+ "content": "<REGRESSION>",
387
+ "single_word": false,
388
+ "lstrip": false,
389
+ "rstrip": false,
390
+ "normalized": false,
391
+ "special": true
392
+ },
393
+ {
394
+ "id": 43,
395
+ "content": "<ORGANISM>",
396
+ "single_word": false,
397
+ "lstrip": false,
398
+ "rstrip": false,
399
+ "normalized": false,
400
+ "special": true
401
+ },
402
+ {
403
+ "id": 44,
404
+ "content": "<0>",
405
+ "single_word": false,
406
+ "lstrip": false,
407
+ "rstrip": false,
408
+ "normalized": false,
409
+ "special": true
410
+ },
411
+ {
412
+ "id": 45,
413
+ "content": "<1>",
414
+ "single_word": false,
415
+ "lstrip": false,
416
+ "rstrip": false,
417
+ "normalized": false,
418
+ "special": true
419
+ },
420
+ {
421
+ "id": 46,
422
+ "content": "<2>",
423
+ "single_word": false,
424
+ "lstrip": false,
425
+ "rstrip": false,
426
+ "normalized": false,
427
+ "special": true
428
+ },
429
+ {
430
+ "id": 47,
431
+ "content": "<3>",
432
+ "single_word": false,
433
+ "lstrip": false,
434
+ "rstrip": false,
435
+ "normalized": false,
436
+ "special": true
437
+ },
438
+ {
439
+ "id": 48,
440
+ "content": "<4>",
441
+ "single_word": false,
442
+ "lstrip": false,
443
+ "rstrip": false,
444
+ "normalized": false,
445
+ "special": true
446
+ },
447
+ {
448
+ "id": 49,
449
+ "content": "<5>",
450
+ "single_word": false,
451
+ "lstrip": false,
452
+ "rstrip": false,
453
+ "normalized": false,
454
+ "special": true
455
+ },
456
+ {
457
+ "id": 50,
458
+ "content": "<6>",
459
+ "single_word": false,
460
+ "lstrip": false,
461
+ "rstrip": false,
462
+ "normalized": false,
463
+ "special": true
464
+ },
465
+ {
466
+ "id": 51,
467
+ "content": "<7>",
468
+ "single_word": false,
469
+ "lstrip": false,
470
+ "rstrip": false,
471
+ "normalized": false,
472
+ "special": true
473
+ },
474
+ {
475
+ "id": 52,
476
+ "content": "<8>",
477
+ "single_word": false,
478
+ "lstrip": false,
479
+ "rstrip": false,
480
+ "normalized": false,
481
+ "special": true
482
+ },
483
+ {
484
+ "id": 53,
485
+ "content": "<9>",
486
+ "single_word": false,
487
+ "lstrip": false,
488
+ "rstrip": false,
489
+ "normalized": false,
490
+ "special": true
491
+ },
492
+ {
493
+ "id": 54,
494
+ "content": "<.>",
495
+ "single_word": false,
496
+ "lstrip": false,
497
+ "rstrip": false,
498
+ "normalized": false,
499
+ "special": true
500
+ },
501
+ {
502
+ "id": 55,
503
+ "content": "<YES>",
504
+ "single_word": false,
505
+ "lstrip": false,
506
+ "rstrip": false,
507
+ "normalized": false,
508
+ "special": true
509
+ },
510
+ {
511
+ "id": 56,
512
+ "content": "<NO>",
513
+ "single_word": false,
514
+ "lstrip": false,
515
+ "rstrip": false,
516
+ "normalized": false,
517
+ "special": true
518
+ },
519
+ {
520
+ "id": 57,
521
+ "content": "<SENTINEL_ID_0>",
522
+ "single_word": false,
523
+ "lstrip": false,
524
+ "rstrip": false,
525
+ "normalized": false,
526
+ "special": true
527
+ },
528
+ {
529
+ "id": 58,
530
+ "content": "<SENTINEL_ID_1>",
531
+ "single_word": false,
532
+ "lstrip": false,
533
+ "rstrip": false,
534
+ "normalized": false,
535
+ "special": true
536
+ },
537
+ {
538
+ "id": 59,
539
+ "content": "<SENTINEL_ID_2>",
540
+ "single_word": false,
541
+ "lstrip": false,
542
+ "rstrip": false,
543
+ "normalized": false,
544
+ "special": true
545
+ },
546
+ {
547
+ "id": 60,
548
+ "content": "<SENTINEL_ID_3>",
549
+ "single_word": false,
550
+ "lstrip": false,
551
+ "rstrip": false,
552
+ "normalized": false,
553
+ "special": true
554
+ },
555
+ {
556
+ "id": 61,
557
+ "content": "<SENTINEL_ID_4>",
558
+ "single_word": false,
559
+ "lstrip": false,
560
+ "rstrip": false,
561
+ "normalized": false,
562
+ "special": true
563
+ },
564
+ {
565
+ "id": 62,
566
+ "content": "<SENTINEL_ID_5>",
567
+ "single_word": false,
568
+ "lstrip": false,
569
+ "rstrip": false,
570
+ "normalized": false,
571
+ "special": true
572
+ },
573
+ {
574
+ "id": 63,
575
+ "content": "<SENTINEL_ID_6>",
576
+ "single_word": false,
577
+ "lstrip": false,
578
+ "rstrip": false,
579
+ "normalized": false,
580
+ "special": true
581
+ },
582
+ {
583
+ "id": 64,
584
+ "content": "<SENTINEL_ID_7>",
585
+ "single_word": false,
586
+ "lstrip": false,
587
+ "rstrip": false,
588
+ "normalized": false,
589
+ "special": true
590
+ },
591
+ {
592
+ "id": 65,
593
+ "content": "<SENTINEL_ID_8>",
594
+ "single_word": false,
595
+ "lstrip": false,
596
+ "rstrip": false,
597
+ "normalized": false,
598
+ "special": true
599
+ },
600
+ {
601
+ "id": 66,
602
+ "content": "<SENTINEL_ID_9>",
603
+ "single_word": false,
604
+ "lstrip": false,
605
+ "rstrip": false,
606
+ "normalized": false,
607
+ "special": true
608
+ },
609
+ {
610
+ "id": 67,
611
+ "content": "<SENTINEL_ID_10>",
612
+ "single_word": false,
613
+ "lstrip": false,
614
+ "rstrip": false,
615
+ "normalized": false,
616
+ "special": true
617
+ },
618
+ {
619
+ "id": 68,
620
+ "content": "<SENTINEL_ID_11>",
621
+ "single_word": false,
622
+ "lstrip": false,
623
+ "rstrip": false,
624
+ "normalized": false,
625
+ "special": true
626
+ },
627
+ {
628
+ "id": 69,
629
+ "content": "<SENTINEL_ID_12>",
630
+ "single_word": false,
631
+ "lstrip": false,
632
+ "rstrip": false,
633
+ "normalized": false,
634
+ "special": true
635
+ },
636
+ {
637
+ "id": 70,
638
+ "content": "<SENTINEL_ID_13>",
639
+ "single_word": false,
640
+ "lstrip": false,
641
+ "rstrip": false,
642
+ "normalized": false,
643
+ "special": true
644
+ },
645
+ {
646
+ "id": 71,
647
+ "content": "<SENTINEL_ID_14>",
648
+ "single_word": false,
649
+ "lstrip": false,
650
+ "rstrip": false,
651
+ "normalized": false,
652
+ "special": true
653
+ },
654
+ {
655
+ "id": 72,
656
+ "content": "<SENTINEL_ID_15>",
657
+ "single_word": false,
658
+ "lstrip": false,
659
+ "rstrip": false,
660
+ "normalized": false,
661
+ "special": true
662
+ },
663
+ {
664
+ "id": 73,
665
+ "content": "<SENTINEL_ID_16>",
666
+ "single_word": false,
667
+ "lstrip": false,
668
+ "rstrip": false,
669
+ "normalized": false,
670
+ "special": true
671
+ },
672
+ {
673
+ "id": 74,
674
+ "content": "<SENTINEL_ID_17>",
675
+ "single_word": false,
676
+ "lstrip": false,
677
+ "rstrip": false,
678
+ "normalized": false,
679
+ "special": true
680
+ },
681
+ {
682
+ "id": 75,
683
+ "content": "<SENTINEL_ID_18>",
684
+ "single_word": false,
685
+ "lstrip": false,
686
+ "rstrip": false,
687
+ "normalized": false,
688
+ "special": true
689
+ },
690
+ {
691
+ "id": 76,
692
+ "content": "<SENTINEL_ID_19>",
693
+ "single_word": false,
694
+ "lstrip": false,
695
+ "rstrip": false,
696
+ "normalized": false,
697
+ "special": true
698
+ },
699
+ {
700
+ "id": 77,
701
+ "content": "<SENTINEL_ID_20>",
702
+ "single_word": false,
703
+ "lstrip": false,
704
+ "rstrip": false,
705
+ "normalized": false,
706
+ "special": true
707
+ },
708
+ {
709
+ "id": 78,
710
+ "content": "<SENTINEL_ID_21>",
711
+ "single_word": false,
712
+ "lstrip": false,
713
+ "rstrip": false,
714
+ "normalized": false,
715
+ "special": true
716
+ },
717
+ {
718
+ "id": 79,
719
+ "content": "<SENTINEL_ID_22>",
720
+ "single_word": false,
721
+ "lstrip": false,
722
+ "rstrip": false,
723
+ "normalized": false,
724
+ "special": true
725
+ },
726
+ {
727
+ "id": 80,
728
+ "content": "<SENTINEL_ID_23>",
729
+ "single_word": false,
730
+ "lstrip": false,
731
+ "rstrip": false,
732
+ "normalized": false,
733
+ "special": true
734
+ },
735
+ {
736
+ "id": 81,
737
+ "content": "<SENTINEL_ID_24>",
738
+ "single_word": false,
739
+ "lstrip": false,
740
+ "rstrip": false,
741
+ "normalized": false,
742
+ "special": true
743
+ },
744
+ {
745
+ "id": 82,
746
+ "content": "<SENTINEL_ID_25>",
747
+ "single_word": false,
748
+ "lstrip": false,
749
+ "rstrip": false,
750
+ "normalized": false,
751
+ "special": true
752
+ },
753
+ {
754
+ "id": 83,
755
+ "content": "<SENTINEL_ID_26>",
756
+ "single_word": false,
757
+ "lstrip": false,
758
+ "rstrip": false,
759
+ "normalized": false,
760
+ "special": true
761
+ },
762
+ {
763
+ "id": 84,
764
+ "content": "<SENTINEL_ID_27>",
765
+ "single_word": false,
766
+ "lstrip": false,
767
+ "rstrip": false,
768
+ "normalized": false,
769
+ "special": true
770
+ },
771
+ {
772
+ "id": 85,
773
+ "content": "<SENTINEL_ID_28>",
774
+ "single_word": false,
775
+ "lstrip": false,
776
+ "rstrip": false,
777
+ "normalized": false,
778
+ "special": true
779
+ },
780
+ {
781
+ "id": 86,
782
+ "content": "<SENTINEL_ID_29>",
783
+ "single_word": false,
784
+ "lstrip": false,
785
+ "rstrip": false,
786
+ "normalized": false,
787
+ "special": true
788
+ },
789
+ {
790
+ "id": 87,
791
+ "content": "<SENTINEL_ID_30>",
792
+ "single_word": false,
793
+ "lstrip": false,
794
+ "rstrip": false,
795
+ "normalized": false,
796
+ "special": true
797
+ },
798
+ {
799
+ "id": 88,
800
+ "content": "<SENTINEL_ID_31>",
801
+ "single_word": false,
802
+ "lstrip": false,
803
+ "rstrip": false,
804
+ "normalized": false,
805
+ "special": true
806
+ },
807
+ {
808
+ "id": 89,
809
+ "content": "<SENTINEL_ID_32>",
810
+ "single_word": false,
811
+ "lstrip": false,
812
+ "rstrip": false,
813
+ "normalized": false,
814
+ "special": true
815
+ },
816
+ {
817
+ "id": 90,
818
+ "content": "<SENTINEL_ID_33>",
819
+ "single_word": false,
820
+ "lstrip": false,
821
+ "rstrip": false,
822
+ "normalized": false,
823
+ "special": true
824
+ },
825
+ {
826
+ "id": 91,
827
+ "content": "<SENTINEL_ID_34>",
828
+ "single_word": false,
829
+ "lstrip": false,
830
+ "rstrip": false,
831
+ "normalized": false,
832
+ "special": true
833
+ },
834
+ {
835
+ "id": 92,
836
+ "content": "<SENTINEL_ID_35>",
837
+ "single_word": false,
838
+ "lstrip": false,
839
+ "rstrip": false,
840
+ "normalized": false,
841
+ "special": true
842
+ },
843
+ {
844
+ "id": 93,
845
+ "content": "<SENTINEL_ID_36>",
846
+ "single_word": false,
847
+ "lstrip": false,
848
+ "rstrip": false,
849
+ "normalized": false,
850
+ "special": true
851
+ },
852
+ {
853
+ "id": 94,
854
+ "content": "<SENTINEL_ID_37>",
855
+ "single_word": false,
856
+ "lstrip": false,
857
+ "rstrip": false,
858
+ "normalized": false,
859
+ "special": true
860
+ },
861
+ {
862
+ "id": 95,
863
+ "content": "<SENTINEL_ID_38>",
864
+ "single_word": false,
865
+ "lstrip": false,
866
+ "rstrip": false,
867
+ "normalized": false,
868
+ "special": true
869
+ },
870
+ {
871
+ "id": 96,
872
+ "content": "<SENTINEL_ID_39>",
873
+ "single_word": false,
874
+ "lstrip": false,
875
+ "rstrip": false,
876
+ "normalized": false,
877
+ "special": true
878
+ },
879
+ {
880
+ "id": 97,
881
+ "content": "<SENTINEL_ID_40>",
882
+ "single_word": false,
883
+ "lstrip": false,
884
+ "rstrip": false,
885
+ "normalized": false,
886
+ "special": true
887
+ },
888
+ {
889
+ "id": 98,
890
+ "content": "<SENTINEL_ID_41>",
891
+ "single_word": false,
892
+ "lstrip": false,
893
+ "rstrip": false,
894
+ "normalized": false,
895
+ "special": true
896
+ },
897
+ {
898
+ "id": 99,
899
+ "content": "<SENTINEL_ID_42>",
900
+ "single_word": false,
901
+ "lstrip": false,
902
+ "rstrip": false,
903
+ "normalized": false,
904
+ "special": true
905
+ },
906
+ {
907
+ "id": 100,
908
+ "content": "<SENTINEL_ID_43>",
909
+ "single_word": false,
910
+ "lstrip": false,
911
+ "rstrip": false,
912
+ "normalized": false,
913
+ "special": true
914
+ },
915
+ {
916
+ "id": 101,
917
+ "content": "<SENTINEL_ID_44>",
918
+ "single_word": false,
919
+ "lstrip": false,
920
+ "rstrip": false,
921
+ "normalized": false,
922
+ "special": true
923
+ },
924
+ {
925
+ "id": 102,
926
+ "content": "<SENTINEL_ID_45>",
927
+ "single_word": false,
928
+ "lstrip": false,
929
+ "rstrip": false,
930
+ "normalized": false,
931
+ "special": true
932
+ },
933
+ {
934
+ "id": 103,
935
+ "content": "<SENTINEL_ID_46>",
936
+ "single_word": false,
937
+ "lstrip": false,
938
+ "rstrip": false,
939
+ "normalized": false,
940
+ "special": true
941
+ },
942
+ {
943
+ "id": 104,
944
+ "content": "<SENTINEL_ID_47>",
945
+ "single_word": false,
946
+ "lstrip": false,
947
+ "rstrip": false,
948
+ "normalized": false,
949
+ "special": true
950
+ },
951
+ {
952
+ "id": 105,
953
+ "content": "<SENTINEL_ID_48>",
954
+ "single_word": false,
955
+ "lstrip": false,
956
+ "rstrip": false,
957
+ "normalized": false,
958
+ "special": true
959
+ },
960
+ {
961
+ "id": 106,
962
+ "content": "<SENTINEL_ID_49>",
963
+ "single_word": false,
964
+ "lstrip": false,
965
+ "rstrip": false,
966
+ "normalized": false,
967
+ "special": true
968
+ },
969
+ {
970
+ "id": 107,
971
+ "content": "<SENTINEL_ID_50>",
972
+ "single_word": false,
973
+ "lstrip": false,
974
+ "rstrip": false,
975
+ "normalized": false,
976
+ "special": true
977
+ },
978
+ {
979
+ "id": 108,
980
+ "content": "<SENTINEL_ID_51>",
981
+ "single_word": false,
982
+ "lstrip": false,
983
+ "rstrip": false,
984
+ "normalized": false,
985
+ "special": true
986
+ },
987
+ {
988
+ "id": 109,
989
+ "content": "<SENTINEL_ID_52>",
990
+ "single_word": false,
991
+ "lstrip": false,
992
+ "rstrip": false,
993
+ "normalized": false,
994
+ "special": true
995
+ },
996
+ {
997
+ "id": 110,
998
+ "content": "<SENTINEL_ID_53>",
999
+ "single_word": false,
1000
+ "lstrip": false,
1001
+ "rstrip": false,
1002
+ "normalized": false,
1003
+ "special": true
1004
+ },
1005
+ {
1006
+ "id": 111,
1007
+ "content": "<SENTINEL_ID_54>",
1008
+ "single_word": false,
1009
+ "lstrip": false,
1010
+ "rstrip": false,
1011
+ "normalized": false,
1012
+ "special": true
1013
+ },
1014
+ {
1015
+ "id": 112,
1016
+ "content": "<SENTINEL_ID_55>",
1017
+ "single_word": false,
1018
+ "lstrip": false,
1019
+ "rstrip": false,
1020
+ "normalized": false,
1021
+ "special": true
1022
+ },
1023
+ {
1024
+ "id": 113,
1025
+ "content": "<SENTINEL_ID_56>",
1026
+ "single_word": false,
1027
+ "lstrip": false,
1028
+ "rstrip": false,
1029
+ "normalized": false,
1030
+ "special": true
1031
+ },
1032
+ {
+ "id": 114,
+ "content": "<SENTINEL_ID_57>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 115,
+ "content": "<SENTINEL_ID_58>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 116,
+ "content": "<SENTINEL_ID_59>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 117,
+ "content": "<SENTINEL_ID_60>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 118,
+ "content": "<SENTINEL_ID_61>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 119,
+ "content": "<SENTINEL_ID_62>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 120,
+ "content": "<SENTINEL_ID_63>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 121,
+ "content": "<SENTINEL_ID_64>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 122,
+ "content": "<SENTINEL_ID_65>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 123,
+ "content": "<SENTINEL_ID_66>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 124,
+ "content": "<SENTINEL_ID_67>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 125,
+ "content": "<SENTINEL_ID_68>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 126,
+ "content": "<SENTINEL_ID_69>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 127,
+ "content": "<SENTINEL_ID_70>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 128,
+ "content": "<SENTINEL_ID_71>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 129,
+ "content": "<SENTINEL_ID_72>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 130,
+ "content": "<SENTINEL_ID_73>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 131,
+ "content": "<SENTINEL_ID_74>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 132,
+ "content": "<SENTINEL_ID_75>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 133,
+ "content": "<SENTINEL_ID_76>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 134,
+ "content": "<SENTINEL_ID_77>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 135,
+ "content": "<SENTINEL_ID_78>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 136,
+ "content": "<SENTINEL_ID_79>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 137,
+ "content": "<SENTINEL_ID_80>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 138,
+ "content": "<SENTINEL_ID_81>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 139,
+ "content": "<SENTINEL_ID_82>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 140,
+ "content": "<SENTINEL_ID_83>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 141,
+ "content": "<SENTINEL_ID_84>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 142,
+ "content": "<SENTINEL_ID_85>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 143,
+ "content": "<SENTINEL_ID_86>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 144,
+ "content": "<SENTINEL_ID_87>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 145,
+ "content": "<SENTINEL_ID_88>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 146,
+ "content": "<SENTINEL_ID_89>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 147,
+ "content": "<SENTINEL_ID_90>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 148,
+ "content": "<SENTINEL_ID_91>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 149,
+ "content": "<SENTINEL_ID_92>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 150,
+ "content": "<SENTINEL_ID_93>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 151,
+ "content": "<SENTINEL_ID_94>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 152,
+ "content": "<SENTINEL_ID_95>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 153,
+ "content": "<SENTINEL_ID_96>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 154,
+ "content": "<SENTINEL_ID_97>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 155,
+ "content": "<SENTINEL_ID_98>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 156,
+ "content": "<SENTINEL_ID_99>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 157,
+ "content": "<SENTINEL_ID_100>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 158,
+ "content": "<SENTINEL_ID_101>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 159,
+ "content": "<SENTINEL_ID_102>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 160,
+ "content": "<SENTINEL_ID_103>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 161,
+ "content": "<SENTINEL_ID_104>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 162,
+ "content": "<SENTINEL_ID_105>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 163,
+ "content": "<SENTINEL_ID_106>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 164,
+ "content": "<SENTINEL_ID_107>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 165,
+ "content": "<SENTINEL_ID_108>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 166,
+ "content": "<SENTINEL_ID_109>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 167,
+ "content": "<SENTINEL_ID_110>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 168,
+ "content": "<SENTINEL_ID_111>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 169,
+ "content": "<SENTINEL_ID_112>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 170,
+ "content": "<SENTINEL_ID_113>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 171,
+ "content": "<SENTINEL_ID_114>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 172,
+ "content": "<SENTINEL_ID_115>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 173,
+ "content": "<SENTINEL_ID_116>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 174,
+ "content": "<SENTINEL_ID_117>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 175,
+ "content": "<SENTINEL_ID_118>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 176,
+ "content": "<SENTINEL_ID_119>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 177,
+ "content": "<SENTINEL_ID_120>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 178,
+ "content": "<SENTINEL_ID_121>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 179,
+ "content": "<SENTINEL_ID_122>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 180,
+ "content": "<SENTINEL_ID_123>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 181,
+ "content": "<SENTINEL_ID_124>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 182,
+ "content": "<SENTINEL_ID_125>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 183,
+ "content": "<SENTINEL_ID_126>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 184,
+ "content": "<SENTINEL_ID_127>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 185,
+ "content": "<SENTINEL_ID_128>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 186,
+ "content": "<SENTINEL_ID_129>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 187,
+ "content": "<SENTINEL_ID_130>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 188,
+ "content": "<SENTINEL_ID_131>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 189,
+ "content": "<SENTINEL_ID_132>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 190,
+ "content": "<SENTINEL_ID_133>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 191,
+ "content": "<SENTINEL_ID_134>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 192,
+ "content": "<SENTINEL_ID_135>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 193,
+ "content": "<SENTINEL_ID_136>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 194,
+ "content": "<SENTINEL_ID_137>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 195,
+ "content": "<SENTINEL_ID_138>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 196,
+ "content": "<SENTINEL_ID_139>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 197,
+ "content": "<SENTINEL_ID_140>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 198,
+ "content": "<SENTINEL_ID_141>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 199,
+ "content": "<SENTINEL_ID_142>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 200,
+ "content": "<SENTINEL_ID_143>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 201,
+ "content": "<SENTINEL_ID_144>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 202,
+ "content": "<SENTINEL_ID_145>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 203,
+ "content": "<SENTINEL_ID_146>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 204,
+ "content": "<SENTINEL_ID_147>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 205,
+ "content": "<SENTINEL_ID_148>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 206,
+ "content": "<SENTINEL_ID_149>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 207,
+ "content": "<SENTINEL_ID_150>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 208,
+ "content": "<SENTINEL_ID_151>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 209,
+ "content": "<SENTINEL_ID_152>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 210,
+ "content": "<SENTINEL_ID_153>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 211,
+ "content": "<SENTINEL_ID_154>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 212,
+ "content": "<SENTINEL_ID_155>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 213,
+ "content": "<SENTINEL_ID_156>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 214,
+ "content": "<SENTINEL_ID_157>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 215,
+ "content": "<SENTINEL_ID_158>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 216,
+ "content": "<SENTINEL_ID_159>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 217,
+ "content": "<SENTINEL_ID_160>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 218,
+ "content": "<SENTINEL_ID_161>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 219,
+ "content": "<SENTINEL_ID_162>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 220,
+ "content": "<SENTINEL_ID_163>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 221,
+ "content": "<SENTINEL_ID_164>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 222,
+ "content": "<SENTINEL_ID_165>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 223,
+ "content": "<SENTINEL_ID_166>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 224,
+ "content": "<SENTINEL_ID_167>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 225,
+ "content": "<SENTINEL_ID_168>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 226,
+ "content": "<SENTINEL_ID_169>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 227,
+ "content": "<SENTINEL_ID_170>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 228,
+ "content": "<SENTINEL_ID_171>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 229,
+ "content": "<SENTINEL_ID_172>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 230,
+ "content": "<SENTINEL_ID_173>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 231,
+ "content": "<SENTINEL_ID_174>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 232,
+ "content": "<SENTINEL_ID_175>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 233,
+ "content": "<SENTINEL_ID_176>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 234,
+ "content": "<SENTINEL_ID_177>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 235,
+ "content": "<SENTINEL_ID_178>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 236,
+ "content": "<SENTINEL_ID_179>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 237,
+ "content": "<SENTINEL_ID_180>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 238,
+ "content": "<SENTINEL_ID_181>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 239,
+ "content": "<SENTINEL_ID_182>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 240,
+ "content": "<SENTINEL_ID_183>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 241,
+ "content": "<SENTINEL_ID_184>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 242,
+ "content": "<SENTINEL_ID_185>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 243,
+ "content": "<SENTINEL_ID_186>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 244,
+ "content": "<SENTINEL_ID_187>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 245,
+ "content": "<SENTINEL_ID_188>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 246,
+ "content": "<SENTINEL_ID_189>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 247,
+ "content": "<SENTINEL_ID_190>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 248,
+ "content": "<SENTINEL_ID_191>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 249,
+ "content": "<SENTINEL_ID_192>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 250,
+ "content": "<SENTINEL_ID_193>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 251,
+ "content": "<SENTINEL_ID_194>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 252,
+ "content": "<SENTINEL_ID_195>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 253,
+ "content": "<SENTINEL_ID_196>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 254,
+ "content": "<SENTINEL_ID_197>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 255,
+ "content": "<SENTINEL_ID_198>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 256,
+ "content": "<SENTINEL_ID_199>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 257,
+ "content": "<MOLECULAR_ENTITY_TYPE_ANTIGEN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 258,
+ "content": "<MOLECULAR_ENTITY_TYPE_ANTIBODY_LIGHT_CHAIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 259,
+ "content": "<MOLECULAR_ENTITY_TYPE_ANTIBODY_HEAVY_CHAIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 260,
+ "content": "<ATTRIBUTE_ORGANISM>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 261,
+ "content": "<ATTRIBUTE_ORGANISM_HUMAN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 262,
+ "content": "<ATTRIBUTE_ORGANISM_RABBIT>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 263,
+ "content": "<ATTRIBUTE_ORGANISM_RAT>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 264,
+ "content": "<ATTRIBUTE_ORGANISM_MOUSE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 265,
+ "content": "<ATTRIBUTE_ORGANISM_MONKEY>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 266,
+ "content": "<ATTRIBUTE_ORGANISM_CAMEL>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 267,
+ "content": "<EPITOPE_PARATOPE_PREDICTION>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 268,
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR1>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 269,
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR3>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 270,
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR3>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 271,
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR2>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 272,
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR2>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 273,
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR1>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 274,
+ "content": "<MOLECULAR_ENTITY_GENERAL_PROTEIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
2482
+ "id": 275,
2483
+ "content": "<TIMESTEP>",
2484
+ "single_word": false,
2485
+ "lstrip": false,
2486
+ "rstrip": false,
2487
+ "normalized": false,
2488
+ "special": true
2489
+ },
+ {
+ "id": 276,
+ "content": "<DIFFUSION>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 277,
+ "content": "<SEQUENCE_NATURAL_END>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 278,
+ "content": "<SMILES_SEQUENCE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 279,
+ "content": "<SELFIES_SEQUENCE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 280,
+ "content": "<AMINO_ACID_SEQUENCE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 281,
+ "content": "<GENERAL_AFFINITY_CLASS>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 282,
+ "content": "<BACKSPACE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 283,
+ "content": "<SEQUENCE_NATURAL_START>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 284,
+ "content": "<NOOP>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 285,
+ "content": "<TARGETED_ANTIBODY_DESIGN_ENCODER_ONLY_MODE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 286,
+ "content": "<MOLECULAR_ENTITY_SMALL_MOLECULE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 287,
+ "content": "<MOLECULAR_ENTITY_CELL_GENE_EXPRESSION_RANKED>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 288,
+ "content": "<CELL_TYPE_CLASS>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 289,
+ "content": "<TISSUE_TYPE_CLASS>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 290,
+ "content": "<CORRUPTED_AREA_START>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 291,
+ "content": "<CORRUPTED_AREA_END>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 292,
+ "content": "<MOLECULAR_ENTITY_MUTATED_PROTEIN_CHAIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 293,
+ "content": "<MOLECULAR_ENTITY_PROTEIN_CHAIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 294,
+ "content": "<COMPLEX_ENTITY>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 295,
+ "content": "<ALTERNATIVE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 296,
+ "content": "<CDR3_REGION>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 297,
+ "content": "<GENERAL_CHAIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 298,
+ "content": "<SUBMOLECULAR_ENTITY>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 299,
+ "content": "<MUTATED>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 300,
+ "content": "<MOLECULAR_ENTITY_TCR_ALPHA_CDR3>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 301,
+ "content": "<MOLECULAR_ENTITY_TCR_DELTA_CDR3>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 302,
+ "content": "<MOLECULAR_ENTITY_TCR_DELTA_VAR>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 303,
+ "content": "<MOLECULAR_ENTITY_TCR_GAMMA_CDR3>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 304,
+ "content": "<MOLECULAR_ENTITY_TCR_GAMMA_VAR>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 305,
+ "content": "<SCALAR>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 306,
+ "content": "<VECTOR>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 307,
+ "content": "<MASKED_SCALAR>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 308,
+ "content": "<MASKED_VECTOR>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 309,
+ "content": "<AUTOENCODER_LATENT_LOG_VARIANCE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 310,
+ "content": "<AUTOENCODER_LATENT_MEAN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 311,
+ "content": "<AUTOENCODER_LATENT_SAMPLED_Z>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 312,
+ "content": "<AUTOENCODER_TASK>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 313,
+ "content": "<DECODED_FROM_LATENT>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 314,
+ "content": "<BBBP>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 315,
+ "content": "<FDA_APPR>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 316,
+ "content": "<HIV_ACTIVITY>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ }
+ ],
+ "normalizer": null,
+ "pre_tokenizer": {
+ "type": "Sequence",
+ "pretokenizers": [
+ {
+ "type": "Split",
+ "pattern": {
+ "Regex": "<.*?>|\\[.*?\\]|\\S"
+ },
+ "behavior": "Removed",
+ "invert": true
+ }
+ ]
+ },
+ "post_processor": null,
+ "decoder": null,
+ "model": {
+ "type": "WordLevel",
+ "vocab": {
+ "<UNK>": 0,
+ "<PAD>": 1,
+ "<CLS>": 2,
+ "<SEP>": 3,
+ "<MASK>": 4,
+ "<EOS>": 5,
+ "<MOLECULAR_ENTITY>": 6,
+ "<GLOBAL_INTERACTION_ATTRIBUTES>": 7,
+ "<MOLECULAR_ENTITY_ANTIGEN>": 8,
+ "<MOLECULAR_ENTITY_EPITOPE>": 9,
+ "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN>": 10,
+ "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN>": 11,
+ "<MOLECULAR_ENTITY_TCR_ALPHA_CHAIN>": 12,
+ "<MOLECULAR_ENTITY_TCR_BETA_VDJ>": 13,
+ "<MOLECULAR_ENTITY_TCR_BETA_CDR3>": 14,
+ "<BINDING_AFFINITY_CLASS>": 15,
+ "<DECODER_START>": 16,
+ "<BINDING>": 17,
+ "<FILLIN>": 18,
+ "<REORDER>": 19,
+ "<TOAA>": 20,
+ "<ACTIVE>": 21,
+ "<GENESEQ>": 22,
+ "<INCREASE>": 23,
+ "<DECREASE>": 24,
+ "<STRUCTURE>": 25,
+ "<DISTANCE>": 26,
+ "<SOLUBILITY>": 27,
+ "<TOXICITY>": 28,
+ "<AB>": 29,
+ "<ISACTIVE>": 30,
+ "<ISSYNTHETIC>": 31,
+ "<PENETR>": 32,
+ "<ABSORPTION>": 33,
+ "<DISTRIBUTION>": 34,
+ "<METABOLISM>": 35,
+ "<EXCRETION>": 36,
+ "<FLUORESCENCE>": 37,
+ "<STABILITY>": 38,
+ "<DISORDER>": 39,
+ "<DISEASE>": 40,
+ "<BINARY>": 41,
+ "<REGRESSION>": 42,
+ "<ORGANISM>": 43,
+ "<0>": 44,
+ "<1>": 45,
+ "<2>": 46,
+ "<3>": 47,
+ "<4>": 48,
+ "<5>": 49,
+ "<6>": 50,
+ "<7>": 51,
+ "<8>": 52,
+ "<9>": 53,
+ "<.>": 54,
+ "<YES>": 55,
+ "<NO>": 56,
+ "<SENTINEL_ID_0>": 57,
+ "<SENTINEL_ID_1>": 58,
+ "<SENTINEL_ID_2>": 59,
+ "<SENTINEL_ID_3>": 60,
+ "<SENTINEL_ID_4>": 61,
+ "<SENTINEL_ID_5>": 62,
+ "<SENTINEL_ID_6>": 63,
+ "<SENTINEL_ID_7>": 64,
+ "<SENTINEL_ID_8>": 65,
+ "<SENTINEL_ID_9>": 66,
+ "<SENTINEL_ID_10>": 67,
+ "<SENTINEL_ID_11>": 68,
+ "<SENTINEL_ID_12>": 69,
+ "<SENTINEL_ID_13>": 70,
+ "<SENTINEL_ID_14>": 71,
+ "<SENTINEL_ID_15>": 72,
+ "<SENTINEL_ID_16>": 73,
+ "<SENTINEL_ID_17>": 74,
+ "<SENTINEL_ID_18>": 75,
+ "<SENTINEL_ID_19>": 76,
+ "<SENTINEL_ID_20>": 77,
+ "<SENTINEL_ID_21>": 78,
+ "<SENTINEL_ID_22>": 79,
+ "<SENTINEL_ID_23>": 80,
+ "<SENTINEL_ID_24>": 81,
+ "<SENTINEL_ID_25>": 82,
+ "<SENTINEL_ID_26>": 83,
+ "<SENTINEL_ID_27>": 84,
+ "<SENTINEL_ID_28>": 85,
+ "<SENTINEL_ID_29>": 86,
+ "<SENTINEL_ID_30>": 87,
+ "<SENTINEL_ID_31>": 88,
+ "<SENTINEL_ID_32>": 89,
+ "<SENTINEL_ID_33>": 90,
+ "<SENTINEL_ID_34>": 91,
+ "<SENTINEL_ID_35>": 92,
+ "<SENTINEL_ID_36>": 93,
+ "<SENTINEL_ID_37>": 94,
+ "<SENTINEL_ID_38>": 95,
+ "<SENTINEL_ID_39>": 96,
+ "<SENTINEL_ID_40>": 97,
+ "<SENTINEL_ID_41>": 98,
+ "<SENTINEL_ID_42>": 99,
+ "<SENTINEL_ID_43>": 100,
+ "<SENTINEL_ID_44>": 101,
+ "<SENTINEL_ID_45>": 102,
+ "<SENTINEL_ID_46>": 103,
+ "<SENTINEL_ID_47>": 104,
+ "<SENTINEL_ID_48>": 105,
+ "<SENTINEL_ID_49>": 106,
+ "<SENTINEL_ID_50>": 107,
+ "<SENTINEL_ID_51>": 108,
+ "<SENTINEL_ID_52>": 109,
+ "<SENTINEL_ID_53>": 110,
+ "<SENTINEL_ID_54>": 111,
+ "<SENTINEL_ID_55>": 112,
+ "<SENTINEL_ID_56>": 113,
+ "<SENTINEL_ID_57>": 114,
+ "<SENTINEL_ID_58>": 115,
+ "<SENTINEL_ID_59>": 116,
+ "<SENTINEL_ID_60>": 117,
+ "<SENTINEL_ID_61>": 118,
+ "<SENTINEL_ID_62>": 119,
+ "<SENTINEL_ID_63>": 120,
+ "<SENTINEL_ID_64>": 121,
+ "<SENTINEL_ID_65>": 122,
+ "<SENTINEL_ID_66>": 123,
+ "<SENTINEL_ID_67>": 124,
+ "<SENTINEL_ID_68>": 125,
+ "<SENTINEL_ID_69>": 126,
+ "<SENTINEL_ID_70>": 127,
+ "<SENTINEL_ID_71>": 128,
+ "<SENTINEL_ID_72>": 129,
+ "<SENTINEL_ID_73>": 130,
+ "<SENTINEL_ID_74>": 131,
+ "<SENTINEL_ID_75>": 132,
+ "<SENTINEL_ID_76>": 133,
+ "<SENTINEL_ID_77>": 134,
+ "<SENTINEL_ID_78>": 135,
+ "<SENTINEL_ID_79>": 136,
+ "<SENTINEL_ID_80>": 137,
+ "<SENTINEL_ID_81>": 138,
+ "<SENTINEL_ID_82>": 139,
+ "<SENTINEL_ID_83>": 140,
+ "<SENTINEL_ID_84>": 141,
+ "<SENTINEL_ID_85>": 142,
+ "<SENTINEL_ID_86>": 143,
+ "<SENTINEL_ID_87>": 144,
+ "<SENTINEL_ID_88>": 145,
+ "<SENTINEL_ID_89>": 146,
+ "<SENTINEL_ID_90>": 147,
+ "<SENTINEL_ID_91>": 148,
+ "<SENTINEL_ID_92>": 149,
+ "<SENTINEL_ID_93>": 150,
+ "<SENTINEL_ID_94>": 151,
+ "<SENTINEL_ID_95>": 152,
+ "<SENTINEL_ID_96>": 153,
+ "<SENTINEL_ID_97>": 154,
+ "<SENTINEL_ID_98>": 155,
+ "<SENTINEL_ID_99>": 156,
+ "<SENTINEL_ID_100>": 157,
+ "<SENTINEL_ID_101>": 158,
+ "<SENTINEL_ID_102>": 159,
+ "<SENTINEL_ID_103>": 160,
+ "<SENTINEL_ID_104>": 161,
+ "<SENTINEL_ID_105>": 162,
+ "<SENTINEL_ID_106>": 163,
+ "<SENTINEL_ID_107>": 164,
+ "<SENTINEL_ID_108>": 165,
+ "<SENTINEL_ID_109>": 166,
+ "<SENTINEL_ID_110>": 167,
+ "<SENTINEL_ID_111>": 168,
+ "<SENTINEL_ID_112>": 169,
+ "<SENTINEL_ID_113>": 170,
+ "<SENTINEL_ID_114>": 171,
+ "<SENTINEL_ID_115>": 172,
+ "<SENTINEL_ID_116>": 173,
+ "<SENTINEL_ID_117>": 174,
+ "<SENTINEL_ID_118>": 175,
+ "<SENTINEL_ID_119>": 176,
+ "<SENTINEL_ID_120>": 177,
+ "<SENTINEL_ID_121>": 178,
+ "<SENTINEL_ID_122>": 179,
+ "<SENTINEL_ID_123>": 180,
+ "<SENTINEL_ID_124>": 181,
+ "<SENTINEL_ID_125>": 182,
+ "<SENTINEL_ID_126>": 183,
+ "<SENTINEL_ID_127>": 184,
+ "<SENTINEL_ID_128>": 185,
+ "<SENTINEL_ID_129>": 186,
+ "<SENTINEL_ID_130>": 187,
+ "<SENTINEL_ID_131>": 188,
+ "<SENTINEL_ID_132>": 189,
+ "<SENTINEL_ID_133>": 190,
+ "<SENTINEL_ID_134>": 191,
+ "<SENTINEL_ID_135>": 192,
+ "<SENTINEL_ID_136>": 193,
+ "<SENTINEL_ID_137>": 194,
+ "<SENTINEL_ID_138>": 195,
+ "<SENTINEL_ID_139>": 196,
+ "<SENTINEL_ID_140>": 197,
+ "<SENTINEL_ID_141>": 198,
+ "<SENTINEL_ID_142>": 199,
+ "<SENTINEL_ID_143>": 200,
+ "<SENTINEL_ID_144>": 201,
+ "<SENTINEL_ID_145>": 202,
+ "<SENTINEL_ID_146>": 203,
+ "<SENTINEL_ID_147>": 204,
+ "<SENTINEL_ID_148>": 205,
+ "<SENTINEL_ID_149>": 206,
+ "<SENTINEL_ID_150>": 207,
+ "<SENTINEL_ID_151>": 208,
+ "<SENTINEL_ID_152>": 209,
+ "<SENTINEL_ID_153>": 210,
+ "<SENTINEL_ID_154>": 211,
+ "<SENTINEL_ID_155>": 212,
+ "<SENTINEL_ID_156>": 213,
+ "<SENTINEL_ID_157>": 214,
+ "<SENTINEL_ID_158>": 215,
+ "<SENTINEL_ID_159>": 216,
+ "<SENTINEL_ID_160>": 217,
+ "<SENTINEL_ID_161>": 218,
+ "<SENTINEL_ID_162>": 219,
+ "<SENTINEL_ID_163>": 220,
+ "<SENTINEL_ID_164>": 221,
+ "<SENTINEL_ID_165>": 222,
+ "<SENTINEL_ID_166>": 223,
+ "<SENTINEL_ID_167>": 224,
+ "<SENTINEL_ID_168>": 225,
+ "<SENTINEL_ID_169>": 226,
+ "<SENTINEL_ID_170>": 227,
+ "<SENTINEL_ID_171>": 228,
+ "<SENTINEL_ID_172>": 229,
+ "<SENTINEL_ID_173>": 230,
+ "<SENTINEL_ID_174>": 231,
+ "<SENTINEL_ID_175>": 232,
+ "<SENTINEL_ID_176>": 233,
+ "<SENTINEL_ID_177>": 234,
+ "<SENTINEL_ID_178>": 235,
+ "<SENTINEL_ID_179>": 236,
+ "<SENTINEL_ID_180>": 237,
+ "<SENTINEL_ID_181>": 238,
+ "<SENTINEL_ID_182>": 239,
+ "<SENTINEL_ID_183>": 240,
+ "<SENTINEL_ID_184>": 241,
+ "<SENTINEL_ID_185>": 242,
+ "<SENTINEL_ID_186>": 243,
+ "<SENTINEL_ID_187>": 244,
+ "<SENTINEL_ID_188>": 245,
+ "<SENTINEL_ID_189>": 246,
+ "<SENTINEL_ID_190>": 247,
+ "<SENTINEL_ID_191>": 248,
+ "<SENTINEL_ID_192>": 249,
+ "<SENTINEL_ID_193>": 250,
+ "<SENTINEL_ID_194>": 251,
+ "<SENTINEL_ID_195>": 252,
+ "<SENTINEL_ID_196>": 253,
+ "<SENTINEL_ID_197>": 254,
+ "<SENTINEL_ID_198>": 255,
+ "<SENTINEL_ID_199>": 256,
+ "<MOLECULAR_ENTITY_TYPE_ANTIGEN>": 257,
+ "<MOLECULAR_ENTITY_TYPE_ANTIBODY_LIGHT_CHAIN>": 258,
+ "<MOLECULAR_ENTITY_TYPE_ANTIBODY_HEAVY_CHAIN>": 259,
+ "<ATTRIBUTE_ORGANISM>": 260,
+ "<ATTRIBUTE_ORGANISM_HUMAN>": 261,
+ "<ATTRIBUTE_ORGANISM_RABBIT>": 262,
+ "<ATTRIBUTE_ORGANISM_RAT>": 263,
+ "<ATTRIBUTE_ORGANISM_MOUSE>": 264,
+ "<ATTRIBUTE_ORGANISM_MONKEY>": 265,
+ "<ATTRIBUTE_ORGANISM_CAMEL>": 266,
+ "<EPITOPE_PARATOPE_PREDICTION>": 267,
+ "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR1>": 268,
+ "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR3>": 269,
+ "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR3>": 270,
+ "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR2>": 271,
+ "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR2>": 272,
+ "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR1>": 273,
+ "<MOLECULAR_ENTITY_GENERAL_PROTEIN>": 274,
+ "<TIMESTEP>": 275,
+ "<DIFFUSION>": 276,
+ "<SEQUENCE_NATURAL_END>": 277,
+ "<SMILES_SEQUENCE>": 278,
+ "<SELFIES_SEQUENCE>": 279,
+ "<AMINO_ACID_SEQUENCE>": 280,
+ "<GENERAL_AFFINITY_CLASS>": 281,
+ "<BACKSPACE>": 282,
+ "<SEQUENCE_NATURAL_START>": 283,
+ "<NOOP>": 284,
+ "<TARGETED_ANTIBODY_DESIGN_ENCODER_ONLY_MODE>": 285,
+ "<MOLECULAR_ENTITY_SMALL_MOLECULE>": 286,
+ "<MOLECULAR_ENTITY_CELL_GENE_EXPRESSION_RANKED>": 287,
+ "<CELL_TYPE_CLASS>": 288,
+ "<TISSUE_TYPE_CLASS>": 289,
+ "<CORRUPTED_AREA_START>": 290,
+ "<CORRUPTED_AREA_END>": 291,
+ "<MOLECULAR_ENTITY_MUTATED_PROTEIN_CHAIN>": 292,
+ "<MOLECULAR_ENTITY_PROTEIN_CHAIN>": 293,
+ "<COMPLEX_ENTITY>": 294,
+ "<ALTERNATIVE>": 295,
+ "<CDR3_REGION>": 296,
+ "<GENERAL_CHAIN>": 297,
+ "<SUBMOLECULAR_ENTITY>": 298,
+ "<MUTATED>": 299,
+ "<MOLECULAR_ENTITY_TCR_ALPHA_CDR3>": 300,
+ "<MOLECULAR_ENTITY_TCR_DELTA_CDR3>": 301,
+ "<MOLECULAR_ENTITY_TCR_DELTA_VAR>": 302,
+ "<MOLECULAR_ENTITY_TCR_GAMMA_CDR3>": 303,
+ "<MOLECULAR_ENTITY_TCR_GAMMA_VAR>": 304,
+ "<SCALAR>": 305,
+ "<VECTOR>": 306,
+ "<MASKED_SCALAR>": 307,
+ "<MASKED_VECTOR>": 308,
+ "<AUTOENCODER_LATENT_LOG_VARIANCE>": 309,
+ "<AUTOENCODER_LATENT_MEAN>": 310,
+ "<AUTOENCODER_LATENT_SAMPLED_Z>": 311,
+ "<AUTOENCODER_TASK>": 312,
+ "<DECODED_FROM_LATENT>": 313,
+ "<BBBP>": 314,
+ "<FDA_APPR>": 315,
+ "<HIV_ACTIVITY>": 316,
+ "[CL:0000499]": 3522,
+ "[CL:2000060]": 3523,
+ "[CL:0000235]": 3524,
+ "[CL:0002343]": 3525,
+ "[CL:0000084]": 3526,
+ "[CL:2000042]": 3527,
+ "[CL:0000066]": 3528,
+ "[CL:0008036]": 3529,
+ "[CL:3000001]": 3530,
+ "[CL:0000525]": 3531,
+ "[CL:0000623]": 3532,
+ "[CL:0009095]": 3533,
+ "[CL:0000003]": 3534,
+ "[CL:0001078]": 3535,
+ "[CL:0000815]": 3536,
+ "[CL:0002601]": 3537,
+ "[CL:0000451]": 3538,
+ "[CL:0009092]": 3539,
+ "[CL:0002138]": 3540,
+ "[CL:0000236]": 3541,
+ "[CL:0000786]": 3542,
+ "[CL:0000094]": 3543,
+ "[CL:0002488]": 3544,
+ "[CL:0000625]": 3545,
+ "[CL:0000913]": 3546,
+ "[CL:0000624]": 3547,
+ "[CL:0000895]": 3548,
+ "[CL:0000905]": 3549,
+ "[CL:0000792]": 3550,
+ "[CL:0000940]": 3551,
+ "[CL:0000788]": 3552,
+ "[CL:0000900]": 3553,
+ "[CL:0000798]": 3554,
+ "[CL:0000970]": 3555,
+ "[CL:0000985]": 3556,
+ "[CL:0000972]": 3557,
+ "[CL:0000987]": 3558,
+ "[CL:0011026]": 3559,
+ "[CL:4023040]": 3560,
+ "[CL:0002555]": 3561,
+ "[CL:0000071]": 3562,
+ "[CL:0002326]": 3563,
+ "[CL:0008034]": 3564,
+ "[CL:0002324]": 3565,
+ "[CL:0000128]": 3566,
+ "[CL:4023016]": 3567,
+ "[CL:4023018]": 3568,
+ "[CL:0000988]": 3569,
+ "[CL:0002605]": 3570,
+ "[CL:4023011]": 3571,
+ "[CL:4023017]": 3572,
+ "[CL:0000129]": 3573,
+ "[CL:0002453]": 3574,
+ "[CL:4023015]": 3575,
+ "[CL:4023012]": 3576,
+ "[CL:4023013]": 3577,
+ "[CL:4023038]": 3578,
+ "[CL:4023036]": 3579,
+ "[CL:4023070]": 3580,
+ "[CL:4023051]": 3581,
+ "[CL:4023041]": 3582,
+ "[CL:1001602]": 3583,
+ "[CL:0000151]": 3584,
+ "[CL:0000860]": 3585,
+ "[CL:0000186]": 3586,
+ "[CL:0002306]": 3587,
+ "[CL:0000814]": 3588,
+ "[CL:1001431]": 3589,
+ "[CL:1001106]": 3590,
+ "[CL:0000057]": 3591,
+ "[CL:1000692]": 3592,
+ "[CL:0000115]": 3593,
+ "[CL:0000192]": 3594,
+ "[CL:0002117]": 3595,
+ "[CL:0000233]": 3596,
+ "[CL:0001044]": 3597,
+ "[CL:0001050]": 3598,
+ "[CL:1001111]": 3599,
+ "[CL:0000067]": 3600,
+ "[CL:1001107]": 3601,
+ "[CL:1001432]": 3602,
+ "[CL:1000849]": 3603,
+ "[CL:0000669]": 3604,
+ "[CL:0000738]": 3605,
+ "[CL:0000492]": 3606,
+ "[CL:1000768]": 3607,
+ "[CL:0000979]": 3608,
+ "[CL:0000875]": 3609,
+ "[CL:0000097]": 3610,
+ "[CL:0000081]": 3611,
+ "[CL:0000232]": 3612,
+ "[CL:1001318]": 3613,
+ "[CL:0002393]": 3614,
+ "[CL:0000653]": 3615,
+ "[CL:1000597]": 3616,
+ "[CL:1000452]": 3617,
+ "[CL:0002319]": 3618,
+ "[CL:0019026]": 3619,
+ "[CL:0019029]": 3620,
+ "[CL:0019028]": 3621,
+ "[CL:0019022]": 3622,
+ "[CL:0010011]": 3623,
+ "[CL:0000091]": 3624,
+ "[CL:0000863]": 3625,
+ "[CL:0000679]": 3626,
+ "[CL:0000632]": 3627,
+ "[CL:0013000]": 3628,
+ "[CL:0001054]": 3629,
+ "[CL:1000488]": 3630,
+ "[CL:0000939]": 3631,
+ "[CL:0000789]": 3632,
+ "[CL:0000182]": 3633,
+ "[CL:0019021]": 3634,
+ "[CL:0000904]": 3635,
+ "[CL:0001042]": 3636,
+ "[CL:0000785]": 3637,
+ "[CL:0002038]": 3638,
+ "[CL:0000938]": 3639,
+ "[CL:0000980]": 3640,
+ "[CL:0000816]": 3641,
+ "[CL:0000784]": 3642,
+ "[CL:0002396]": 3643,
+ "[CL:0000764]": 3644,
+ "[CL:0000359]": 3645,
+ "[CL:0000782]": 3646,
+ "[CL:0001056]": 3647,
+ "[CL:0000695]": 3648,
+ "[CL:0000986]": 3649,
+ "[CL:0001024]": 3650,
+ "[CL:0000576]": 3651,
+ "[CL:0001077]": 3652,
+ "[CL:0008001]": 3653,
+ "[CL:0000545]": 3654,
+ "[CL:0000546]": 3655,
+ "[CL:0001081]": 3656,
+ "[CL:0001066]": 3657,
+ "[CL:0000990]": 3658,
+ "[CL:0000542]": 3659,
+ "[CL:0000604]": 3660,
+ "[CL:0000748]": 3661,
+ "[CL:0000125]": 3662,
+ "[CL:0000573]": 3663,
+ "[CL:0000561]": 3664,
+ "[CL:0000745]": 3665,
+ "[CL:0000749]": 3666,
+ "[CL:0000750]": 3667,
+ "[CL:0000636]": 3668,
+ "[CL:0000740]": 3669,
+ "[CL:0000127]": 3670,
+ "[CL:0002586]": 3671,
+ "[CL:0000583]": 3672,
+ "[CL:0002063]": 3673,
+ "[CL:0002062]": 3674,
+ "[CL:0000158]": 3675,
+ "[CL:0000064]": 3676,
+ "[CL:0000138]": 3677,
+ "[CL:0000160]": 3678,
+ "[CL:0000646]": 3679,
+ "[CL:0000595]": 3680,
+ "[CL:1000413]": 3681,
+ "[CL:0002543]": 3682,
+ "[CL:1000223]": 3683,
+ "[CL:0002341]": 3684,
+ "[CL:1000296]": 3685,
+ "[CL:0002340]": 3686,
+ "[CL:1000304]": 3687,
+ "[CL:0002399]": 3688,
+ "[CL:1000487]": 3689,
+ "[CL:0000810]": 3690,
+ "[CL:1000305]": 3691,
+ "[CL:2000059]": 3692,
+ "[CL:0002622]": 3693,
+ "[CL:1000299]": 3694,
+ "[CL:0000811]": 3695,
+ "[CL:0002394]": 3696,
+ "[CL:0000165]": 3697,
+ "[CL:0000775]": 3698,
+ "[CL:0000908]": 3699,
+ "[CL:0000921]": 3700,
+ "[CL:0000897]": 3701,
+ "[CL:1000398]": 3702,
+ "[CL:0001065]": 3703,
+ "[CL:0000787]": 3704,
+ "[CL:1000329]": 3705,
+ "[CL:0000319]": 3706,
+ "[CL:0000037]": 3707,
+ "[CL:1000330]": 3708,
+ "[CL:2000055]": 3709,
+ "[CL:0005006]": 3710,
+ "[CL:0002538]": 3711,
+ "[CL:0000767]": 3712,
+ "[CL:0000809]": 3713,
+ "[CL:0002623]": 3714,
+ "[CL:0000746]": 3715,
+ "[CL:1000432]": 3716,
+ "[CL:0002071]": 3717,
+ "[CL:0010008]": 3718,
+ "[CL:0000909]": 3719,
+ "[CL:4006000]": 3720,
+ "[CL:0000763]": 3721,
+ "[CL:0000068]": 3722,
+ "[CL:0000575]": 3723,
+ "[CL:1001428]": 3724,
+ "[CL:0002064]": 3725,
+ "[CL:0000287]": 3726,
+ "[CL:0009009]": 3727,
+ "[CL:0002079]": 3728,
+ "[CL:0000134]": 3729,
+ "[CL:0002677]": 3730,
+ "[CL:0002363]": 3731,
+ "[CL:0000794]": 3732,
+ "[CL:0002585]": 3733,
+ "[CL:0002503]": 3734,
+ "[CL:0000038]": 3735,
+ "[CL:1000320]": 3736,
+ "[CL:0002144]": 3737,
+ "[CL:0002410]": 3738,
+ "[CL:0008011]": 3739,
+ "[CL:0002139]": 3740,
+ "[CL:0002548]": 3741,
+ "[CL:0000148]": 3742,
+ "[CL:0002366]": 3743,
+ "[CL:0000185]": 3744,
+ "[CL:0000681]": 3745,
+ "[CL:0002149]": 3746,
+ "[CL:0000187]": 3747,
+ "[CL:0000034]": 3748,
+ "[CL:0009016]": 3749,
+ "[CL:0002365]": 3750,
+ "[CL:0000190]": 3751,
+ "[CL:0000114]": 3752,
+ "[CL:1000436]": 3753,
+ "[CL:0009011]": 3754,
+ "[CL:0009005]": 3755,
+ "[CL:0000136]": 3756,
+ "[CL:0000189]": 3757,
+ "[CL:0000131]": 3758,
+ "[CL:0000049]": 3759,
+ "[CL:1001516]": 3760,
+ "[CL:0000312]": 3761,
+ "[CL:0000808]": 3762,
+ "[CL:0000453]": 3763,
+ "[CL:0000169]": 3764,
+ "[CL:0002303]": 3765,
+ "[CL:0019032]": 3766,
+ "[CL:0000841]": 3767,
+ "[CL:0002673]": 3768,
+ "[CL:0000388]": 3769,
+ "[CL:0002370]": 3770,
+ "[CL:0000077]": 3771,
+ "[CL:0002518]": 3772,
+ "[CL:0000807]": 3773,
+ "[CL:0000584]": 3774,
+ "[CL:1000334]": 3775,
+ "[CL:0000894]": 3776,
+ "[CL:1000271]": 3777,
+ "[CL:2000016]": 3778,
+ "[CL:1000495]": 3779,
+ "[CL:1000343]": 3780,
+ "[CL:0002250]": 3781,
+ "[CL:0009012]": 3782,
+ "[CL:0002598]": 3783,
+ "[CL:0009017]": 3784,
+ "[CL:0017000]": 3785,
+ "[CL:1000331]": 3786,
+ "[CL:0000019]": 3787,
+ "[CL:0002573]": 3788,
+ "[CL:0000188]": 3789,
+ "[CL:0000893]": 3790,
+ "[CL:0002320]": 3791,
+ "[CL:0000171]": 3792,
+ "[CL:0002275]": 3793,
+ "[CL:1001589]": 3794,
+ "[CL:0000823]": 3795,
+ "[CL:0000173]": 3796,
+ "[CL:0001058]": 3797,
+ "[CL:0000934]": 3798,
+ "[CL:0000818]": 3799,
+ "[CL:0001043]": 3800,
+ "[CL:0001049]": 3801,
+ "[CL:0000776]": 3802,
+ "[CL:0005012]": 3803,
+ "[CL:0000982]": 3804,
+ "[CL:0000984]": 3805,
+ "[CL:0001076]": 3806,
+ "[CL:0002489]": 3807,
+ "[CL:1000272]": 3808,
+ "[CL:0019019]": 3809,
+ "[CL:0000135]": 3810,
+ "[CL:0002057]": 3811,
+ "[CL:0000791]": 3812,
+ "[CL:0002397]": 3813,
+ "[CL:0000556]": 3814,
+ "[CL:0000800]": 3815,
+ "[CL:0000076]": 3816,
+ "[CL:0000121]": 3817,
+ "[CL:0000498]": 3818,
+ "[CL:0000120]": 3819,
+ "[CL:0000123]": 3820,
+ "[CL:0000122]": 3821,
+ "[CL:0000555]": 3822,
+ "[CL:0000126]": 3823,
+ "[CL:0000540]": 3824,
+ "[CL:0000164]": 3825,
+ "[CL:0002425]": 3826,
+ "[CL:0000817]": 3827,
+ "[CL:0000557]": 3828,
+ "[CL:0000826]": 3829,
+ "[CL:0000050]": 3830,
+ "[CL:0000834]": 3831,
+ "[CL:0002045]": 3832,
+ "[CL:0001082]": 3833,
+ "[CL:0000617]": 3834,
+ "[CL:0002563]": 3835,
+ "[CL:0002350]": 3836,
+ "[CL:1000309]": 3837,
+ "[CL:2000041]": 3838,
+ "[CL:0000166]": 3839,
+ "[CL:0005025]": 3840,
+ "[CL:0000765]": 3841,
+ "[CL:2000046]": 3842,
+ "[CL:0002129]": 3843,
+ "[CL:0008019]": 3844,
+ "[CL:0000216]": 3845,
+ "[CL:0000501]": 3846,
+ "[CL:0002094]": 3847,
+ "[CL:0000630]": 3848,
+ "[CL:2000064]": 3849,
+ "[CL:0000178]": 3850,
+ "[CL:0000586]": 3851,
+ "[CL:0000024]": 3852,
+ "[CL:0002371]": 3853,
+ "[CL:0000670]": 3854,
+ "[CL:0000023]": 3855,
+ "[CL:0000015]": 3856,
+ "[CL:0000650]": 3857,
+ "[CL:0005026]": 3858,
+ "[CL:0000209]": 3859,
+ "[CL:0002632]": 3860,
+ "[CL:0000622]": 3861,
+ "[CL:0000594]": 3862,
+ "[CL:0000163]": 3863,
+ "[CL:0007011]": 3864,
+ "[CL:0011101]": 3865,
+ "[CL:0002097]": 3866,
+ "[CL:0000210]": 3867,
+ "[CL:0011103]": 3868,
+ "[CL:0002204]": 3869,
+ "[CL:0000145]": 3870,
+ "[CL:0000397]": 3871,
+ "[CL:0000103]": 3872,
+ "[CL:0002293]": 3873,
+ "[CL:0011004]": 3874,
+ "[CL:0002504]": 3875,
+ "[CL:0000677]": 3876,
+ "[CL:1000279]": 3877,
+ "[CL:0002088]": 3878,
+ "[CL:0000898]": 3879,
+ "[CL:0019031]": 3880,
+ "[CL:0008015]": 3881,
+ "[CL:0000100]": 3882,
+ "[CL:1000275]": 3883,
+ "[CL:0000099]": 3884,
+ "[CL:0001071]": 3885,
+ "[CL:0000766]": 3886,
+ "[CL:0000082]": 3887,
+ "[CL:2000032]": 3888,
+ "[CL:0011115]": 3889,
+ "[CL:0000682]": 3890,
+ "[CL:0000906]": 3891,
+ "[CL:0000896]": 3892,
+ "[CL:0002131]": 3893,
+ "[CL:0000813]": 3894,
+ "[CL:0000079]": 3895,
+ "[CL:2000095]": 3896,
+ "[CL:1000909]": 3897,
+ "[CL:0005010]": 3898,
+ "[CL:0002322]": 3899,
+ "[CL:0000890]": 3900,
+ "[CL:0002258]": 3901,
+ "[CL:0001063]": 3902,
+ "[CL:0000696]": 3903,
+ "[CL:0005019]": 3904,
+ "[CL:0005022]": 3905,
+ "[CL:0002627]": 3906,
+ "[CL:0002629]": 3907,
+ "[CL:0005011]": 3908,
+ "[CL:0005009]": 3909,
+ "[CL:1000892]": 3910,
+ "[CL:0002201]": 3911,
+ "[CL:0000790]": 3912,
+ "[CL:1001433]": 3913,
+ "[CL:0000155]": 3914,
+ "[CL:0002633]": 3915,
+ "[CL:1000143]": 3916,
+ "[CL:0001057]": 3917,
+ "[CL:0009002]": 3918,
+ "[CL:0002241]": 3919,
+ "[CL:1000491]": 3920,
+ "[CL:0019001]": 3921,
+ "[CL:0002102]": 3922,
+ "[CL:0000822]": 3923,
+ "[CL:0002327]": 3924,
+ "[CL:0000644]": 3925,
+ "[CL:0002179]": 3926,
+ "[CL:0002252]": 3927,
+ "[CL:0000651]": 3928,
+ "[CL:0000903]": 3929,
+ "[CL:0000508]": 3930,
+ "[CL:0002268]": 3931,
+ "[CL:0002657]": 3932,
+ "[CL:0000162]": 3933,
+ "[CL:2000006]": 3934,
+ "[CL:0009112]": 3935,
+ "[CL:0009111]": 3936,
+ "[CL:0000907]": 3937,
+ "[CL:0009113]": 3938,
+ "[CL:0000529]": 3939,
+ "[CL:1000443]": 3940,
+ "[CL:0000132]": 3941,
+ "[CL:0002243]": 3942,
+ "[CL:0002304]": 3943,
+ "[CL:0009010]": 3944,
+ "[CL:0000442]": 3945,
+ "[CL:0002223]": 3946,
+ "[CL:1001509]": 3947,
+ "[CL:0002224]": 3948,
+ "[CL:0002225]": 3949,
+ "[CL:0001062]": 3950,
+ "[CL:0011019]": 3951,
+ "[CL:0002495]": 3952,
+ "[CL:0009039]": 3953,
+ "[CL:0000796]": 3954,
+ "[CL:0009041]": 3955,
+ "[CL:0009043]": 3956,
+ "[CL:0002254]": 3957,
+ "[CL:0000569]": 3958,
+ "[CL:0000843]": 3959,
+ "[CL:0009042]": 3960,
+ "[CL:0009006]": 3961,
+ "[CL:1000353]": 3962,
+ "[CL:0002145]": 3963,
+ "[CL:0000861]": 3964,
+ "[CL:0002480]": 3965,
+ "[CL:4028004]": 3966,
+ "[CL:4028006]": 3967,
+ "[CL:1001568]": 3968,
+ "[CL:0009089]": 3969,
+ "[CL:2000093]": 3970,
+ "[CL:1001603]": 3971,
+ "[CL:4030023]": 3972,
+ "[CL:0000313]": 3973,
+ "[CL:1000312]": 3974,
+ "[CL:0010003]": 3975,
+ "[CL:0019003]": 3976,
+ "[CL:0002075]": 3977,
+ "[CL:4030006]": 3978,
+ "[CL:0000878]": 3979,
+ "[CL:0000065]": 3980,
+ "[CL:0000706]": 3981,
+ "[CL:0001061]": 3982,
+ "[CL:1001131]": 3983,
+ "[CL:1001285]": 3984,
+ "[CL:1001005]": 3985,
+ "[CL:0000075]": 3986,
+ "[CL:1001225]": 3987,
+ "[CL:0002187]": 3988,
+ "[CL:0000649]": 3989,
+ "[CL:0002355]": 3990,
+ "[CL:0000559]": 3991,
+ "[CL:0000837]": 3992,
+ "[CL:0002193]": 3993,
+ "[CL:0000838]": 3994,
+ "[CL:0000836]": 3995,
+ "[CL:0000051]": 3996,
+ "[CL:0000839]": 3997,
+ "[CL:0000547]": 3998,
+ "[CL:0000936]": 3999,
+ "[CL:0000092]": 4000,
+ "[CL:0001029]": 4001,
+ "[CL:0007010]": 4002,
+ "[CL:0000242]": 4003,
+ "[CL:0002262]": 4004,
+ "[CL:2000092]": 4005,
+ "[CL:0000771]": 4006,
+ "[CL:0000704]": 4007,
+ "[CL:0000062]": 4008,
+ "[CL:0000680]": 4009,
+ "[CL:0002028]": 4010,
+ "[CL:0002157]": 4011,
+ "[CL:0002189]": 4012,
+ "[CL:0002009]": 4013,
+ "[CL:0000819]": 4014,
+ "[CL:0000915]": 4015,
+ "[CL:0000957]": 4016,
+ "[CL:0000954]": 4017,
+ "[CL:0002048]": 4018,
+ "[CL:0001069]": 4019,
+ "[CL:0000253]": 4020,
+ "[CL:0002010]": 4021,
3696
+ "[CL:1000449]": 4022,
3697
+ "[CL:0000212]": 4023,
3698
+ "[CL:0000222]": 4024,
3699
+ "[CL:1000347]": 4025,
3700
+ "[CL:0011108]": 4026,
3701
+ "[CL:0000844]": 4027,
3702
+ "[CL:0000577]": 4028,
3703
+ "[CL:0000031]": 4029,
3704
+ "[CL:0005021]": 4030,
3705
+ "[CL:0001080]": 4031,
3706
+ "[CL:0002607]": 4032,
3707
+ "[CL:0000531]": 4033,
3708
+ "[CL:0000899]": 4034,
3709
+ "[CL:0001074]": 4035,
3710
+ "[CL:0000432]": 4036,
3711
+ "[CL:0001079]": 4037,
3712
+ "[CL:0009022]": 4038,
3713
+ "[CL:0000502]": 4039,
3714
+ "[CL:0002277]": 4040,
3715
+ "[CL:0002279]": 4041,
3716
+ "[CL:0002278]": 4042,
3717
+ "[CL:0002351]": 4043,
3718
+ "[CL:0002280]": 4044,
3719
+ "[CL:0001064]": 4045,
3720
+ "[CL:0002553]": 4046,
3721
+ "[CL:0000001]": 4047,
3722
+ "[CL:0002419]": 4048,
3723
+ "[CL:0002368]": 4049,
3724
+ "[CL:0002154]": 4050,
3725
+ "[CL:0002151]": 4051,
3726
+ "[CL:0000553]": 4052,
3727
+ "[CL:0000821]": 4053,
3728
+ "[CL:0000956]": 4054,
3729
+ "[CL:0000558]": 4055,
3730
+ "[CL:0000820]": 4056,
3731
+ "[CL:0002375]": 4057,
3732
+ "[CL:0000922]": 4058,
3733
+ "[CL:0000937]": 4059,
3734
+ "[CL:0002377]": 4060,
3735
+ "[CL:0011020]": 4061,
3736
+ "[CL:0000150]": 4062,
3737
+ "[CL:0000055]": 4063,
3738
+ "[CL:0002092]": 4064,
3739
+ "[CL:0000513]": 4065,
3740
+ "[CL:0000006]": 4066,
3741
+ "[CL:0000514]": 4067,
3742
+ "[CL:1000311]": 4068,
3743
+ "[CL:0010022]": 4069,
3744
+ "[CL:0011012]": 4070,
3745
+ "[CL:0002364]": 4071,
3746
+ "[CL:0002678]": 4072,
3747
+ "[CL:0000827]": 4073,
3748
+ "[CL:0002132]": 4074,
3749
+ "[CL:0000751]": 4075,
3750
+ "[CL:0000503]": 4076,
3751
+ "[CL:1000428]": 4077,
3752
+ "[CL:0011025]": 4078,
3753
+ "[CL:0009038]": 4079,
3754
+ "[CL:0001204]": 4080,
3755
+ "[CL:0000255]": 4081,
3756
+ "[CL:0000548]": 4082,
3757
+ "[CL:0001203]": 4083,
3758
+ "[CL:0000322]": 4084,
3759
+ "[CL:0000113]": 4085,
3760
+ "[CL:1000342]": 4086,
3761
+ "[CL:0000842]": 4087,
3762
+ "[CL:1000326]": 4088,
3763
+ "[CL:0000802]": 4089,
3764
+ "[CL:1000278]": 4090,
3765
+ "[CL:1000348]": 4091,
3766
+ "[CL:0002207]": 4092,
3767
+ "[CL:2000001]": 4093,
3768
+ "[CL:1000500]": 4094,
3769
+ "[CL:1001033]": 4095,
3770
+ "[CL:1000450]": 4096,
3771
+ "[CL:0002188]": 4097,
3772
+ "[CL:1000494]": 4098,
3773
+ "[CL:0000731]": 4099,
3774
+ "[CL:0011024]": 4100,
3775
+ "[CL:0002422]": 4101,
3776
+ "[UBERON:0002450]": 4102,
+ "[UBERON:0001987]": 4103,
+ "[UBERON:0000453]": 4104,
+ "[UBERON:0000178]": 4105,
+ "[UBERON:0002113]": 4106,
+ "[UBERON:0018303]": 4107,
+ "[UBERON:0005406]": 4108,
+ "[UBERON:0010210]": 4109,
+ "[UBERON:0000310]": 4110,
+ "[UBERON:0002771]": 4111,
+ "[UBERON:8410010]": 4112,
+ "[UBERON:0016632]": 4113,
+ "[UBERON:0001225]": 4114,
+ "[UBERON:0012648]": 4115,
+ "[UBERON:0001228]": 4116,
+ "[UBERON:0000362]": 4117,
+ "[UBERON:0003889]": 4118,
+ "[UBERON:0001117]": 4119,
+ "[UBERON:0008933]": 4120,
+ "[UBERON:0016538]": 4121,
+ "[UBERON:0002436]": 4122,
+ "[UBERON:0016530]": 4123,
+ "[UBERON:0001384]": 4124,
+ "[UBERON:0000451]": 4125,
+ "[UBERON:0002822]": 4126,
+ "[UBERON:0001786]": 4127,
+ "[UBERON:0013682]": 4128,
+ "[UBERON:0002048]": 4129,
+ "[UBERON:8410025]": 4130,
+ "[UBERON:8410026]": 4131,
+ "[UBERON:0009834]": 4132,
+ "[UBERON:0001542]": 4133,
+ "[UBERON:0003126]": 4134,
+ "[UBERON:0000029]": 4135,
+ "[UBERON:0002107]": 4136,
+ "[UBERON:0001831]": 4137,
+ "[UBERON:0002106]": 4138,
+ "[UBERON:0002367]": 4139,
+ "[UBERON:0002370]": 4140,
+ "[UBERON:0000059]": 4141,
+ "[UBERON:0001911]": 4142,
+ "[UBERON:0002081]": 4143,
+ "[UBERON:0010033]": 4144,
+ "[UBERON:0000017]": 4145,
+ "[UBERON:0001013]": 4146,
+ "[UBERON:0002190]": 4147,
+ "[UBERON:0001832]": 4148,
+ "[UBERON:0002097]": 4149,
+ "[UBERON:0002371]": 4150,
+ "[UBERON:0001295]": 4151,
+ "[UBERON:0000964]": 4152,
+ "[UBERON:0002082]": 4153,
+ "[UBERON:0018707]": 4154,
+ "[UBERON:0001296]": 4155,
+ "[UBERON:0000970]": 4156,
+ "[UBERON:0010032]": 4157,
+ "[UBERON:0001811]": 4158,
+ "[UBERON:0001773]": 4159,
+ "[UBERON:0008612]": 4160,
+ "[UBERON:0002382]": 4161,
+ "[UBERON:0003902]": 4162,
+ "[UBERON:0001868]": 4163,
+ "[UBERON:0001416]": 4164,
+ "[UBERON:0000995]": 4165,
+ "[UBERON:0001817]": 4166,
+ "[UBERON:0001723]": 4167,
+ "[UBERON:0002108]": 4168,
+ "[UBERON:0002378]": 4169,
+ "[UBERON:0001621]": 4170,
+ "[UBERON:0002385]": 4171,
+ "[UBERON:0002049]": 4172,
+ "[UBERON:0000947]": 4173,
+ "[UBERON:0000016]": 4174,
+ "[UBERON:0008946]": 4175,
+ "[UBERON:0001005]": 4176,
+ "[UBERON:0000004]": 4177,
+ "[UBERON:0002037]": 4178,
+ "[UBERON:0001836]": 4179,
+ "[UBERON:0000991]": 4180,
+ "[UBERON:0001893]": 4181,
+ "[UBERON:0000948]": 4182,
+ "[UBERON:0000160]": 4183,
+ "[UBERON:0001264]": 4184,
+ "[UBERON:0001630]": 4185,
+ "[UBERON:0002369]": 4186,
+ "[UBERON:0000945]": 4187,
+ "[UBERON:0035328]": 4188,
+ "[UBERON:0005616]": 4189,
+ "[UBERON:0001155]": 4190,
+ "[UBERON:0002116]": 4191,
+ "[UBERON:0002098]": 4192,
+ "[UBERON:0002079]": 4193,
+ "[UBERON:0002084]": 4194,
+ "[UBERON:0002080]": 4195,
+ "[UBERON:0002094]": 4196,
+ "[UBERON:0001046]": 4197,
+ "[UBERON:0002078]": 4198,
+ "[UBERON:0002114]": 4199,
+ "[UBERON:0001043]": 4200,
+ "[UBERON:0001161]": 4201,
+ "[UBERON:0002115]": 4202,
+ "[UBERON:0001165]": 4203,
+ "[UBERON:0001901]": 4204,
+ "[UBERON:0002429]": 4205,
+ "[UBERON:0000473]": 4206,
+ "[UBERON:0000955]": 4207,
+ "[UBERON:0007106]": 4208,
+ "[UBERON:0012168]": 4209,
+ "[UBERON:0004339]": 4210,
+ "[UBERON:0000992]": 4211,
+ "[UBERON:0000056]": 4212,
+ "[UBERON:0000006]": 4213,
+ "[UBERON:0000956]": 4214,
+ "[UBERON:0002661]": 4215,
+ "[UBERON:0013756]": 4216,
+ "[UBERON:0039167]": 4217,
+ "[UBERON:0008953]": 4218,
+ "[UBERON:0002728]": 4219,
+ "[UBERON:0001637]": 4220,
+ "[UBERON:0000002]": 4221,
+ "[UBERON:0001157]": 4222,
+ "[UBERON:0001154]": 4223,
+ "[UBERON:0001156]": 4224,
+ "[UBERON:0000977]": 4225,
+ "[UBERON:0002046]": 4226,
+ "[UBERON:0003688]": 4227,
+ "[UBERON:0001871]": 4228,
+ "[UBERON:0001052]": 4229,
+ "[UBERON:0001159]": 4230,
+ "[UBERON:0002110]": 4231,
+ "[UBERON:0002228]": 4232,
+ "[UBERON:0002240]": 4233,
+ "[UBERON:0001898]": 4234,
+ "[UBERON:8440012]": 4235,
+ "[UBERON:0010225]": 4236,
+ "[UBERON:0000988]": 4237,
+ "[UBERON:0002421]": 4238,
+ "[UBERON:0001891]": 4239,
+ "[UBERON:0005290]": 4240,
+ "[UBERON:0008972]": 4241,
+ "[UBERON:0007177]": 4242,
+ "[UBERON:0001153]": 4243,
+ "[UBERON:0008971]": 4244,
+ "[UBERON:0000397]": 4245,
+ "[UBERON:0005636]": 4246,
+ "[UBERON:0000966]": 4247,
+ "[UBERON:0001238]": 4248,
+ "[UBERON:0008345]": 4249,
+ "[UBERON:0013473]": 4250,
+ "[UBERON:0001162]": 4251,
+ "[UBERON:0007650]": 4252,
+ "[UBERON:0008989]": 4253,
+ "[UBERON:0002372]": 4254,
+ "[UBERON:0001769]": 4255,
+ "[UBERON:0006761]": 4256,
+ "[UBERON:0005969]": 4257,
+ "[UBERON:0001775]": 4258,
+ "[UBERON:0000965]": 4259,
+ "[UBERON:0002299]": 4260,
+ "[UBERON:0035213]": 4261,
+ "[UBERON:0001158]": 4262,
+ "[UBERON:0002358]": 4263,
+ "[UBERON:0036292]": 4264,
+ "[UBERON:0007795]": 4265,
+ "[UBERON:0002119]": 4266,
+ "[UBERON:0002118]": 4267,
+ "[UBERON:0000916]": 4268,
+ "[UBERON:0001103]": 4269,
+ "[UBERON:0003697]": 4270,
+ "[UBERON:0035210]": 4271,
+ "[UBERON:0001366]": 4272,
+ "[UBERON:0001255]": 4273,
+ "[UBERON:0008952]": 4274,
+ "[UBERON:0002174]": 4275,
+ "[UBERON:0001332]": 4276,
+ "[UBERON:0002100]": 4277,
+ "[UBERON:0000403]": 4278,
+ "[UBERON:0000328]": 4279,
+ "[UBERON:0000053]": 4280,
+ "[UBERON:0001040]": 4281,
+ "[UBERON:0002509]": 4282,
+ "[UBERON:0012474]": 4283,
+ "[UBERON:0022277]": 4284,
+ "[UBERON:0000175]": 4285,
+ "[UBERON:0001890]": 4286,
+ "[UBERON:0002811]": 4287,
+ "[UBERON:0002808]": 4288,
+ "[UBERON:0002810]": 4289,
+ "[UBERON:0002803]": 4290,
+ "[UBERON:0002809]": 4291,
+ "[UBERON:0023852]": 4292,
+ "[UBERON:0002802]": 4293,
+ "[UBERON:0016525]": 4294,
+ "[UBERON:0002021]": 4295,
+ "[UBERON:0001872]": 4296,
+ "[UBERON:8410000]": 4297,
+ "[UBERON:0007625]": 4298,
+ "[UBERON:0000014]": 4299,
+ "[UBERON:0009472]": 4300,
+ "[UBERON:0013706]": 4301,
+ "[UBERON:0007644]": 4302,
+ "[UBERON:0000030]": 4303,
+ "[UBERON:0000400]": 4304,
+ "[UBERON:0001134]": 4305,
+ "[UBERON:0001976]": 4306,
+ "[UBERON:0001707]": 4307,
+ "[UBERON:0002185]": 4308,
+ "[UBERON:0001224]": 4309,
+ "[UBERON:0003517]": 4310
+ },
+ "unk_token": "<UNK>"
+ }
+ }
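The vocabulary entries above map bracketed ontology identifiers such as "[CL:0002422]" and "[UBERON:0002450]" to consecutive integer IDs. As a rough sanity check, a minimal Python sketch could count tokens per ontology prefix and look for gaps in the ID sequence. The helper names `count_by_ontology` and `find_id_gaps` are hypothetical and not part of this repository, and the sample dictionary holds only four entries copied from the diff. It is assumed that the full mapping can be loaded from `cell_attributes_tokenizer.json` with the standard `json` module; the exact key path to the vocabulary inside that file is not visible in this excerpt.

```python
import re
from collections import Counter

# Matches tokens of the form "[PREFIX:digits]", e.g. "[CL:0002422]".
ONTOLOGY_TOKEN = re.compile(r"^\[([A-Za-z]+):(\d+)\]$")


def count_by_ontology(vocab):
    """Count tokens per ontology prefix; anything else is counted as 'other'."""
    counts = Counter()
    for token in vocab:
        match = ONTOLOGY_TOKEN.match(token)
        counts[match.group(1) if match else "other"] += 1
    return dict(counts)


def find_id_gaps(vocab):
    """Return (previous_id, next_id) pairs where sorted IDs are not consecutive."""
    ids = sorted(vocab.values())
    return [(a, b) for a, b in zip(ids, ids[1:]) if b - a != 1]


# Four entries copied from the diff above (hypothetical sample, not the full file).
sample_vocab = {
    "[CL:0011024]": 4100,
    "[CL:0002422]": 4101,
    "[UBERON:0002450]": 4102,
    "[UBERON:0001987]": 4103,
}

print(count_by_ontology(sample_vocab))
print(find_id_gaps(sample_vocab))
```

Running the same two helpers on the full vocabulary would show how many CL and UBERON tokens the file defines and whether any ID was skipped.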
tokenizer/config.yaml ADDED
@@ -0,0 +1,28 @@
+ tokenizers_info:
+ - name: AA
+ tokenizer_id: 0
+ json_path: ./t5_tokenizer_AA_special.json
+ modular_json_path: ./t5_tokenizer_AA_special.json
+ start_delimiter: <start_AA>
+ end_delimiter: <end_AA>
+ - name: SMILES
+ tokenizer_id: 1
+ json_path: ./bpe_tokenizer_trained_on_chembl_zinc_with_aug_4272372_samples_balanced_1_1.json
+ modular_json_path: ./bpe_tokenizer_trained_on_chembl_zinc_with_aug_4272372_samples_balanced_1_1.json
+ start_delimiter: <start_SMILES>
+ end_delimiter: <end_SMILES>
+ - name: CELL_ATTRIBUTES
+ tokenizer_id: 2
+ json_path: ./cell_attributes_tokenizer.json
+ modular_json_path: ./cell_attributes_tokenizer.json
+ start_delimiter: <start_CELL_ATTRIBUTES>
+ end_delimiter: <end_CELL_ATTRIBUTES>
+ - name: GENE
+ tokenizer_id: 3
+ json_path: ./gene_tokenizer.json
+ modular_json_path: ./gene_tokenizer.json
+ start_delimiter: <start_GENE>
+ end_delimiter: <end_GENE>
+ minimal_token_id: 5000
+ max_possible_token_id: 100000
+ max_special_token_id: 500
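The config above lists four sub-tokenizers, each with a numeric `tokenizer_id` and a pair of delimiters that follow the pattern `<start_NAME>` / `<end_NAME>`. A minimal Python sketch of a consistency check over such entries follows. The function `check_tokenizers_info` and the helper `make_entry` are hypothetical names introduced only for illustration. The entries are written as plain dictionaries rather than parsed from YAML, and the three numeric limits at the end of the file are left out because their nesting is not recoverable from the flattened diff.

```python
def check_tokenizers_info(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    seen_ids = set()
    for entry in entries:
        name = entry["name"]
        tokenizer_id = entry["tokenizer_id"]
        # Each sub-tokenizer is expected to have its own ID.
        if tokenizer_id in seen_ids:
            problems.append(f"{name}: duplicate tokenizer_id {tokenizer_id}")
        seen_ids.add(tokenizer_id)
        # Delimiters in the diff follow the <start_NAME> / <end_NAME> pattern.
        expected = {
            "start_delimiter": f"<start_{name}>",
            "end_delimiter": f"<end_{name}>",
        }
        for key, want in expected.items():
            if entry.get(key) != want:
                problems.append(f"{name}: {key} is {entry.get(key)!r}, expected {want!r}")
    return problems


def make_entry(name, tokenizer_id):
    """Build an entry that mirrors the naming pattern used in config.yaml."""
    return {
        "name": name,
        "tokenizer_id": tokenizer_id,
        "start_delimiter": f"<start_{name}>",
        "end_delimiter": f"<end_{name}>",
    }


# Names and IDs copied from the config diff above.
entries = [
    make_entry("AA", 0),
    make_entry("SMILES", 1),
    make_entry("CELL_ATTRIBUTES", 2),
    make_entry("GENE", 3),
]

print(check_tokenizers_info(entries))
```

An empty list means every entry has a unique ID and delimiters that match its name; anything else is reported as a short message per problem.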
tokenizer/gene_tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/t5_tokenizer_AA_special.json ADDED
@@ -0,0 +1,3225 @@
+ {
+ "version": "1.0",
+ "truncation": null,
+ "padding": null,
+ "added_tokens": [
+ {
+ "id": 0,
+ "content": "<UNK>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 1,
+ "content": "<PAD>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 2,
+ "content": "<CLS>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 3,
+ "content": "<SEP>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 4,
+ "content": "<MASK>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 5,
+ "content": "<EOS>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 6,
+ "content": "<MOLECULAR_ENTITY>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 7,
+ "content": "<GLOBAL_INTERACTION_ATTRIBUTES>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 8,
+ "content": "<MOLECULAR_ENTITY_ANTIGEN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 9,
+ "content": "<MOLECULAR_ENTITY_EPITOPE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 10,
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 11,
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 12,
+ "content": "<MOLECULAR_ENTITY_TCR_ALPHA_CHAIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 13,
+ "content": "<MOLECULAR_ENTITY_TCR_BETA_VDJ>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 14,
+ "content": "<MOLECULAR_ENTITY_TCR_BETA_CDR3>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 15,
+ "content": "<BINDING_AFFINITY_CLASS>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 16,
+ "content": "<DECODER_START>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 17,
+ "content": "<BINDING>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 18,
+ "content": "<FILLIN>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 19,
+ "content": "<REORDER>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 20,
+ "content": "<TOAA>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 21,
+ "content": "<ACTIVE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 22,
+ "content": "<GENESEQ>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 23,
+ "content": "<INCREASE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 24,
+ "content": "<DECREASE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 25,
+ "content": "<STRUCTURE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 26,
+ "content": "<DISTANCE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 27,
+ "content": "<SOLUBILITY>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 28,
+ "content": "<TOXICITY>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 29,
+ "content": "<AB>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 30,
+ "content": "<ISACTIVE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 31,
+ "content": "<ISSYNTHETIC>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 32,
+ "content": "<PENETR>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 33,
+ "content": "<ABSORPTION>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 34,
+ "content": "<DISTRIBUTION>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 35,
+ "content": "<METABOLISM>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 36,
+ "content": "<EXCRETION>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 37,
+ "content": "<FLUORESCENCE>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 38,
+ "content": "<STABILITY>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 39,
+ "content": "<DISORDER>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
366
+ {
367
+ "id": 40,
368
+ "content": "<DISEASE>",
369
+ "single_word": false,
370
+ "lstrip": false,
371
+ "rstrip": false,
372
+ "normalized": false,
373
+ "special": true
374
+ },
375
+ {
376
+ "id": 41,
377
+ "content": "<BINARY>",
378
+ "single_word": false,
379
+ "lstrip": false,
380
+ "rstrip": false,
381
+ "normalized": false,
382
+ "special": true
383
+ },
384
+ {
385
+ "id": 42,
386
+ "content": "<REGRESSION>",
387
+ "single_word": false,
388
+ "lstrip": false,
389
+ "rstrip": false,
390
+ "normalized": false,
391
+ "special": true
392
+ },
393
+ {
394
+ "id": 43,
395
+ "content": "<ORGANISM>",
396
+ "single_word": false,
397
+ "lstrip": false,
398
+ "rstrip": false,
399
+ "normalized": false,
400
+ "special": true
401
+ },
402
+ {
403
+ "id": 44,
404
+ "content": "<0>",
405
+ "single_word": false,
406
+ "lstrip": false,
407
+ "rstrip": false,
408
+ "normalized": false,
409
+ "special": true
410
+ },
411
+ {
412
+ "id": 45,
413
+ "content": "<1>",
414
+ "single_word": false,
415
+ "lstrip": false,
416
+ "rstrip": false,
417
+ "normalized": false,
418
+ "special": true
419
+ },
420
+ {
421
+ "id": 46,
422
+ "content": "<2>",
423
+ "single_word": false,
424
+ "lstrip": false,
425
+ "rstrip": false,
426
+ "normalized": false,
427
+ "special": true
428
+ },
429
+ {
430
+ "id": 47,
431
+ "content": "<3>",
432
+ "single_word": false,
433
+ "lstrip": false,
434
+ "rstrip": false,
435
+ "normalized": false,
436
+ "special": true
437
+ },
438
+ {
439
+ "id": 48,
440
+ "content": "<4>",
441
+ "single_word": false,
442
+ "lstrip": false,
443
+ "rstrip": false,
444
+ "normalized": false,
445
+ "special": true
446
+ },
447
+ {
448
+ "id": 49,
449
+ "content": "<5>",
450
+ "single_word": false,
451
+ "lstrip": false,
452
+ "rstrip": false,
453
+ "normalized": false,
454
+ "special": true
455
+ },
456
+ {
457
+ "id": 50,
458
+ "content": "<6>",
459
+ "single_word": false,
460
+ "lstrip": false,
461
+ "rstrip": false,
462
+ "normalized": false,
463
+ "special": true
464
+ },
465
+ {
466
+ "id": 51,
467
+ "content": "<7>",
468
+ "single_word": false,
469
+ "lstrip": false,
470
+ "rstrip": false,
471
+ "normalized": false,
472
+ "special": true
473
+ },
474
+ {
475
+ "id": 52,
476
+ "content": "<8>",
477
+ "single_word": false,
478
+ "lstrip": false,
479
+ "rstrip": false,
480
+ "normalized": false,
481
+ "special": true
482
+ },
483
+ {
484
+ "id": 53,
485
+ "content": "<9>",
486
+ "single_word": false,
487
+ "lstrip": false,
488
+ "rstrip": false,
489
+ "normalized": false,
490
+ "special": true
491
+ },
492
+ {
493
+ "id": 54,
494
+ "content": "<.>",
495
+ "single_word": false,
496
+ "lstrip": false,
497
+ "rstrip": false,
498
+ "normalized": false,
499
+ "special": true
500
+ },
501
+ {
502
+ "id": 55,
503
+ "content": "<YES>",
504
+ "single_word": false,
505
+ "lstrip": false,
506
+ "rstrip": false,
507
+ "normalized": false,
508
+ "special": true
509
+ },
510
+ {
511
+ "id": 56,
512
+ "content": "<NO>",
513
+ "single_word": false,
514
+ "lstrip": false,
515
+ "rstrip": false,
516
+ "normalized": false,
517
+ "special": true
518
+ },
519
+ {
520
+ "id": 57,
521
+ "content": "<SENTINEL_ID_0>",
522
+ "single_word": false,
523
+ "lstrip": false,
524
+ "rstrip": false,
525
+ "normalized": false,
526
+ "special": true
527
+ },
528
+ {
529
+ "id": 58,
530
+ "content": "<SENTINEL_ID_1>",
531
+ "single_word": false,
532
+ "lstrip": false,
533
+ "rstrip": false,
534
+ "normalized": false,
535
+ "special": true
536
+ },
537
+ {
538
+ "id": 59,
539
+ "content": "<SENTINEL_ID_2>",
540
+ "single_word": false,
541
+ "lstrip": false,
542
+ "rstrip": false,
543
+ "normalized": false,
544
+ "special": true
545
+ },
546
+ {
547
+ "id": 60,
548
+ "content": "<SENTINEL_ID_3>",
549
+ "single_word": false,
550
+ "lstrip": false,
551
+ "rstrip": false,
552
+ "normalized": false,
553
+ "special": true
554
+ },
555
+ {
556
+ "id": 61,
557
+ "content": "<SENTINEL_ID_4>",
558
+ "single_word": false,
559
+ "lstrip": false,
560
+ "rstrip": false,
561
+ "normalized": false,
562
+ "special": true
+ },
+ {
+ "id": 62,
+ "content": "<SENTINEL_ID_5>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 63,
+ "content": "<SENTINEL_ID_6>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 64,
+ "content": "<SENTINEL_ID_7>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 65,
+ "content": "<SENTINEL_ID_8>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 66,
+ "content": "<SENTINEL_ID_9>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 67,
+ "content": "<SENTINEL_ID_10>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 68,
+ "content": "<SENTINEL_ID_11>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 69,
+ "content": "<SENTINEL_ID_12>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 70,
+ "content": "<SENTINEL_ID_13>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 71,
+ "content": "<SENTINEL_ID_14>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 72,
+ "content": "<SENTINEL_ID_15>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 73,
+ "content": "<SENTINEL_ID_16>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 74,
+ "content": "<SENTINEL_ID_17>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 75,
+ "content": "<SENTINEL_ID_18>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 76,
+ "content": "<SENTINEL_ID_19>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 77,
+ "content": "<SENTINEL_ID_20>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 78,
+ "content": "<SENTINEL_ID_21>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 79,
+ "content": "<SENTINEL_ID_22>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 80,
+ "content": "<SENTINEL_ID_23>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 81,
+ "content": "<SENTINEL_ID_24>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 82,
+ "content": "<SENTINEL_ID_25>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 83,
+ "content": "<SENTINEL_ID_26>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 84,
+ "content": "<SENTINEL_ID_27>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 85,
+ "content": "<SENTINEL_ID_28>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 86,
+ "content": "<SENTINEL_ID_29>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 87,
+ "content": "<SENTINEL_ID_30>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 88,
+ "content": "<SENTINEL_ID_31>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 89,
+ "content": "<SENTINEL_ID_32>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 90,
+ "content": "<SENTINEL_ID_33>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 91,
+ "content": "<SENTINEL_ID_34>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 92,
+ "content": "<SENTINEL_ID_35>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 93,
+ "content": "<SENTINEL_ID_36>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 94,
+ "content": "<SENTINEL_ID_37>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 95,
+ "content": "<SENTINEL_ID_38>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 96,
+ "content": "<SENTINEL_ID_39>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 97,
+ "content": "<SENTINEL_ID_40>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 98,
+ "content": "<SENTINEL_ID_41>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 99,
+ "content": "<SENTINEL_ID_42>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 100,
+ "content": "<SENTINEL_ID_43>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 101,
+ "content": "<SENTINEL_ID_44>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 102,
+ "content": "<SENTINEL_ID_45>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 103,
+ "content": "<SENTINEL_ID_46>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 104,
+ "content": "<SENTINEL_ID_47>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 105,
+ "content": "<SENTINEL_ID_48>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 106,
+ "content": "<SENTINEL_ID_49>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 107,
+ "content": "<SENTINEL_ID_50>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 108,
+ "content": "<SENTINEL_ID_51>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 109,
+ "content": "<SENTINEL_ID_52>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 110,
+ "content": "<SENTINEL_ID_53>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 111,
+ "content": "<SENTINEL_ID_54>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 112,
+ "content": "<SENTINEL_ID_55>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 113,
+ "content": "<SENTINEL_ID_56>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 114,
+ "content": "<SENTINEL_ID_57>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 115,
+ "content": "<SENTINEL_ID_58>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 116,
+ "content": "<SENTINEL_ID_59>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 117,
+ "content": "<SENTINEL_ID_60>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 118,
+ "content": "<SENTINEL_ID_61>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 119,
+ "content": "<SENTINEL_ID_62>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 120,
+ "content": "<SENTINEL_ID_63>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 121,
+ "content": "<SENTINEL_ID_64>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 122,
+ "content": "<SENTINEL_ID_65>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 123,
+ "content": "<SENTINEL_ID_66>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 124,
+ "content": "<SENTINEL_ID_67>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 125,
+ "content": "<SENTINEL_ID_68>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 126,
+ "content": "<SENTINEL_ID_69>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 127,
+ "content": "<SENTINEL_ID_70>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 128,
+ "content": "<SENTINEL_ID_71>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 129,
+ "content": "<SENTINEL_ID_72>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 130,
+ "content": "<SENTINEL_ID_73>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 131,
+ "content": "<SENTINEL_ID_74>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 132,
+ "content": "<SENTINEL_ID_75>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 133,
+ "content": "<SENTINEL_ID_76>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 134,
+ "content": "<SENTINEL_ID_77>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 135,
+ "content": "<SENTINEL_ID_78>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 136,
+ "content": "<SENTINEL_ID_79>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 137,
+ "content": "<SENTINEL_ID_80>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 138,
+ "content": "<SENTINEL_ID_81>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 139,
+ "content": "<SENTINEL_ID_82>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 140,
+ "content": "<SENTINEL_ID_83>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 141,
+ "content": "<SENTINEL_ID_84>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 142,
+ "content": "<SENTINEL_ID_85>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 143,
+ "content": "<SENTINEL_ID_86>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 144,
+ "content": "<SENTINEL_ID_87>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 145,
+ "content": "<SENTINEL_ID_88>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 146,
+ "content": "<SENTINEL_ID_89>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 147,
+ "content": "<SENTINEL_ID_90>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 148,
+ "content": "<SENTINEL_ID_91>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 149,
+ "content": "<SENTINEL_ID_92>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 150,
+ "content": "<SENTINEL_ID_93>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 151,
+ "content": "<SENTINEL_ID_94>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 152,
+ "content": "<SENTINEL_ID_95>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 153,
+ "content": "<SENTINEL_ID_96>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 154,
+ "content": "<SENTINEL_ID_97>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 155,
+ "content": "<SENTINEL_ID_98>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 156,
+ "content": "<SENTINEL_ID_99>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 157,
+ "content": "<SENTINEL_ID_100>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 158,
+ "content": "<SENTINEL_ID_101>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 159,
+ "content": "<SENTINEL_ID_102>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 160,
+ "content": "<SENTINEL_ID_103>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 161,
+ "content": "<SENTINEL_ID_104>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 162,
+ "content": "<SENTINEL_ID_105>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 163,
+ "content": "<SENTINEL_ID_106>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 164,
+ "content": "<SENTINEL_ID_107>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 165,
+ "content": "<SENTINEL_ID_108>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 166,
+ "content": "<SENTINEL_ID_109>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 167,
+ "content": "<SENTINEL_ID_110>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 168,
+ "content": "<SENTINEL_ID_111>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 169,
+ "content": "<SENTINEL_ID_112>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 170,
+ "content": "<SENTINEL_ID_113>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 171,
+ "content": "<SENTINEL_ID_114>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 172,
+ "content": "<SENTINEL_ID_115>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 173,
+ "content": "<SENTINEL_ID_116>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 174,
+ "content": "<SENTINEL_ID_117>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 175,
+ "content": "<SENTINEL_ID_118>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 176,
+ "content": "<SENTINEL_ID_119>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 177,
+ "content": "<SENTINEL_ID_120>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 178,
+ "content": "<SENTINEL_ID_121>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 179,
+ "content": "<SENTINEL_ID_122>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 180,
+ "content": "<SENTINEL_ID_123>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 181,
+ "content": "<SENTINEL_ID_124>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 182,
+ "content": "<SENTINEL_ID_125>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 183,
+ "content": "<SENTINEL_ID_126>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 184,
+ "content": "<SENTINEL_ID_127>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 185,
+ "content": "<SENTINEL_ID_128>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 186,
+ "content": "<SENTINEL_ID_129>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 187,
+ "content": "<SENTINEL_ID_130>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 188,
+ "content": "<SENTINEL_ID_131>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 189,
+ "content": "<SENTINEL_ID_132>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 190,
+ "content": "<SENTINEL_ID_133>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 191,
+ "content": "<SENTINEL_ID_134>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 192,
+ "content": "<SENTINEL_ID_135>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 193,
+ "content": "<SENTINEL_ID_136>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 194,
+ "content": "<SENTINEL_ID_137>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 195,
+ "content": "<SENTINEL_ID_138>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 196,
+ "content": "<SENTINEL_ID_139>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 197,
+ "content": "<SENTINEL_ID_140>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 198,
+ "content": "<SENTINEL_ID_141>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 199,
+ "content": "<SENTINEL_ID_142>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 200,
+ "content": "<SENTINEL_ID_143>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 201,
+ "content": "<SENTINEL_ID_144>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 202,
+ "content": "<SENTINEL_ID_145>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 203,
+ "content": "<SENTINEL_ID_146>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 204,
+ "content": "<SENTINEL_ID_147>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 205,
+ "content": "<SENTINEL_ID_148>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 206,
+ "content": "<SENTINEL_ID_149>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 207,
+ "content": "<SENTINEL_ID_150>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 208,
+ "content": "<SENTINEL_ID_151>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 209,
+ "content": "<SENTINEL_ID_152>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 210,
+ "content": "<SENTINEL_ID_153>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 211,
+ "content": "<SENTINEL_ID_154>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 212,
+ "content": "<SENTINEL_ID_155>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 213,
+ "content": "<SENTINEL_ID_156>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 214,
+ "content": "<SENTINEL_ID_157>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 215,
+ "content": "<SENTINEL_ID_158>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 216,
+ "content": "<SENTINEL_ID_159>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 217,
+ "content": "<SENTINEL_ID_160>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 218,
+ "content": "<SENTINEL_ID_161>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 219,
+ "content": "<SENTINEL_ID_162>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 220,
+ "content": "<SENTINEL_ID_163>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 221,
+ "content": "<SENTINEL_ID_164>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 222,
+ "content": "<SENTINEL_ID_165>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 223,
+ "content": "<SENTINEL_ID_166>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 224,
+ "content": "<SENTINEL_ID_167>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 225,
+ "content": "<SENTINEL_ID_168>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 226,
+ "content": "<SENTINEL_ID_169>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 227,
+ "content": "<SENTINEL_ID_170>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 228,
+ "content": "<SENTINEL_ID_171>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 229,
+ "content": "<SENTINEL_ID_172>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 230,
+ "content": "<SENTINEL_ID_173>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 231,
+ "content": "<SENTINEL_ID_174>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 232,
+ "content": "<SENTINEL_ID_175>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 233,
+ "content": "<SENTINEL_ID_176>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 234,
+ "content": "<SENTINEL_ID_177>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 235,
+ "content": "<SENTINEL_ID_178>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 236,
+ "content": "<SENTINEL_ID_179>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 237,
+ "content": "<SENTINEL_ID_180>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 238,
+ "content": "<SENTINEL_ID_181>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 239,
+ "content": "<SENTINEL_ID_182>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 240,
+ "content": "<SENTINEL_ID_183>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 241,
+ "content": "<SENTINEL_ID_184>",
2178
+ "single_word": false,
2179
+ "lstrip": false,
2180
+ "rstrip": false,
2181
+ "normalized": false,
2182
+ "special": true
2183
+ },
2184
+ {
2185
+ "id": 242,
2186
+ "content": "<SENTINEL_ID_185>",
2187
+ "single_word": false,
2188
+ "lstrip": false,
2189
+ "rstrip": false,
2190
+ "normalized": false,
2191
+ "special": true
2192
+ },
2193
+ {
2194
+ "id": 243,
2195
+ "content": "<SENTINEL_ID_186>",
2196
+ "single_word": false,
2197
+ "lstrip": false,
2198
+ "rstrip": false,
2199
+ "normalized": false,
2200
+ "special": true
2201
+ },
2202
+ {
2203
+ "id": 244,
2204
+ "content": "<SENTINEL_ID_187>",
2205
+ "single_word": false,
2206
+ "lstrip": false,
2207
+ "rstrip": false,
2208
+ "normalized": false,
2209
+ "special": true
2210
+ },
2211
+ {
2212
+ "id": 245,
2213
+ "content": "<SENTINEL_ID_188>",
2214
+ "single_word": false,
2215
+ "lstrip": false,
2216
+ "rstrip": false,
2217
+ "normalized": false,
2218
+ "special": true
2219
+ },
2220
+ {
2221
+ "id": 246,
2222
+ "content": "<SENTINEL_ID_189>",
2223
+ "single_word": false,
2224
+ "lstrip": false,
2225
+ "rstrip": false,
2226
+ "normalized": false,
2227
+ "special": true
2228
+ },
2229
+ {
2230
+ "id": 247,
2231
+ "content": "<SENTINEL_ID_190>",
2232
+ "single_word": false,
2233
+ "lstrip": false,
2234
+ "rstrip": false,
2235
+ "normalized": false,
2236
+ "special": true
2237
+ },
2238
+ {
2239
+ "id": 248,
2240
+ "content": "<SENTINEL_ID_191>",
2241
+ "single_word": false,
2242
+ "lstrip": false,
2243
+ "rstrip": false,
2244
+ "normalized": false,
2245
+ "special": true
2246
+ },
2247
+ {
2248
+ "id": 249,
2249
+ "content": "<SENTINEL_ID_192>",
2250
+ "single_word": false,
2251
+ "lstrip": false,
2252
+ "rstrip": false,
2253
+ "normalized": false,
2254
+ "special": true
2255
+ },
2256
+ {
2257
+ "id": 250,
2258
+ "content": "<SENTINEL_ID_193>",
2259
+ "single_word": false,
2260
+ "lstrip": false,
2261
+ "rstrip": false,
2262
+ "normalized": false,
2263
+ "special": true
2264
+ },
2265
+ {
2266
+ "id": 251,
2267
+ "content": "<SENTINEL_ID_194>",
2268
+ "single_word": false,
2269
+ "lstrip": false,
2270
+ "rstrip": false,
2271
+ "normalized": false,
2272
+ "special": true
2273
+ },
2274
+ {
2275
+ "id": 252,
2276
+ "content": "<SENTINEL_ID_195>",
2277
+ "single_word": false,
2278
+ "lstrip": false,
2279
+ "rstrip": false,
2280
+ "normalized": false,
2281
+ "special": true
2282
+ },
2283
+ {
2284
+ "id": 253,
2285
+ "content": "<SENTINEL_ID_196>",
2286
+ "single_word": false,
2287
+ "lstrip": false,
2288
+ "rstrip": false,
2289
+ "normalized": false,
2290
+ "special": true
2291
+ },
2292
+ {
2293
+ "id": 254,
2294
+ "content": "<SENTINEL_ID_197>",
2295
+ "single_word": false,
2296
+ "lstrip": false,
2297
+ "rstrip": false,
2298
+ "normalized": false,
2299
+ "special": true
2300
+ },
2301
+ {
2302
+ "id": 255,
2303
+ "content": "<SENTINEL_ID_198>",
2304
+ "single_word": false,
2305
+ "lstrip": false,
2306
+ "rstrip": false,
2307
+ "normalized": false,
2308
+ "special": true
2309
+ },
2310
+ {
2311
+ "id": 256,
2312
+ "content": "<SENTINEL_ID_199>",
2313
+ "single_word": false,
2314
+ "lstrip": false,
2315
+ "rstrip": false,
2316
+ "normalized": false,
2317
+ "special": true
2318
+ },
2319
+ {
2320
+ "id": 257,
2321
+ "content": "<MOLECULAR_ENTITY_TYPE_ANTIGEN>",
2322
+ "single_word": false,
2323
+ "lstrip": false,
2324
+ "rstrip": false,
2325
+ "normalized": false,
2326
+ "special": true
2327
+ },
2328
+ {
2329
+ "id": 258,
2330
+ "content": "<MOLECULAR_ENTITY_TYPE_ANTIBODY_LIGHT_CHAIN>",
2331
+ "single_word": false,
2332
+ "lstrip": false,
2333
+ "rstrip": false,
2334
+ "normalized": false,
2335
+ "special": true
2336
+ },
2337
+ {
2338
+ "id": 259,
2339
+ "content": "<MOLECULAR_ENTITY_TYPE_ANTIBODY_HEAVY_CHAIN>",
2340
+ "single_word": false,
2341
+ "lstrip": false,
2342
+ "rstrip": false,
2343
+ "normalized": false,
2344
+ "special": true
2345
+ },
2346
+ {
2347
+ "id": 260,
2348
+ "content": "<ATTRIBUTE_ORGANISM>",
2349
+ "single_word": false,
2350
+ "lstrip": false,
2351
+ "rstrip": false,
2352
+ "normalized": false,
2353
+ "special": true
2354
+ },
2355
+ {
2356
+ "id": 261,
2357
+ "content": "<ATTRIBUTE_ORGANISM_HUMAN>",
2358
+ "single_word": false,
2359
+ "lstrip": false,
2360
+ "rstrip": false,
2361
+ "normalized": false,
2362
+ "special": true
2363
+ },
2364
+ {
2365
+ "id": 262,
2366
+ "content": "<ATTRIBUTE_ORGANISM_RABBIT>",
2367
+ "single_word": false,
2368
+ "lstrip": false,
2369
+ "rstrip": false,
2370
+ "normalized": false,
2371
+ "special": true
2372
+ },
2373
+ {
2374
+ "id": 263,
2375
+ "content": "<ATTRIBUTE_ORGANISM_RAT>",
2376
+ "single_word": false,
2377
+ "lstrip": false,
2378
+ "rstrip": false,
2379
+ "normalized": false,
2380
+ "special": true
2381
+ },
2382
+ {
2383
+ "id": 264,
2384
+ "content": "<ATTRIBUTE_ORGANISM_MOUSE>",
2385
+ "single_word": false,
2386
+ "lstrip": false,
2387
+ "rstrip": false,
2388
+ "normalized": false,
2389
+ "special": true
2390
+ },
2391
+ {
2392
+ "id": 265,
2393
+ "content": "<ATTRIBUTE_ORGANISM_MONKEY>",
2394
+ "single_word": false,
2395
+ "lstrip": false,
2396
+ "rstrip": false,
2397
+ "normalized": false,
2398
+ "special": true
2399
+ },
2400
+ {
2401
+ "id": 266,
2402
+ "content": "<ATTRIBUTE_ORGANISM_CAMEL>",
2403
+ "single_word": false,
2404
+ "lstrip": false,
2405
+ "rstrip": false,
2406
+ "normalized": false,
2407
+ "special": true
2408
+ },
2409
+ {
2410
+ "id": 267,
2411
+ "content": "<EPITOPE_PARATOPE_PREDICTION>",
2412
+ "single_word": false,
2413
+ "lstrip": false,
2414
+ "rstrip": false,
2415
+ "normalized": false,
2416
+ "special": true
2417
+ },
2418
+ {
2419
+ "id": 268,
2420
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR1>",
2421
+ "single_word": false,
2422
+ "lstrip": false,
2423
+ "rstrip": false,
2424
+ "normalized": false,
2425
+ "special": true
2426
+ },
2427
+ {
2428
+ "id": 269,
2429
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR3>",
2430
+ "single_word": false,
2431
+ "lstrip": false,
2432
+ "rstrip": false,
2433
+ "normalized": false,
2434
+ "special": true
2435
+ },
2436
+ {
2437
+ "id": 270,
2438
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR3>",
2439
+ "single_word": false,
2440
+ "lstrip": false,
2441
+ "rstrip": false,
2442
+ "normalized": false,
2443
+ "special": true
2444
+ },
2445
+ {
2446
+ "id": 271,
2447
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR2>",
2448
+ "single_word": false,
2449
+ "lstrip": false,
2450
+ "rstrip": false,
2451
+ "normalized": false,
2452
+ "special": true
2453
+ },
2454
+ {
2455
+ "id": 272,
2456
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR2>",
2457
+ "single_word": false,
2458
+ "lstrip": false,
2459
+ "rstrip": false,
2460
+ "normalized": false,
2461
+ "special": true
2462
+ },
2463
+ {
2464
+ "id": 273,
2465
+ "content": "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR1>",
2466
+ "single_word": false,
2467
+ "lstrip": false,
2468
+ "rstrip": false,
2469
+ "normalized": false,
2470
+ "special": true
2471
+ },
2472
+ {
2473
+ "id": 274,
2474
+ "content": "<MOLECULAR_ENTITY_GENERAL_PROTEIN>",
2475
+ "single_word": false,
2476
+ "lstrip": false,
2477
+ "rstrip": false,
2478
+ "normalized": false,
2479
+ "special": true
2480
+ },
2481
+ {
2482
+ "id": 275,
2483
+ "content": "<TIMESTEP>",
2484
+ "single_word": false,
2485
+ "lstrip": false,
2486
+ "rstrip": false,
2487
+ "normalized": false,
2488
+ "special": true
2489
+ },
2490
+ {
2491
+ "id": 276,
2492
+ "content": "<DIFFUSION>",
2493
+ "single_word": false,
2494
+ "lstrip": false,
2495
+ "rstrip": false,
2496
+ "normalized": false,
2497
+ "special": true
2498
+ },
2499
+ {
2500
+ "id": 277,
2501
+ "content": "<SEQUENCE_NATURAL_END>",
2502
+ "single_word": false,
2503
+ "lstrip": false,
2504
+ "rstrip": false,
2505
+ "normalized": false,
2506
+ "special": true
2507
+ },
2508
+ {
2509
+ "id": 278,
2510
+ "content": "<SMILES_SEQUENCE>",
2511
+ "single_word": false,
2512
+ "lstrip": false,
2513
+ "rstrip": false,
2514
+ "normalized": false,
2515
+ "special": true
2516
+ },
2517
+ {
2518
+ "id": 279,
2519
+ "content": "<SELFIES_SEQUENCE>",
2520
+ "single_word": false,
2521
+ "lstrip": false,
2522
+ "rstrip": false,
2523
+ "normalized": false,
2524
+ "special": true
2525
+ },
2526
+ {
2527
+ "id": 280,
2528
+ "content": "<AMINO_ACID_SEQUENCE>",
2529
+ "single_word": false,
2530
+ "lstrip": false,
2531
+ "rstrip": false,
2532
+ "normalized": false,
2533
+ "special": true
2534
+ },
2535
+ {
2536
+ "id": 281,
2537
+ "content": "<GENERAL_AFFINITY_CLASS>",
2538
+ "single_word": false,
2539
+ "lstrip": false,
2540
+ "rstrip": false,
2541
+ "normalized": false,
2542
+ "special": true
2543
+ },
2544
+ {
2545
+ "id": 282,
2546
+ "content": "<BACKSPACE>",
2547
+ "single_word": false,
2548
+ "lstrip": false,
2549
+ "rstrip": false,
2550
+ "normalized": false,
2551
+ "special": true
2552
+ },
2553
+ {
2554
+ "id": 283,
2555
+ "content": "<SEQUENCE_NATURAL_START>",
2556
+ "single_word": false,
2557
+ "lstrip": false,
2558
+ "rstrip": false,
2559
+ "normalized": false,
2560
+ "special": true
2561
+ },
2562
+ {
2563
+ "id": 284,
2564
+ "content": "<NOOP>",
2565
+ "single_word": false,
2566
+ "lstrip": false,
2567
+ "rstrip": false,
2568
+ "normalized": false,
2569
+ "special": true
2570
+ },
2571
+ {
2572
+ "id": 285,
2573
+ "content": "<TARGETED_ANTIBODY_DESIGN_ENCODER_ONLY_MODE>",
2574
+ "single_word": false,
2575
+ "lstrip": false,
2576
+ "rstrip": false,
2577
+ "normalized": false,
2578
+ "special": true
2579
+ },
2580
+ {
2581
+ "id": 286,
2582
+ "content": "<MOLECULAR_ENTITY_SMALL_MOLECULE>",
2583
+ "single_word": false,
2584
+ "lstrip": false,
2585
+ "rstrip": false,
2586
+ "normalized": false,
2587
+ "special": true
2588
+ },
2589
+ {
2590
+ "id": 287,
2591
+ "content": "<MOLECULAR_ENTITY_CELL_GENE_EXPRESSION_RANKED>",
2592
+ "single_word": false,
2593
+ "lstrip": false,
2594
+ "rstrip": false,
2595
+ "normalized": false,
2596
+ "special": true
2597
+ },
2598
+ {
2599
+ "id": 288,
2600
+ "content": "<CELL_TYPE_CLASS>",
2601
+ "single_word": false,
2602
+ "lstrip": false,
2603
+ "rstrip": false,
2604
+ "normalized": false,
2605
+ "special": true
2606
+ },
2607
+ {
2608
+ "id": 289,
2609
+ "content": "<TISSUE_TYPE_CLASS>",
2610
+ "single_word": false,
2611
+ "lstrip": false,
2612
+ "rstrip": false,
2613
+ "normalized": false,
2614
+ "special": true
2615
+ },
2616
+ {
2617
+ "id": 290,
2618
+ "content": "<CORRUPTED_AREA_START>",
2619
+ "single_word": false,
2620
+ "lstrip": false,
2621
+ "rstrip": false,
2622
+ "normalized": false,
2623
+ "special": true
2624
+ },
2625
+ {
2626
+ "id": 291,
2627
+ "content": "<CORRUPTED_AREA_END>",
2628
+ "single_word": false,
2629
+ "lstrip": false,
2630
+ "rstrip": false,
2631
+ "normalized": false,
2632
+ "special": true
2633
+ },
2634
+ {
2635
+ "id": 292,
2636
+ "content": "<MOLECULAR_ENTITY_MUTATED_PROTEIN_CHAIN>",
2637
+ "single_word": false,
2638
+ "lstrip": false,
2639
+ "rstrip": false,
2640
+ "normalized": false,
2641
+ "special": true
2642
+ },
2643
+ {
2644
+ "id": 293,
2645
+ "content": "<MOLECULAR_ENTITY_PROTEIN_CHAIN>",
2646
+ "single_word": false,
2647
+ "lstrip": false,
2648
+ "rstrip": false,
2649
+ "normalized": false,
2650
+ "special": true
2651
+ },
2652
+ {
2653
+ "id": 294,
2654
+ "content": "<COMPLEX_ENTITY>",
2655
+ "single_word": false,
2656
+ "lstrip": false,
2657
+ "rstrip": false,
2658
+ "normalized": false,
2659
+ "special": true
2660
+ },
2661
+ {
2662
+ "id": 295,
2663
+ "content": "<ALTERNATIVE>",
2664
+ "single_word": false,
2665
+ "lstrip": false,
2666
+ "rstrip": false,
2667
+ "normalized": false,
2668
+ "special": true
2669
+ },
2670
+ {
2671
+ "id": 296,
2672
+ "content": "<CDR3_REGION>",
2673
+ "single_word": false,
2674
+ "lstrip": false,
2675
+ "rstrip": false,
2676
+ "normalized": false,
2677
+ "special": true
2678
+ },
2679
+ {
2680
+ "id": 297,
2681
+ "content": "<GENERAL_CHAIN>",
2682
+ "single_word": false,
2683
+ "lstrip": false,
2684
+ "rstrip": false,
2685
+ "normalized": false,
2686
+ "special": true
2687
+ },
2688
+ {
2689
+ "id": 298,
2690
+ "content": "<SUBMOLECULAR_ENTITY>",
2691
+ "single_word": false,
2692
+ "lstrip": false,
2693
+ "rstrip": false,
2694
+ "normalized": false,
2695
+ "special": true
2696
+ },
2697
+ {
2698
+ "id": 299,
2699
+ "content": "<MUTATED>",
2700
+ "single_word": false,
2701
+ "lstrip": false,
2702
+ "rstrip": false,
2703
+ "normalized": false,
2704
+ "special": true
2705
+ },
2706
+ {
2707
+ "id": 300,
2708
+ "content": "<MOLECULAR_ENTITY_TCR_ALPHA_CDR3>",
2709
+ "single_word": false,
2710
+ "lstrip": false,
2711
+ "rstrip": false,
2712
+ "normalized": false,
2713
+ "special": true
2714
+ },
2715
+ {
2716
+ "id": 301,
2717
+ "content": "<MOLECULAR_ENTITY_TCR_DELTA_CDR3>",
2718
+ "single_word": false,
2719
+ "lstrip": false,
2720
+ "rstrip": false,
2721
+ "normalized": false,
2722
+ "special": true
2723
+ },
2724
+ {
2725
+ "id": 302,
2726
+ "content": "<MOLECULAR_ENTITY_TCR_DELTA_VAR>",
2727
+ "single_word": false,
2728
+ "lstrip": false,
2729
+ "rstrip": false,
2730
+ "normalized": false,
2731
+ "special": true
2732
+ },
2733
+ {
2734
+ "id": 303,
2735
+ "content": "<MOLECULAR_ENTITY_TCR_GAMMA_CDR3>",
2736
+ "single_word": false,
2737
+ "lstrip": false,
2738
+ "rstrip": false,
2739
+ "normalized": false,
2740
+ "special": true
2741
+ },
2742
+ {
2743
+ "id": 304,
2744
+ "content": "<MOLECULAR_ENTITY_TCR_GAMMA_VAR>",
2745
+ "single_word": false,
2746
+ "lstrip": false,
2747
+ "rstrip": false,
2748
+ "normalized": false,
2749
+ "special": true
2750
+ },
2751
+ {
2752
+ "id": 305,
2753
+ "content": "<SCALAR>",
2754
+ "single_word": false,
2755
+ "lstrip": false,
2756
+ "rstrip": false,
2757
+ "normalized": false,
2758
+ "special": true
2759
+ },
2760
+ {
2761
+ "id": 306,
2762
+ "content": "<VECTOR>",
2763
+ "single_word": false,
2764
+ "lstrip": false,
2765
+ "rstrip": false,
2766
+ "normalized": false,
2767
+ "special": true
2768
+ },
2769
+ {
2770
+ "id": 307,
2771
+ "content": "<MASKED_SCALAR>",
2772
+ "single_word": false,
2773
+ "lstrip": false,
2774
+ "rstrip": false,
2775
+ "normalized": false,
2776
+ "special": true
2777
+ },
2778
+ {
2779
+ "id": 308,
2780
+ "content": "<MASKED_VECTOR>",
2781
+ "single_word": false,
2782
+ "lstrip": false,
2783
+ "rstrip": false,
2784
+ "normalized": false,
2785
+ "special": true
2786
+ },
2787
+ {
2788
+ "id": 309,
2789
+ "content": "<AUTOENCODER_LATENT_LOG_VARIANCE>",
2790
+ "single_word": false,
2791
+ "lstrip": false,
2792
+ "rstrip": false,
2793
+ "normalized": false,
2794
+ "special": true
2795
+ },
2796
+ {
2797
+ "id": 310,
2798
+ "content": "<AUTOENCODER_LATENT_MEAN>",
2799
+ "single_word": false,
2800
+ "lstrip": false,
2801
+ "rstrip": false,
2802
+ "normalized": false,
2803
+ "special": true
2804
+ },
2805
+ {
2806
+ "id": 311,
2807
+ "content": "<AUTOENCODER_LATENT_SAMPLED_Z>",
2808
+ "single_word": false,
2809
+ "lstrip": false,
2810
+ "rstrip": false,
2811
+ "normalized": false,
2812
+ "special": true
2813
+ },
2814
+ {
2815
+ "id": 312,
2816
+ "content": "<AUTOENCODER_TASK>",
2817
+ "single_word": false,
2818
+ "lstrip": false,
2819
+ "rstrip": false,
2820
+ "normalized": false,
2821
+ "special": true
2822
+ },
2823
+ {
2824
+ "id": 313,
2825
+ "content": "<DECODED_FROM_LATENT>",
2826
+ "single_word": false,
2827
+ "lstrip": false,
2828
+ "rstrip": false,
2829
+ "normalized": false,
2830
+ "special": true
2831
+ },
2832
+ {
2833
+ "id": 314,
2834
+ "content": "<BBBP>",
2835
+ "single_word": false,
2836
+ "lstrip": false,
2837
+ "rstrip": false,
2838
+ "normalized": false,
2839
+ "special": true
2840
+ },
2841
+ {
2842
+ "id": 315,
2843
+ "content": "<FDA_APPR>",
2844
+ "single_word": false,
2845
+ "lstrip": false,
2846
+ "rstrip": false,
2847
+ "normalized": false,
2848
+ "special": true
2849
+ },
2850
+ {
2851
+ "id": 316,
2852
+ "content": "<HIV_ACTIVITY>",
2853
+ "single_word": false,
2854
+ "lstrip": false,
2855
+ "rstrip": false,
2856
+ "normalized": false,
2857
+ "special": true
2858
+ }
2859
+ ],
2860
+ "normalizer": null,
2861
+ "pre_tokenizer": {
2862
+ "type": "Sequence",
2863
+ "pretokenizers": [
2864
+ {
2865
+ "type": "Split",
2866
+ "pattern": {
2867
+ "Regex": "<.*?>|\\S"
2868
+ },
2869
+ "behavior": "Removed",
2870
+ "invert": true
2871
+ }
2872
+ ]
2873
+ },
2874
+ "post_processor": null,
2875
+ "decoder": null,
2876
+ "model": {
2877
+ "type": "WordLevel",
2878
+ "vocab": {
2879
+ "<UNK>": 0,
2880
+ "<PAD>": 1,
2881
+ "<CLS>": 2,
2882
+ "<SEP>": 3,
2883
+ "<MASK>": 4,
2884
+ "<EOS>": 5,
2885
+ "<MOLECULAR_ENTITY>": 6,
2886
+ "<GLOBAL_INTERACTION_ATTRIBUTES>": 7,
2887
+ "<MOLECULAR_ENTITY_ANTIGEN>": 8,
2888
+ "<MOLECULAR_ENTITY_EPITOPE>": 9,
2889
+ "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN>": 10,
2890
+ "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN>": 11,
2891
+ "<MOLECULAR_ENTITY_TCR_ALPHA_CHAIN>": 12,
2892
+ "<MOLECULAR_ENTITY_TCR_BETA_VDJ>": 13,
2893
+ "<MOLECULAR_ENTITY_TCR_BETA_CDR3>": 14,
2894
+ "<BINDING_AFFINITY_CLASS>": 15,
2895
+ "<DECODER_START>": 16,
2896
+ "<BINDING>": 17,
2897
+ "<FILLIN>": 18,
2898
+ "<REORDER>": 19,
2899
+ "<TOAA>": 20,
2900
+ "<ACTIVE>": 21,
2901
+ "<GENESEQ>": 22,
2902
+ "<INCREASE>": 23,
2903
+ "<DECREASE>": 24,
2904
+ "<STRUCTURE>": 25,
2905
+ "<DISTANCE>": 26,
2906
+ "<SOLUBILITY>": 27,
2907
+ "<TOXICITY>": 28,
2908
+ "<AB>": 29,
2909
+ "<ISACTIVE>": 30,
2910
+ "<ISSYNTHETIC>": 31,
2911
+ "<PENETR>": 32,
2912
+ "<ABSORPTION>": 33,
2913
+ "<DISTRIBUTION>": 34,
2914
+ "<METABOLISM>": 35,
2915
+ "<EXCRETION>": 36,
2916
+ "<FLUORESCENCE>": 37,
2917
+ "<STABILITY>": 38,
2918
+ "<DISORDER>": 39,
2919
+ "<DISEASE>": 40,
2920
+ "<BINARY>": 41,
2921
+ "<REGRESSION>": 42,
2922
+ "<ORGANISM>": 43,
2923
+ "<0>": 44,
2924
+ "<1>": 45,
2925
+ "<2>": 46,
2926
+ "<3>": 47,
2927
+ "<4>": 48,
2928
+ "<5>": 49,
2929
+ "<6>": 50,
2930
+ "<7>": 51,
2931
+ "<8>": 52,
2932
+ "<9>": 53,
2933
+ "<.>": 54,
2934
+ "<YES>": 55,
2935
+ "<NO>": 56,
2936
+ "<SENTINEL_ID_0>": 57,
2937
+ "<SENTINEL_ID_1>": 58,
2938
+ "<SENTINEL_ID_2>": 59,
2939
+ "<SENTINEL_ID_3>": 60,
2940
+ "<SENTINEL_ID_4>": 61,
2941
+ "<SENTINEL_ID_5>": 62,
2942
+ "<SENTINEL_ID_6>": 63,
2943
+ "<SENTINEL_ID_7>": 64,
2944
+ "<SENTINEL_ID_8>": 65,
2945
+ "<SENTINEL_ID_9>": 66,
2946
+ "<SENTINEL_ID_10>": 67,
2947
+ "<SENTINEL_ID_11>": 68,
2948
+ "<SENTINEL_ID_12>": 69,
2949
+ "<SENTINEL_ID_13>": 70,
2950
+ "<SENTINEL_ID_14>": 71,
2951
+ "<SENTINEL_ID_15>": 72,
2952
+ "<SENTINEL_ID_16>": 73,
2953
+ "<SENTINEL_ID_17>": 74,
2954
+ "<SENTINEL_ID_18>": 75,
2955
+ "<SENTINEL_ID_19>": 76,
2956
+ "<SENTINEL_ID_20>": 77,
2957
+ "<SENTINEL_ID_21>": 78,
2958
+ "<SENTINEL_ID_22>": 79,
2959
+ "<SENTINEL_ID_23>": 80,
2960
+ "<SENTINEL_ID_24>": 81,
2961
+ "<SENTINEL_ID_25>": 82,
2962
+ "<SENTINEL_ID_26>": 83,
2963
+ "<SENTINEL_ID_27>": 84,
2964
+ "<SENTINEL_ID_28>": 85,
2965
+ "<SENTINEL_ID_29>": 86,
2966
+ "<SENTINEL_ID_30>": 87,
2967
+ "<SENTINEL_ID_31>": 88,
2968
+ "<SENTINEL_ID_32>": 89,
2969
+ "<SENTINEL_ID_33>": 90,
2970
+ "<SENTINEL_ID_34>": 91,
2971
+ "<SENTINEL_ID_35>": 92,
2972
+ "<SENTINEL_ID_36>": 93,
2973
+ "<SENTINEL_ID_37>": 94,
2974
+ "<SENTINEL_ID_38>": 95,
2975
+ "<SENTINEL_ID_39>": 96,
2976
+ "<SENTINEL_ID_40>": 97,
2977
+ "<SENTINEL_ID_41>": 98,
2978
+ "<SENTINEL_ID_42>": 99,
2979
+ "<SENTINEL_ID_43>": 100,
2980
+ "<SENTINEL_ID_44>": 101,
2981
+ "<SENTINEL_ID_45>": 102,
2982
+ "<SENTINEL_ID_46>": 103,
2983
+ "<SENTINEL_ID_47>": 104,
2984
+ "<SENTINEL_ID_48>": 105,
2985
+ "<SENTINEL_ID_49>": 106,
2986
+ "<SENTINEL_ID_50>": 107,
2987
+ "<SENTINEL_ID_51>": 108,
2988
+ "<SENTINEL_ID_52>": 109,
2989
+ "<SENTINEL_ID_53>": 110,
2990
+ "<SENTINEL_ID_54>": 111,
2991
+ "<SENTINEL_ID_55>": 112,
2992
+ "<SENTINEL_ID_56>": 113,
2993
+ "<SENTINEL_ID_57>": 114,
2994
+ "<SENTINEL_ID_58>": 115,
2995
+ "<SENTINEL_ID_59>": 116,
2996
+ "<SENTINEL_ID_60>": 117,
2997
+ "<SENTINEL_ID_61>": 118,
2998
+ "<SENTINEL_ID_62>": 119,
2999
+ "<SENTINEL_ID_63>": 120,
3000
+ "<SENTINEL_ID_64>": 121,
3001
+ "<SENTINEL_ID_65>": 122,
3002
+ "<SENTINEL_ID_66>": 123,
3003
+ "<SENTINEL_ID_67>": 124,
3004
+ "<SENTINEL_ID_68>": 125,
3005
+ "<SENTINEL_ID_69>": 126,
3006
+ "<SENTINEL_ID_70>": 127,
3007
+ "<SENTINEL_ID_71>": 128,
3008
+ "<SENTINEL_ID_72>": 129,
3009
+ "<SENTINEL_ID_73>": 130,
3010
+ "<SENTINEL_ID_74>": 131,
3011
+ "<SENTINEL_ID_75>": 132,
3012
+ "<SENTINEL_ID_76>": 133,
3013
+ "<SENTINEL_ID_77>": 134,
3014
+ "<SENTINEL_ID_78>": 135,
3015
+ "<SENTINEL_ID_79>": 136,
3016
+ "<SENTINEL_ID_80>": 137,
3017
+ "<SENTINEL_ID_81>": 138,
3018
+ "<SENTINEL_ID_82>": 139,
3019
+ "<SENTINEL_ID_83>": 140,
3020
+ "<SENTINEL_ID_84>": 141,
3021
+ "<SENTINEL_ID_85>": 142,
3022
+ "<SENTINEL_ID_86>": 143,
3023
+ "<SENTINEL_ID_87>": 144,
3024
+ "<SENTINEL_ID_88>": 145,
3025
+ "<SENTINEL_ID_89>": 146,
3026
+ "<SENTINEL_ID_90>": 147,
3027
+ "<SENTINEL_ID_91>": 148,
3028
+ "<SENTINEL_ID_92>": 149,
3029
+ "<SENTINEL_ID_93>": 150,
3030
+ "<SENTINEL_ID_94>": 151,
3031
+ "<SENTINEL_ID_95>": 152,
3032
+ "<SENTINEL_ID_96>": 153,
3033
+ "<SENTINEL_ID_97>": 154,
3034
+ "<SENTINEL_ID_98>": 155,
3035
+ "<SENTINEL_ID_99>": 156,
3036
+ "<SENTINEL_ID_100>": 157,
3037
+ "<SENTINEL_ID_101>": 158,
3038
+ "<SENTINEL_ID_102>": 159,
3039
+ "<SENTINEL_ID_103>": 160,
3040
+ "<SENTINEL_ID_104>": 161,
3041
+ "<SENTINEL_ID_105>": 162,
3042
+ "<SENTINEL_ID_106>": 163,
3043
+ "<SENTINEL_ID_107>": 164,
3044
+ "<SENTINEL_ID_108>": 165,
3045
+ "<SENTINEL_ID_109>": 166,
3046
+ "<SENTINEL_ID_110>": 167,
3047
+ "<SENTINEL_ID_111>": 168,
3048
+ "<SENTINEL_ID_112>": 169,
3049
+ "<SENTINEL_ID_113>": 170,
3050
+ "<SENTINEL_ID_114>": 171,
3051
+ "<SENTINEL_ID_115>": 172,
3052
+ "<SENTINEL_ID_116>": 173,
3053
+ "<SENTINEL_ID_117>": 174,
3054
+ "<SENTINEL_ID_118>": 175,
3055
+ "<SENTINEL_ID_119>": 176,
3056
+ "<SENTINEL_ID_120>": 177,
3057
+ "<SENTINEL_ID_121>": 178,
3058
+ "<SENTINEL_ID_122>": 179,
3059
+ "<SENTINEL_ID_123>": 180,
3060
+ "<SENTINEL_ID_124>": 181,
3061
+ "<SENTINEL_ID_125>": 182,
3062
+ "<SENTINEL_ID_126>": 183,
3063
+ "<SENTINEL_ID_127>": 184,
3064
+ "<SENTINEL_ID_128>": 185,
3065
+ "<SENTINEL_ID_129>": 186,
3066
+ "<SENTINEL_ID_130>": 187,
3067
+ "<SENTINEL_ID_131>": 188,
3068
+ "<SENTINEL_ID_132>": 189,
3069
+ "<SENTINEL_ID_133>": 190,
3070
+ "<SENTINEL_ID_134>": 191,
3071
+ "<SENTINEL_ID_135>": 192,
3072
+ "<SENTINEL_ID_136>": 193,
3073
+ "<SENTINEL_ID_137>": 194,
3074
+ "<SENTINEL_ID_138>": 195,
3075
+ "<SENTINEL_ID_139>": 196,
3076
+ "<SENTINEL_ID_140>": 197,
3077
+ "<SENTINEL_ID_141>": 198,
3078
+ "<SENTINEL_ID_142>": 199,
3079
+ "<SENTINEL_ID_143>": 200,
3080
+ "<SENTINEL_ID_144>": 201,
3081
+ "<SENTINEL_ID_145>": 202,
3082
+ "<SENTINEL_ID_146>": 203,
3083
+ "<SENTINEL_ID_147>": 204,
3084
+ "<SENTINEL_ID_148>": 205,
3085
+ "<SENTINEL_ID_149>": 206,
3086
+ "<SENTINEL_ID_150>": 207,
3087
+ "<SENTINEL_ID_151>": 208,
3088
+ "<SENTINEL_ID_152>": 209,
3089
+ "<SENTINEL_ID_153>": 210,
3090
+ "<SENTINEL_ID_154>": 211,
3091
+ "<SENTINEL_ID_155>": 212,
3092
+ "<SENTINEL_ID_156>": 213,
3093
+ "<SENTINEL_ID_157>": 214,
3094
+ "<SENTINEL_ID_158>": 215,
3095
+ "<SENTINEL_ID_159>": 216,
3096
+ "<SENTINEL_ID_160>": 217,
3097
+ "<SENTINEL_ID_161>": 218,
3098
+ "<SENTINEL_ID_162>": 219,
3099
+ "<SENTINEL_ID_163>": 220,
3100
+ "<SENTINEL_ID_164>": 221,
3101
+ "<SENTINEL_ID_165>": 222,
3102
+ "<SENTINEL_ID_166>": 223,
3103
+ "<SENTINEL_ID_167>": 224,
3104
+ "<SENTINEL_ID_168>": 225,
3105
+ "<SENTINEL_ID_169>": 226,
3106
+ "<SENTINEL_ID_170>": 227,
3107
+ "<SENTINEL_ID_171>": 228,
3108
+ "<SENTINEL_ID_172>": 229,
3109
+ "<SENTINEL_ID_173>": 230,
3110
+ "<SENTINEL_ID_174>": 231,
3111
+ "<SENTINEL_ID_175>": 232,
3112
+ "<SENTINEL_ID_176>": 233,
3113
+ "<SENTINEL_ID_177>": 234,
3114
+ "<SENTINEL_ID_178>": 235,
3115
+ "<SENTINEL_ID_179>": 236,
3116
+ "<SENTINEL_ID_180>": 237,
3117
+ "<SENTINEL_ID_181>": 238,
3118
+ "<SENTINEL_ID_182>": 239,
3119
+ "<SENTINEL_ID_183>": 240,
3120
+ "<SENTINEL_ID_184>": 241,
3121
+ "<SENTINEL_ID_185>": 242,
3122
+ "<SENTINEL_ID_186>": 243,
3123
+ "<SENTINEL_ID_187>": 244,
3124
+ "<SENTINEL_ID_188>": 245,
3125
+ "<SENTINEL_ID_189>": 246,
3126
+ "<SENTINEL_ID_190>": 247,
3127
+ "<SENTINEL_ID_191>": 248,
3128
+ "<SENTINEL_ID_192>": 249,
3129
+ "<SENTINEL_ID_193>": 250,
3130
+ "<SENTINEL_ID_194>": 251,
3131
+ "<SENTINEL_ID_195>": 252,
3132
+ "<SENTINEL_ID_196>": 253,
3133
+ "<SENTINEL_ID_197>": 254,
3134
+ "<SENTINEL_ID_198>": 255,
3135
+ "<SENTINEL_ID_199>": 256,
3136
+ "<MOLECULAR_ENTITY_TYPE_ANTIGEN>": 257,
3137
+ "<MOLECULAR_ENTITY_TYPE_ANTIBODY_LIGHT_CHAIN>": 258,
3138
+ "<MOLECULAR_ENTITY_TYPE_ANTIBODY_HEAVY_CHAIN>": 259,
3139
+ "<ATTRIBUTE_ORGANISM>": 260,
3140
+ "<ATTRIBUTE_ORGANISM_HUMAN>": 261,
3141
+ "<ATTRIBUTE_ORGANISM_RABBIT>": 262,
3142
+ "<ATTRIBUTE_ORGANISM_RAT>": 263,
3143
+ "<ATTRIBUTE_ORGANISM_MOUSE>": 264,
3144
+ "<ATTRIBUTE_ORGANISM_MONKEY>": 265,
3145
+ "<ATTRIBUTE_ORGANISM_CAMEL>": 266,
3146
+ "<EPITOPE_PARATOPE_PREDICTION>": 267,
3147
+ "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR1>": 268,
3148
+ "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR3>": 269,
3149
+ "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR3>": 270,
3150
+ "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR2>": 271,
3151
+ "<MOLECULAR_ENTITY_ANTIBODY_HEAVY_CHAIN_CDR2>": 272,
3152
+ "<MOLECULAR_ENTITY_ANTIBODY_LIGHT_CHAIN_CDR1>": 273,
3153
+ "<MOLECULAR_ENTITY_GENERAL_PROTEIN>": 274,
3154
+ "<TIMESTEP>": 275,
3155
+ "<DIFFUSION>": 276,
3156
+ "<SEQUENCE_NATURAL_END>": 277,
3157
+ "<SMILES_SEQUENCE>": 278,
3158
+ "<SELFIES_SEQUENCE>": 279,
3159
+ "<AMINO_ACID_SEQUENCE>": 280,
3160
+ "<GENERAL_AFFINITY_CLASS>": 281,
3161
+ "<BACKSPACE>": 282,
3162
+ "<SEQUENCE_NATURAL_START>": 283,
3163
+ "<NOOP>": 284,
3164
+ "<TARGETED_ANTIBODY_DESIGN_ENCODER_ONLY_MODE>": 285,
3165
+ "<MOLECULAR_ENTITY_SMALL_MOLECULE>": 286,
3166
+ "<MOLECULAR_ENTITY_CELL_GENE_EXPRESSION_RANKED>": 287,
3167
+ "<CELL_TYPE_CLASS>": 288,
3168
+ "<TISSUE_TYPE_CLASS>": 289,
3169
+ "<CORRUPTED_AREA_START>": 290,
3170
+ "<CORRUPTED_AREA_END>": 291,
3171
+ "<MOLECULAR_ENTITY_MUTATED_PROTEIN_CHAIN>": 292,
3172
+ "<MOLECULAR_ENTITY_PROTEIN_CHAIN>": 293,
3173
+ "<COMPLEX_ENTITY>": 294,
3174
+ "<ALTERNATIVE>": 295,
3175
+ "<CDR3_REGION>": 296,
3176
+ "<GENERAL_CHAIN>": 297,
3177
+ "<SUBMOLECULAR_ENTITY>": 298,
3178
+ "<MUTATED>": 299,
3179
+ "<MOLECULAR_ENTITY_TCR_ALPHA_CDR3>": 300,
3180
+ "<MOLECULAR_ENTITY_TCR_DELTA_CDR3>": 301,
3181
+ "<MOLECULAR_ENTITY_TCR_DELTA_VAR>": 302,
3182
+ "<MOLECULAR_ENTITY_TCR_GAMMA_CDR3>": 303,
3183
+ "<MOLECULAR_ENTITY_TCR_GAMMA_VAR>": 304,
3184
+ "<SCALAR>": 305,
3185
+ "<VECTOR>": 306,
3186
+ "<MASKED_SCALAR>": 307,
3187
+ "<MASKED_VECTOR>": 308,
3188
+ "<AUTOENCODER_LATENT_LOG_VARIANCE>": 309,
3189
+ "<AUTOENCODER_LATENT_MEAN>": 310,
3190
+ "<AUTOENCODER_LATENT_SAMPLED_Z>": 311,
3191
+ "<AUTOENCODER_TASK>": 312,
3192
+ "<DECODED_FROM_LATENT>": 313,
3193
+ "<BBBP>": 314,
3194
+ "<FDA_APPR>": 315,
3195
+ "<HIV_ACTIVITY>": 316,
3196
+ "A": 501,
3197
+ "B": 502,
3198
+ "C": 503,
3199
+ "D": 504,
3200
+ "E": 505,
3201
+ "F": 506,
3202
+ "G": 507,
3203
+ "H": 508,
3204
+ "I": 509,
3205
+ "K": 510,
3206
+ "L": 511,
3207
+ "M": 512,
3208
+ "N": 513,
3209
+ "O": 514,
3210
+ "P": 515,
3211
+ "Q": 516,
3212
+ "R": 517,
3213
+ "S": 518,
3214
+ "T": 519,
3215
+ "U": 520,
3216
+ "V": 521,
3217
+ "W": 522,
3218
+ "X": 523,
3219
+ "Y": 524,
3220
+ "Z": 525,
3221
+ ":": 526
3222
+ },
3223
+ "unk_token": "<UNK>"
3224
+ }
3225
+ }
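The `pre_tokenizer` and `model` sections above can be read together: the `Split` rule with `"invert": true` and `"behavior": "Removed"` appears to keep every match of `<.*?>|\S` as a piece and drop the whitespace in between, and the `WordLevel` model then maps each piece to its id, falling back to `<UNK>`. The following is a minimal sketch of that reading using only Python's `re` module. It is not the `tokenizers` library itself; the helper names `pre_tokenize` and `encode` and the sample string are hypothetical, and the vocabulary is a small excerpt of the table above.

```python
import re

# Same pattern as the "Split" pre-tokenizer in the config above.
# Assumption: with invert=true and behavior="Removed", the matches are
# kept as pieces and the unmatched whitespace is discarded.
TOKEN_PATTERN = re.compile(r"<.*?>|\S")

# Small excerpt of the WordLevel vocab shown above (not the full table).
VOCAB_EXCERPT = {
    "<UNK>": 0,
    "<PAD>": 1,
    "<EOS>": 5,
    "<SENTINEL_ID_0>": 57,
    "<MOLECULAR_ENTITY_GENERAL_PROTEIN>": 274,
    "<SEQUENCE_NATURAL_END>": 277,
    "<SEQUENCE_NATURAL_START>": 283,
    "A": 501,
    "C": 503,
    "G": 507,
    "K": 510,
    ":": 526,
}


def pre_tokenize(text):
    """Split text into <...> tokens and single non-whitespace characters."""
    return TOKEN_PATTERN.findall(text)


def encode(text, vocab=VOCAB_EXCERPT, unk_token="<UNK>"):
    """Map each piece to its id; unknown pieces get the unk_token id."""
    unk_id = vocab[unk_token]
    return [vocab.get(piece, unk_id) for piece in pre_tokenize(text)]


if __name__ == "__main__":
    # Hypothetical input; "J" has no entry in the vocab above.
    sample = (
        "<MOLECULAR_ENTITY_GENERAL_PROTEIN><SEQUENCE_NATURAL_START>"
        "ACGK J<SEQUENCE_NATURAL_END><EOS>"
    )
    print(pre_tokenize(sample))
    print(encode(sample))
```

Under these assumptions, the letter `J` maps to id 0 (`<UNK>`) because the vocab lists `A` to `Z` without `J`, and ids 317 to 500 are unassigned in this file. To reproduce the exact behavior, load the JSON with the `tokenizers` library rather than relying on this sketch.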