Model description

This model is a BERT-based architecture with 8 layers; its configuration is summarized below. The drug-like molecule BERT is inspired by "Self-Attention Based Molecule Representation for Predicting Drug-Target Interaction", with several modifications to the training procedure.

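from transformers import BertConfig

# vocab_size and max_seq_len are defined elsewhere; this card does not state their values.
config = BertConfig(
    vocab_size=vocab_size,
    hidden_size=128,
    num_hidden_layers=8,
    num_attention_heads=8,  # 128 / 8 = 16 dimensions per head
    intermediate_size=512,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=max_seq_len + 2,
    type_vocab_size=1,
    pad_token_id=0,
    position_embedding_type="absolute",
)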
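To give a sense of the model's size, the sketch below counts the parameters implied by the config using the standard BERT layer layout. The encoder figure follows directly from hidden_size, intermediate_size, and num_hidden_layers (198,272 parameters per layer, 1,586,176 for 8 layers). The embedding figure depends on vocab_size and max_seq_len, which this card does not state, so the values used for them here are hypothetical placeholders. The pooler and the masked-language-model head are not counted.

```python
def bert_encoder_layer_params(hidden_size, intermediate_size):
    """Parameter count of one standard BERT encoder layer (weights and biases)."""
    h, i = hidden_size, intermediate_size
    attention = 4 * (h * h + h)               # query, key, value, output projections
    feed_forward = (h * i + i) + (i * h + h)  # intermediate and output dense layers
    layer_norms = 2 * (2 * h)                 # two LayerNorms, each with scale and shift
    return attention + feed_forward + layer_norms


def bert_embedding_params(vocab_size, max_position_embeddings, hidden_size,
                          type_vocab_size=1):
    """Parameter count of the word, position, and token-type embeddings plus LayerNorm."""
    h = hidden_size
    return (vocab_size + max_position_embeddings + type_vocab_size) * h + 2 * h


# Values taken from the config above.
HIDDEN, INTERMEDIATE, LAYERS, HEADS = 128, 512, 8, 8

per_layer = bert_encoder_layer_params(HIDDEN, INTERMEDIATE)
encoder_total = LAYERS * per_layer
head_dim = HIDDEN // HEADS

# Hypothetical placeholders: the card does not give vocab_size or max_seq_len.
example_vocab_size = 64
example_max_seq_len = 128
embedding_total = bert_embedding_params(
    example_vocab_size, example_max_seq_len + 2, HIDDEN
)

print(f"per-layer parameters: {per_layer:,}")
print(f"encoder parameters:   {encoder_total:,}")
print(f"dimensions per head:  {head_dim}")
print(f"embedding parameters (placeholder sizes): {embedding_total:,}")
```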

Training and evaluation data

The model is trained on drug-like molecules from the PubChem database. Because PubChem contains more than 100 M molecules, we selected drug-like molecules using the quantitative estimate of drug-likeness (QED) score. With the QED threshold set to 0.7, this filtering selected 4.1 M molecules.
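The filtering step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the card does not say which toolkit computed QED, so the use of RDKit here is an assumption, as is treating the threshold as inclusive (score >= 0.7). The helper names rdkit_qed and filter_drug_like are hypothetical.

```python
def rdkit_qed(smiles):
    """Return the QED score of a SMILES string, or None if RDKit cannot parse it.

    RDKit is imported lazily so the filter below can also be used with any
    other scoring function. Using RDKit at all is an assumption of this sketch.
    """
    from rdkit import Chem
    from rdkit.Chem import QED

    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else QED.qed(mol)


def filter_drug_like(smiles_list, score_fn=rdkit_qed, threshold=0.7):
    """Keep molecules whose score is at least `threshold` (inclusive by assumption)."""
    kept = []
    for smiles in smiles_list:
        score = score_fn(smiles)
        if score is not None and score >= threshold:
            kept.append(smiles)
    return kept
```

With RDKit installed, filter_drug_like(list_of_smiles) applies the 0.7 cutoff. Passing a different score_fn allows the same filter to run on precomputed scores.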

Tokenizer

We utilize a character-level tokenizer. The special tokens are "[SOS]", "[EOS]", "[PAD]", "[UNK]".
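A character-level tokenizer of this kind can be sketched in a few lines. The sketch assumes that "[PAD]" maps to id 0 (consistent with pad_token_id=0 in the config), that the remaining special-token ids follow in the order shown, and that the "+ 2" in max_position_embeddings leaves room for "[SOS]" and "[EOS]". None of these details are confirmed by the card, and build_vocab and encode are hypothetical helper names.

```python
# "[PAD]" is placed first so that its id is 0, matching pad_token_id=0.
# The ids of the other special tokens are an assumption of this sketch.
SPECIAL_TOKENS = ["[PAD]", "[SOS]", "[EOS]", "[UNK]"]


def build_vocab(smiles_list):
    """Map each special token and each distinct character to an integer id."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + chars)}


def encode(smiles, vocab, max_seq_len):
    """Encode one SMILES string to a fixed length of max_seq_len + 2 ids."""
    # Truncate to max_seq_len characters; unseen characters map to "[UNK]".
    body = [vocab.get(ch, vocab["[UNK]"]) for ch in smiles[:max_seq_len]]
    ids = [vocab["[SOS]"]] + body + [vocab["[EOS]"]]
    # Right-pad with "[PAD]" up to max_seq_len + 2 positions.
    ids += [vocab["[PAD]"]] * (max_seq_len + 2 - len(ids))
    return ids


vocab = build_vocab(["CCO", "c1ccccc1"])
print(encode("CCO", vocab, max_seq_len=5))
```

Because the tokenizer works on single characters, a two-letter element symbol such as "Cl" is split into "C" and "l".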

Training hyperparameters

The following hyperparameters were used during training:

  • Adam optimizer, learning_rate: 5e-4, scheduler: cosine annealing
  • Batch size: 2048
  • Training steps: 24 K
  • Training precision: FP16
  • Loss function: cross-entropy loss
  • Training masking rate: 30 %
  • Testing masking rate: 15 % (the original molecule BERT used a 15 % masking rate)
  • NSP task: None
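The learning-rate schedule and the amount of data seen during training follow from the numbers above. The sketch below uses the standard cosine annealing formula and assumes no warmup, a minimum learning rate of 0, and "24 K" meaning exactly 24,000 steps. The card states none of these, so the curve is illustrative only.

```python
import math

# Values from the hyperparameter list and the dataset description.
PEAK_LR = 5e-4
TOTAL_STEPS = 24_000      # "24 K" read as exactly 24,000 (assumption)
BATCH_SIZE = 2048
DATASET_SIZE = 4_100_000  # approximate, from "4.1 M molecules"


def cosine_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Cosine annealing from peak_lr to min_lr (no warmup, min_lr=0 assumed)."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Number of sequences processed and the approximate number of passes over the data.
sequences_seen = BATCH_SIZE * TOTAL_STEPS
approx_epochs = sequences_seen / DATASET_SIZE

for step in (0, 6_000, 12_000, 18_000, 24_000):
    print(f"step {step:>6}: lr = {cosine_lr(step):.2e}")
print(f"sequences seen: {sequences_seen:,} (about {approx_epochs:.1f} epochs)")
```

Under these assumptions, training processes 49,152,000 sequences, which is roughly 12 passes over the 4.1 M filtered molecules.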

Performance

  • Accuracy: 94.02 %