This ChemBERTa-v2 checkpoint was fine-tuned on the USPTO-50k dataset for sequence classification.

Specifically, the objective is to predict the reaction class label. The input is the canonicalized SMILES of either all reactants or all products of a reaction, with the individual molecules separated by ".".
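The input format described above can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: `build_model_input` and `predict_reaction_class` are hypothetical helper names, the reaction is assumed to arrive as a reaction SMILES of the form `reactants>agents>products`, and canonicalization is assumed to have been done beforehand (for example with a cheminformatics toolkit). The classification helper further assumes that the repository ships tokenizer files and label names alongside the weights.

```python
def build_model_input(reaction_smiles, side="reactants"):
    """Return the "."-joined SMILES for one side of a reaction.

    Assumes `reaction_smiles` looks like "reactants>agents>products"
    (agents may be empty) and is already canonicalized; no
    canonicalization is performed here.
    """
    if side not in ("reactants", "products"):
        raise ValueError("side must be 'reactants' or 'products'")
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected a reaction SMILES with two '>' separators")
    reactants, _agents, products = parts
    chosen = reactants if side == "reactants" else products
    # Drop empty fragments so stray dots do not end up in the model input.
    molecules = [m for m in chosen.split(".") if m]
    return ".".join(molecules)


def predict_reaction_class(
    text,
    model_id="Phando/chemberta-v2-finetuned-uspto-50k-classification",
):
    """Classify one input string (hypothetical helper; downloads the model)."""
    # Imported lazily so build_model_input works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    class_id = int(logits.argmax(dim=-1).item())
    # Fall back to the raw index if the config has no label name for it.
    return model.config.id2label.get(class_id, class_id)
```

For example, `build_model_input("CC(=O)O.OCC>>CC(=O)OCC", side="products")` yields the product string, which can then be passed to `predict_reaction_class`.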

  • Train/Test split: 0.99/0.01

  • Evaluation results:

    • Accuracy: 87.11%
    • Loss: 0.4272

  • Fine-tuning hyperparameters:

    • seed = 233
    • batch-size = 128
    • num_epochs = 5 (training stopped early at epoch 4)
    • learning_rate = 5e-4
    • warmup_steps = 64
    • weight_decay = 0.01
    • lr_scheduler_type = "cosine"
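To make the `learning_rate`, `warmup_steps`, and `lr_scheduler_type` settings above more concrete, here is a small sketch of a learning-rate curve with linear warmup followed by half-cosine decay, which is the shape a "cosine" scheduler commonly takes. It is an illustration under assumptions, not the training code: the function name is hypothetical, and `total_steps` is a placeholder because the actual number of optimizer steps is not stated here.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-4, warmup_steps=64):
    """Learning rate at `step` for linear warmup plus half-cosine decay.

    base_lr and warmup_steps default to the values listed above;
    total_steps is an assumed placeholder supplied by the caller.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule completed, from 0 to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, for illustration only
    for s in (0, 32, 64, 532, 1000):
        print(s, cosine_lr_with_warmup(s, total))
```

Under this shape the learning rate peaks at 5e-4 exactly when warmup ends (step 64) and decays smoothly afterwards. Because training stopped early, the final steps of the planned schedule would not have been reached.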
Model size: 83.5M parameters, stored as Safetensors with tensor type F32.

Dataset used to train Phando/chemberta-v2-finetuned-uspto-50k-classification