PaddleOCR-VL-1.5-OCSR

This repository hosts a merged, fine-tuned optical chemical structure recognition (OCSR) model based on PaddleOCR-VL-1.5.

The task is:

  • input: molecular structure image
  • output: canonical SMILES

Base Model

  • Base model family: PaddleOCR-VL-1.5
  • Fine-tuning method: LoRA SFT, then merged export
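To illustrate what "merged export" usually means for LoRA, here is a minimal sketch of folding a low-rank adapter into a base weight matrix using the standard LoRA update W + (alpha / rank) * B A. The card does not state the rank, alpha, or which layers were adapted, so the function name and all values below are illustrative assumptions rather than this project's export code.

```python
def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) using plain lists.

    Shapes: weight is d_out x d_in, lora_b is d_out x rank, lora_a is rank x d_in.
    """
    scale = alpha / rank
    d_out, d_in = len(weight), len(weight[0])
    merged = [row[:] for row in weight]  # copy so the base weight is untouched
    for i in range(d_out):
        for j in range(d_in):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


# Toy example (made-up numbers): 2x2 base weight with a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]    # rank x d_in
b = [[0.5], [0.0]]  # d_out x rank
merged = merge_lora(base, a, b, alpha=2.0, rank=1)
```

After merging, the adapter matrices are no longer needed at inference time, which is why a merged checkpoint can be loaded like an ordinary model.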

Training Objective

The fine-tuning line is restricted to a single output space:

  • canonical SMILES only

Samples with incompatible target spaces, such as ssml_normed, chemfig, and LaTeX formula targets, were excluded from the main line.
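The exclusion described above could be sketched as a simple filter over training records. The record layout, the `target_type` field, and the label strings are hypothetical; the card does not describe the actual data schema.

```python
# Hypothetical label for the only target space kept in the main line.
ALLOWED_TARGET_TYPES = {"canonical_smiles"}


def keep_main_line(records):
    """Keep only records whose (assumed) target_type is canonical SMILES."""
    return [r for r in records if r.get("target_type") in ALLOWED_TARGET_TYPES]


# Toy records with placeholder targets, for illustration only.
samples = [
    {"image": "a.png", "target_type": "canonical_smiles", "target": "CCO"},
    {"image": "b.png", "target_type": "chemfig", "target": "<chemfig source>"},
    {"image": "c.png", "target_type": "ssml_normed", "target": "<ssml string>"},
]
kept = keep_main_line(samples)
```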

Current Local Evaluation

canonical_smiles_main_v1

  • canonical exact accuracy: 32.86%
  • token micro F1: 70.35%
  • valid SMILES rate: 71.84%
  • mean fingerprint Tanimoto: 0.6992
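The card does not define how exact accuracy or token micro F1 are computed, so the following is only one plausible reading. It assumes predictions and references are already canonicalized with the same toolkit, uses a simplified regex tokenizer for SMILES, and treats each string as a bag of tokens. All function names are illustrative.

```python
import re
from collections import Counter

# Simplified SMILES tokenizer (an assumption, not the project's tokenizer):
# bracket atoms, two-letter halogens, %NN ring closures, then single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[^\sA-Za-z\d]")


def tokenize(smiles):
    return SMILES_TOKEN.findall(smiles)


def exact_accuracy(preds, refs):
    """Fraction of predictions that match their reference string exactly."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def token_micro_f1(preds, refs):
    """Micro-averaged F1 over bag-of-token overlaps, pooled across samples."""
    tp = n_pred = n_ref = 0
    for p, r in zip(preds, refs):
        pc, rc = Counter(tokenize(p)), Counter(tokenize(r))
        tp += sum((pc & rc).values())  # multiset intersection
        n_pred += sum(pc.values())
        n_ref += sum(rc.values())
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_ref
    return 2 * precision * recall / (precision + recall)


# Toy inputs, not taken from the evaluation sets.
preds = ["CCN", "c1ccccc1"]
refs = ["CCO", "c1ccccc1"]
acc = exact_accuracy(preds, refs)
f1 = token_micro_f1(preds, refs)
```

The valid SMILES rate additionally requires a chemistry toolkit to parse each prediction, so it is left out of this sketch.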

ocsr_realworld_mixed_eval_v1p1

  • canonical exact accuracy: 33.77%
  • token micro F1: 70.18%
  • valid SMILES rate: 75.84%
  • mean fingerprint Tanimoto: 0.6849
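For reference, the Tanimoto similarity behind the "mean fingerprint Tanimoto" figures is the size of the intersection of two fingerprints' on-bits divided by the size of their union. The sketch below works on plain sets of bit indices. The fingerprint type, its parameters, and the score assigned to invalid predictions are not stated in the card, so those details are left open here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / union


def mean_tanimoto(pairs):
    """Average similarity over (predicted_bits, reference_bits) pairs."""
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy bit sets, for illustration only.
pairs = [({1, 2, 3}, {2, 3, 4}), ({1, 2}, {1, 2})]
score = mean_tanimoto(pairs)
```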

Source-level observations

  • relatively strong on uob
  • moderate on uspto
  • weak on real_world
  • very weak on decimer handwritten structures
  • weak on edu_chemc
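A per-source breakdown like the one above can be produced by grouping evaluation rows by their data source. This is a minimal sketch that assumes each row carries hypothetical `source`, `pred`, and `ref` fields. The rows shown are toy values, not results from this model.

```python
from collections import defaultdict


def accuracy_by_source(rows):
    """Exact-match accuracy per data source."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row["source"]] += 1
        hits[row["source"]] += int(row["pred"] == row["ref"])
    return {source: hits[source] / totals[source] for source in totals}


# Toy rows with placeholder source names.
rows = [
    {"source": "source_a", "pred": "CCO", "ref": "CCO"},
    {"source": "source_a", "pred": "CC", "ref": "CC"},
    {"source": "source_b", "pred": "CCN", "ref": "CCO"},
    {"source": "source_b", "pred": "C", "ref": "C"},
]
per_source = accuracy_by_source(rows)
```

Reporting metrics this way makes weak domains visible even when the aggregate numbers look similar across evaluation sets.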

Intended Use

This model is intended for:

  • OCSR research
  • benchmarking
  • pipeline prototyping
  • weak-domain error analysis

It is not a state-of-the-art OCSR model and is not yet suitable for production use.

Limitations

  • performance is highly uneven across data domains
  • handwritten chemical structures remain a major failure mode
  • performance on real-world photos and scans is still weak
  • exact-match accuracy is below strong public OCSR baselines such as MolNexTR and MolScribe on the standard UOB and USPTO benchmarks

Files

This repository includes:

  • merged model weights
  • tokenizer files
  • processor/preprocessor files
  • remote-code files required by the custom model class
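Because the repository ships remote-code files, loading it through Hugging Face Transformers would typically require `trust_remote_code=True`. The sketch below is an assumption-laden starting point. Which Auto class the custom code registers is not stated in the card (`AutoModelForCausalLM` is a guess), `model_dir` is a placeholder for a local checkout or Hub id, and the prompt format and generation call are not covered.

```python
def load_ocsr_model(model_dir, device="cpu"):
    """Load the merged weights and processor; a sketch, not the official recipe."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        trust_remote_code=True,      # needed for the custom model class
        torch_dtype=torch.bfloat16,  # assumes the weights are stored in BF16
    )
    return model.to(device).eval(), processor
```

Review the remote-code files before enabling `trust_remote_code`, since they run with the permissions of the calling process.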

Notes

This model card reflects the current V2-1 baseline state in the associated project workspace.

Model Details

  • weights format: Safetensors
  • model size: 1.0B params
  • tensor type: BF16