L2658183739/PaddleOCR-VL-1.5-OCSR

Tags: image-text-to-text, molecular-structure-recognition


PaddleOCR-VL-1.5-OCSR

This repository hosts a merged, fine-tuned optical chemical structure recognition (OCSR) model based on PaddleOCR-VL-1.5.

The task is:

input: molecular structure image
output: canonical SMILES

Base Model

Base model family: PaddleOCR-VL-1.5
Fine-tuning method: LoRA supervised fine-tuning (SFT), with the adapter merged into the base weights for export
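For reference, standard LoRA keeps each pretrained weight matrix W_0 frozen and trains a low-rank update. A merged export folds that update back into the weights, so the checkpoint needs no separate adapter at inference time. Assuming the usual alpha/r scaling (the rank r and alpha used for this model are not stated in this card), each adapted matrix becomes:

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad
W_0 \in \mathbb{R}^{d \times k},\quad
B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Because BA has the same shape as W_0, the merged checkpoint keeps the base model's architecture and parameter count.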

Training Objective

The main fine-tuning line targets a single output space:

canonical SMILES only

Training data with incompatible target formats, such as ssml_normed, chemfig, and LaTeX formula targets, was excluded from the main line.
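A minimal sketch of that filtering step, assuming each training record is a dict with a hypothetical `target_format` field (the project's real data schema is not described in this card). Records in any other target space, such as ssml_normed or chemfig, are dropped and counted per format so the exclusion can be audited.

```python
from collections import Counter
from typing import Iterable

# Hypothetical label for the single target space kept in the main line.
KEEP_FORMAT = "canonical_smiles"


def split_by_target(records: Iterable[dict]) -> tuple[list[dict], Counter]:
    """Keep canonical-SMILES records; count dropped records per target format.

    Assumes a hypothetical "target_format" key on each record; records
    without it are counted under "unknown" and dropped.
    """
    kept = []
    dropped = Counter()
    for rec in records:
        fmt = rec.get("target_format", "unknown")
        if fmt == KEEP_FORMAT:
            kept.append(rec)
        else:
            dropped[fmt] += 1
    return kept, dropped


if __name__ == "__main__":
    # Toy records; image names and target strings are illustrative only.
    sample = [
        {"image": "a.png", "target": "CCO", "target_format": "canonical_smiles"},
        {"image": "b.png", "target": "\\chemfig{C-C-OH}", "target_format": "chemfig"},
        {"image": "c.png", "target": "<ssml placeholder>", "target_format": "ssml_normed"},
    ]
    kept, dropped = split_by_target(sample)
    print(len(kept), dict(dropped))
```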

Current Local Evaluation

`canonical_smiles_main_v1`

canonical exact accuracy: 32.86%
token micro F1: 70.35%
valid SMILES rate: 71.84%
mean fingerprint Tanimoto: 0.6992

`ocsr_realworld_mixed_eval_v1p1`

canonical exact accuracy: 33.77%
token micro F1: 70.18%
valid SMILES rate: 75.84%
mean fingerprint Tanimoto: 0.6849
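The card does not define its four metrics precisely. The sketch below shows one plausible reading, and every definition in it is an assumption: predictions and references are taken to be already canonicalized (for example with RDKit's `Chem.MolToSmiles`), tokens come from the regex SMILES tokenizer commonly used in SMILES sequence models, token micro F1 pools bag-of-token overlap across the whole set, validity is delegated to a caller-supplied check, and Tanimoto similarity is computed on fingerprints given as sets of on-bit indices.

```python
import re
from collections import Counter
from typing import Callable, Sequence

# Regex SMILES tokenizer of the kind widely used in SMILES seq2seq work;
# an assumption, since the card does not say how tokens are defined.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    return SMILES_TOKEN_RE.findall(smiles)


def _check(preds: Sequence[str], refs: Sequence[str]) -> None:
    if not refs or len(preds) != len(refs):
        raise ValueError("need non-empty, equal-length prediction/reference lists")


def exact_accuracy(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Share of predictions whose (pre-canonicalized) string equals the reference."""
    _check(preds, refs)
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def token_micro_f1(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Micro F1 over bag-of-token overlap, pooled across all pairs."""
    _check(preds, refs)
    overlap = n_pred = n_ref = 0
    for p, r in zip(preds, refs):
        tp, tr = Counter(tokenize(p)), Counter(tokenize(r))
        overlap += sum((tp & tr).values())  # multiset intersection
        n_pred += sum(tp.values())
        n_ref += sum(tr.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def valid_rate(preds: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Share of predictions accepted by is_valid, e.g. an RDKit parse check."""
    if not preds:
        raise ValueError("need at least one prediction")
    return sum(bool(is_valid(p)) for p in preds) / len(preds)


def tanimoto(a: set[int], b: set[int]) -> float:
    """|A & B| / |A | B| on on-bit sets; two empty sets give 0.0 by convention here."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_tanimoto(pairs: Sequence[tuple[set[int], set[int]]]) -> float:
    if not pairs:
        raise ValueError("need at least one fingerprint pair")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
```

With RDKit installed, `is_valid` could be `lambda s: Chem.MolFromSmiles(s) is not None`, and bit sets could come from a fingerprint's `GetOnBits()`. How the reported numbers treat invalid predictions in the Tanimoto mean is not stated in this card.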

Source-level observations

relatively strong on uob
moderate on uspto
weak on real_world
very weak on decimer handwritten structures
weak on edu_chemc
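Source-level observations like these come from grouping per-sample results by data source. A minimal sketch, assuming evaluation rows are hypothetical `(source, prediction, reference)` tuples of canonical SMILES; the source labels mirror those named above, but the rows are toy data. Sorting weakest-first is one simple way to pick targets for weak-domain error analysis.

```python
from collections import defaultdict
from typing import Iterable


def accuracy_by_source(rows: Iterable[tuple[str, str, str]]) -> dict[str, float]:
    """Exact-match accuracy per data source from (source, pred, ref) rows."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for source, pred, ref in rows:
        totals[source] += 1
        hits[source] += (pred == ref)  # True counts as 1
    return {src: hits[src] / totals[src] for src in totals}


def weakest_first(acc: dict[str, float]) -> list[tuple[str, float]]:
    """Sort sources from lowest to highest accuracy."""
    return sorted(acc.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    rows = [
        ("uob", "CCO", "CCO"),
        ("uob", "CCN", "CCO"),
        ("uspto", "c1ccccc1", "c1ccccc1"),
        ("real_world", "CC", "CCO"),
    ]
    for source, acc in weakest_first(accuracy_by_source(rows)):
        print(f"{source}: {acc:.2%}")
```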

Intended Use

This model is intended for:

OCSR research
benchmarking
pipeline prototyping
weak-domain error analysis

It is not yet a state-of-the-art or production-ready OCSR model.

Limitations

performance is highly uneven across data domains
handwritten chemical structures remain a major failure mode
performance on real-world photos and scans is still weak
exact-match accuracy is below that of strong public OCSR baselines such as MolNexTR and MolScribe on the standard UOB and USPTO benchmarks

Files

This repository includes:

merged model weights
tokenizer files
processor/preprocessor files
remote-code files required by the custom model class

Notes

This model card reflects the current V2-1 baseline state in the associated project workspace.


Weights format: Safetensors
Model size: 1.0B params
Tensor type: BF16
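As a rough sizing check: BF16 stores each parameter in 2 bytes, so about 1.0B parameters need roughly 2 × 10^9 bytes, or about 1.86 GiB, for the weights alone. A minimal sketch of that arithmetic; it ignores activations, KV cache and runtime overhead, and the 1.0B figure is rounded, so the result is only an estimate.

```python
# Bytes per parameter for common weight dtypes.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(n_params: float, dtype: str = "bf16") -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes).

    Ignores activations, KV cache, optimizer state and framework overhead.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # 1.0B is the rounded size listed above, so this is only an estimate.
    print(f"{weight_memory_gib(1.0e9):.2f} GiB")
```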
